Today, we see an example of Toccata’s memory managment freeing a value the instant it’s no longer needed.
As I was writing the last post, I came across an old post by Jon Harrop (who you’ll remember kicked off this whole train of thought) showing a loop that would cause problems for C++’s scope-based reference counting.
let rec loop tmp i =
if i<=0 then tmp else
let tmp2 = loop (Array.copy tmp) (i-1)
tmp2.[0] <- tmp2.[0] + 1
tmp2
The problem is that recursive call to loop
in the middle. The value tmp
is copied and the copy is passed to the next iteration of loop
which means it should be available for collection and reuse. But C++ can’t do that, and uses way more memory than needed. Possibly even dying in the process.
So I wrote this loop in Toccata:
(defn loop [tmp n]
(either (and (< n 0) (maybe tmp))
(add-numbers 1 (loop (add-numbers 1 tmp) (dec n)))))
Toccata doesn’t have an ‘if’ expression (see here for the reasons). But I’m seriously thinking of adding one. In the meantime, the either...and...maybe...
construct is equivalent.
This gets compiled to a C function which I’ll pull pieces in to explain what’s happening.
Value *g_loop_3949;
Value *g_loop_3950(List *closures, Value *tmp_0, Value *n_1) {
while (1) {
incRef(tmp_0, 1);
incRef(n_1, 1);
Value *andRslt2;
// static-fixed
// #line 3 "recur-test.toc"
Value *rslt3 = g__LT__806(empty_list, n_1, numPtr0);
andRslt2 = rslt3;
if (isNothing(andRslt2,"",0)) {
dec_and_free(tmp_0, 1);
} else {
dec_and_free(andRslt2, 1);
// call-maybe
// #line 3 "recur-test.toc"
Value *maybe4 = (Value *)maybe((List *)0, (Value *)0, tmp_0);
andRslt2 = maybe4;
}
The variable g_loop_3949
is going to be how we’ll do the recursive call, so ignore it for now. But see how tmp_0
and n_1
are passed into the function, then both of them have their refs incremented by one. This happens because the compile saw that they were both passed as arguments at 2 different call sites.
The first call site for n_1
is up first where it’s checked to see if it’s less than 0. For the sake of discussion, let’s assume that n_1
had a ref count of 1 when passed in. At this point, it’s count is 2. It gets passed to g__LT__806
which does the check and then decrements its ref count. So after that call, its ref count is 1.
Now it gets a little interesting. The execution could go down either branch of the if
. In one, tmp_0
is passed to the function maybe
, which would cause it’s ref count to be decremented by one. So the other branch has to decrement tmp_0
’s ref count as well. This leads to another rule
So if tmp_0
’s ref count was 1 when it was passed in, it’s first incremented by 1 and then decremented in the true branch of the ‘if’. But there’s a nuance in the false branch. First, we no longer care about the actual value of andRslt2
, so we dec it’s ref count and free it. Then we have to make andRslt2
point to the result of the call to maybe
.
The maybe
function actually wraps whatever value it’s given in a Maybe
struct and that is a reference to the value. So even though the compiler believes tmp_0
’s ref count decremented, it actually wasn’t. But that’s fine because maybe4
has a ref count of 1 and it references tmp_0
. So when maybe4
is eventually freed, it will then decrement tmp_0
’s count. Which happens next.
Now we need to check the result of the and
expression. At this point andRslt2
points to nothing (has a Null pointer in the struct), if the less-than expression was false. Or it and maybe4
are pointing to the same value. And that value is a Maybe struct that points to tmp_0
. If it’s not nothing
, we need to extract the value from inside it and return it. So in this branch of the if
, rslt2
and tmp_0
point to the same struct, and it’s ref count is 2 (assuming tmp_0
’s count was 1 at the top of the function). So before we return, we need to decrement tmp_0
’s ref count and return rslt2
. Even though they’re the same value in this case, in the general case they won’t be, so we have to treat them differently here.
if (!isNothing(andRslt2,"",0)) {
Value *rslt12 = maybeExtract(andRslt2);
dec_and_free(tmp_0, 1);
dec_and_free(n_1, 1);
return(rslt12);
} else {
The larger point to realize is that this function is done with tmp_0
after the true branch is finished, so it needs to have it’s ref count decremented.
Now let’s look at the top of the false branch:
} else {
dec_and_free(andRslt2, 1);
// static-fixed
// #line 4 "recur-test.toc"
Value *rslt5 = g_add_numbers_131(empty_list, numPtr5, tmp_0);
The function no longer cares about andRslt2
/maybe4
so it decrements and frees the memory. In this case, andRslt2
was not pointing to anything. And tmp_0
’s ref count is 1. This is the key point of this whole post. The compiler could determine that if the execution reached this point, there would only be 1 reference to tmp_0
, the current execution scope. So then add-numbers
gets called (you can look at its code in the previous post). After extracting the integer value from tmp_0
, it’s done with it so it decrements its ref count to 0 and frees the memory. This is instant tmp_0
is no longer needed and so it gets freed. If it had been a TCP socket or file handle, that resource would have been closed and freed as well.
So Toccata frees this memory as promptly as possible in this specific case and does so in the general case as well.
For the purposes of this post, the assumption was that tmp_0
’s ref count was 1 when it was passed in to this function. If the callee of this function needed to use this value after this function returned, it would have set the ref count to something greater than 1. In that case, add-numbers
would have decremented it, but not freed the memory because it would still be greater than 0.
Since we’re on a roll looking at Toccata’s memory management, I think I’ll tackle the latency issue next time.
Jon Harrop pointed out that my Toccata version didn’t adhere to the spirit of the original program. If tmp
is an array (or vector) of size m
and the number of iterations n
, the loop should run in memory on the order of m * n
. Naive ref counting algos aren’t able to reclaim the vector tmp
until the end of the scope function even though it’s no longer use after a copy is made. Whereas a tracing GC can collect that space earlier.
So I re-wrote the Toccata version to more closely match the intention
(defn loop [tmp n]
(assert (instance? Vector tmp))
(either (and (< n 0) (maybe tmp))
(let [tmp2 (loop (map tmp identity) (dec n))
tmp2 (either (for [first-num (get tmp2 0)
tmp2 (store tmp2 0 (inc first-num))]
tmp2)
tmp2)]
tmp2)))
However, the analysis of the generated code doesn’t change. tmp
is still collected before the call to map
returns because its ref count is 1 before the call. So the Toccata version only ever has, at most, 2 copies of the vector being manipulated at once. So this version runs in m * 2
memory. And I think I could make the runtime smarter so that it resuses the space occupied by tmp
in the call to map so that it would run in memory on the order of size of the vector.