Toccata
Feb 27 - Reference Counting Promptness Comments

Today, we see an example of Toccata’s memory managment freeing a value the instant it’s no longer needed.

Once more into the breach

As I was writing the last post, I came across an old post by Jon Harrop (who you’ll remember kicked off this whole train of thought) showing a loop that would cause problems for C++’s scope-based reference counting.

let rec loop tmp i =
  if i<=0 then tmp else
  let tmp2 = loop (Array.copy tmp) (i-1)
    tmp2.[0] <- tmp2.[0] + 1
    tmp2

The problem is that recursive call to loop in the middle. The value tmp is copied and the copy is passed to the next iteration of loop which means it should be available for collection and reuse. But C++ can’t do that, and uses way more memory than needed. Possibly even dying in the process.

But …

So I wrote this loop in Toccata:

(defn loop [tmp n]
  (either (and (< n 0) (maybe tmp))
          (add-numbers 1 (loop (add-numbers 1 tmp) (dec n)))))

Toccata doesn’t have an ‘if’ expression (see here for the reasons). But I’m seriously thinking of adding one. In the meantime, the either...and...maybe... construct is equivalent.

This gets compiled to a C function which I’ll pull pieces in to explain what’s happening.

Value *g_loop_3949;
Value *g_loop_3950(List *closures, Value *tmp_0, Value *n_1) {
  while (1) {
    incRef(tmp_0, 1);
    incRef(n_1, 1);
    Value *andRslt2;

    // static-fixed
    // #line 3 "recur-test.toc"
    Value *rslt3 = g__LT__806(empty_list, n_1, numPtr0);
    andRslt2 = rslt3;
    if (isNothing(andRslt2,"",0)) {
      dec_and_free(tmp_0, 1);
    } else {
      dec_and_free(andRslt2, 1);

      // call-maybe
      // #line 3 "recur-test.toc"
      Value *maybe4 = (Value *)maybe((List *)0, (Value *)0, tmp_0);
      andRslt2 = maybe4;
    }

The variable g_loop_3949 is going to be how we’ll do the recursive call, so ignore it for now. But see how tmp_0 and n_1 are passed into the function, then both of them have their refs incremented by one. This happens because the compile saw that they were both passed as arguments at 2 different call sites.

The first call site for n_1 is up first where it’s checked to see if it’s less than 0. For the sake of discussion, let’s assume that n_1 had a ref count of 1 when passed in. At this point, it’s count is 2. It gets passed to g__LT__806 which does the check and then decrements its ref count. So after that call, its ref count is 1.

Now it gets a little interesting. The execution could go down either branch of the if. In one, tmp_0 is passed to the function maybe, which would cause it’s ref count to be decremented by one. So the other branch has to decrement tmp_0’s ref count as well. This leads to another rule

So if tmp_0’s ref count was 1 when it was passed in, it’s first incremented by 1 and then decremented in the true branch of the ‘if’. But there’s a nuance in the false branch. First, we no longer care about the actual value of andRslt2, so we dec it’s ref count and free it. Then we have to make andRslt2 point to the result of the call to maybe.

The maybe function actually wraps whatever value it’s given in a Maybe struct and that is a reference to the value. So even though the compiler believes tmp_0’s ref count decremented, it actually wasn’t. But that’s fine because maybe4 has a ref count of 1 and it references tmp_0. So when maybe4 is eventually freed, it will then decrement tmp_0’s count. Which happens next.

Big If True

Now we need to check the result of the and expression. At this point andRslt2 points to nothing (has a Null pointer in the struct), if the less-than expression was false. Or it and maybe4 are pointing to the same value. And that value is a Maybe struct that points to tmp_0. If it’s not nothing, we need to extract the value from inside it and return it. So in this branch of the if, rslt2 and tmp_0 point to the same struct, and it’s ref count is 2 (assuming tmp_0’s count was 1 at the top of the function). So before we return, we need to decrement tmp_0’s ref count and return rslt2. Even though they’re the same value in this case, in the general case they won’t be, so we have to treat them differently here.

  if (!isNothing(andRslt2,"",0)) {
    Value *rslt12 = maybeExtract(andRslt2);
    dec_and_free(tmp_0, 1);
    dec_and_free(n_1, 1);
    return(rslt12);
  } else {

The larger point to realize is that this function is done with tmp_0 after the true branch is finished, so it needs to have it’s ref count decremented.

But If Not …

Now let’s look at the top of the false branch:

  } else {
    dec_and_free(andRslt2, 1);

    // static-fixed
    // #line 4 "recur-test.toc"
    Value *rslt5 = g_add_numbers_131(empty_list, numPtr5, tmp_0);

The function no longer cares about andRslt2/maybe4 so it decrements and frees the memory. In this case, andRslt2 was not pointing to anything. And tmp_0’s ref count is 1. This is the key point of this whole post. The compiler could determine that if the execution reached this point, there would only be 1 reference to tmp_0, the current execution scope. So then add-numbers gets called (you can look at its code in the previous post). After extracting the integer value from tmp_0, it’s done with it so it decrements its ref count to 0 and frees the memory. This is instant tmp_0 is no longer needed and so it gets freed. If it had been a TCP socket or file handle, that resource would have been closed and freed as well.

So Toccata frees this memory as promptly as possible in this specific case and does so in the general case as well.

But what if …

For the purposes of this post, the assumption was that tmp_0’s ref count was 1 when it was passed in to this function. If the callee of this function needed to use this value after this function returned, it would have set the ref count to something greater than 1. In that case, add-numbers would have decremented it, but not freed the memory because it would still be greater than 0.

And now

Since we’re on a roll looking at Toccata’s memory management, I think I’ll tackle the latency issue next time.

Update

Jon Harrop pointed out that my Toccata version didn’t adhere to the spirit of the original program. If tmp is an array (or vector) of size m and the number of iterations n, the loop should run in memory on the order of m * n. Naive ref counting algos aren’t able to reclaim the vector tmp until the end of the scope function even though it’s no longer use after a copy is made. Whereas a tracing GC can collect that space earlier.

So I re-wrote the Toccata version to more closely match the intention

(defn loop [tmp n]
  (assert (instance? Vector tmp))

  (either (and (< n 0) (maybe tmp))
          (let [tmp2 (loop (map tmp identity) (dec n))
                tmp2 (either (for [first-num (get tmp2 0)
                                   tmp2 (store tmp2 0 (inc first-num))]
                               tmp2)
                             tmp2)]
            tmp2)))

However, the analysis of the generated code doesn’t change. tmp is still collected before the call to map returns because its ref count is 1 before the call. So the Toccata version only ever has, at most, 2 copies of the vector being manipulated at once. So this version runs in m * 2 memory. And I think I could make the runtime smarter so that it resuses the space occupied by tmp in the call to map so that it would run in memory on the order of size of the vector.