Toccata
Feb 25 - Counting References FTW Comments

Reference Counting can beat Tracing Garbage Collection (such as mark-and-sweep) in some circumstances. Now that the click-bait headline is out of the way, and it got you here, let’s see what I mean.

How did we get here?

A couple of months ago, I narrowly avoided getting nerd-sniped by this Jon Harrop tweet.

But then he did this

and I was sunk. So I grabbed the Ocaml repo and ran it on my machine for a target to hit.

  Command being timed: "../deriv-ocaml/a.out 10"
  User time (seconds): 1.56
  ...
  Maximum resident set size (kbytes): 432784

Toccata uses a ref counting system I came up with and I thought it should do pretty well. So I ported the Ocaml code to Toccata. (The repo is here) Aaaaannnnd it took 10x the memory and 10x the time. This didn’t actually surprise me, since I’d done no optimizations on the Toccata runtime. However, in short order I had the memory consumption down. The execution time took more effort, but evenutally did ok. Here are the results.

  Command being timed: "./deriv 10"
  User time (seconds): 3.03
  ...
  Maximum resident set size (kbytes): 299364

Let the Caveats Begin

Of course, it’s not quite so straightforward. I had to do some violence to the Toccata runtime to get these results.

I’ve no idea how that stacks up to the optimizations the Ocaml code uses.

The Magic Behind the Numbers

So how does this ref counting scheme work, anyway?

Here’s the function that adds two integers from the standard library:

(defn add-numbers [x y]
  (inline C Integer "
    Value *numVal = integerValue(((Integer *)x_0)->numVal + ((Integer *)y_1)->numVal);
    dec_and_free(x_0, 1);
    dec_and_free(y_1, 1);
    return(numVal);"))

These functions with inline C code are the leaves of the call tree in a Toccata program. Every call to any function bottoms out in a function like this.

You see those calls to dec_and_free? When this function is done with each parameter, it decrements the references on them and frees the memory if the refs hit zero, meaning they’re not referenced by any other value. All functions that have an inline C body have to do this.

The rule is:

Now let’s look at a function from the Toccata derivation code.

(extend-type Add
  Operations
  (d [a x]
    (let [f (.f a)
          g (.g a)]
      (Add (d f x) (d g x)))))

This is converted to the following C function. Hopefully, the correspondence between Toccata symbols and C symbols is obvious enough.

Value *g_Add_d_4545(List *closures, Value *a_0, Value *x_1) {
  while (1) {
    incRef(a_0, 1);
    incRef(x_1, 1);

    // call proto fn .f
    // #line 138 "deriv.toc"
    Value *rslt2 = g_Add_f_4052(empty_list, a_0);
    Value *f_3 = rslt2;

    // call proto fn .g
    // #line 139 "deriv.toc"
    Value *rslt4 = g_Add_g_4056(empty_list, a_0);
    Value *g_5 = rslt4;

    // static-fixed
    // #line 140 "deriv.toc"
    Value *rslt6 = g_disp_d_4467((List *)strPtr313, f_3, x_1);

    // static-fixed
    // #line 140 "deriv.toc"
    Value *rslt7 = g_disp_d_4467((List *)strPtr313, g_5, x_1);

    // call invoke
    // #line 140 "deriv.toc"
    Value *rslt8 = g_arityFn4001(empty_list, g_reifiedPtr4026, rslt6, rslt7);
    return(rslt8);

    Value *let_rslt9 = rslt8;

  };};

You can see that a_0 is passed as a parameter to two different functions, as is x_1. The compiler knows that their refs will be decremented by one in each call. But this function can only have the effect of decrementing their refs by one, so it increments the refs of each by one before executing the body of the function. So if a_0 has a ref count of 1 when passed to this function, it will be inc’d to 2, then dec’d by g_Add_f_4052. Then dec’d again by g_Add_g_4056 where it will hit 0 and the memory freed because the value is no longer used.

Now look at the value returned from g_Add_f_4052 that gets assigned to f_3. Presumably, it is newly allocated. It then is passed as a parameter to one function, g_disp_d_4467. Which leads to the second rule:

Finally, look at the call to g_arityFn4001. This function returns the value that will be the final result. Again, it was presumeably newly allocated. But what if the return value was one of the parameters passed in? That’s covered by the 3rd rule:

There are some other rules for special cases, but this post is growing a little long. Hit me up for details if you care.

But what about …

cycles in the reference graph where values refer to themselves. This is one major issue with reference counting. Fortunately, Toccata can ignore this since all values are either atomic, or are immutable data structures which are never modified. This means that it’s impossible to create cycles. (Except for one loophole I’ve thought of, But if you do that, you’re a bad person and I’m not going to worry about it.)

Finally

By counting references like this, all values get freed the instant they’re no longer needed. Which also means you can tie other kinds of resources, like file handles or network sockets, into the garbage collection mechanism and their reclamation is dealt with automatically.

It also means that the runtime can know for sure when a datastructure can be safely mutated rather than copied. This means something like Clojure’s transient data structures are unneeded.

And I think it could let me implement linear types easily. When a value is marked as linear, it’s ref count could never be greater than 1 or there’s an error. So it should never be passed to incRef.

And that’s how Toccata manages memory.