Mar 13 - Ref Counting Latency Comments

To wrap up this series on Toccata’s memory managment, we turn to latency.

A different kind of post

This post is going to be different in that it’s more hypothetical, as I don’t have time to make sure all the code works and post actual results. Though I might be talked into it if someone wanted that.

Collecting memory that’s no longer used takes time. One of the reasons I wanted to do reference counting for Toccata is to eliminate instances where your program just stops while the collector does it’s work. I’m aware there are schemes which can reduce greatly (almost to the point of elimination) these pauses. But I figured ref counting would be better given the way memory is used in Toccata.

The ‘latency’ of an operation is the measure of this overhead and there are multiple ways to look at this statistic. In a lot of cases, it’s sufficient to just have a low average latency even if occasionally there’s a really long one. However, many times, the worst case latency must absolutely be kept below a certain number so that no user ever sees too long of a delay.

This project purports to compare the latencies of various languages. It hasn’t been updated in some time, so the results aren’t valid anymore. However, the benchmark is interesting. It involves a process that listens for TCP connections. When it gets one, it allocates a 1K buffer and initializes all the bytes in it to a given value. Then it puts that value in a hash-table that has up to 250K entries, evicting an old value if needed.

And now, Toccata

How might this be done in Toccata?

(defn allocate-buffer [value]
  (inline C "
  // some awesome C code

(defn handle-connection [socket hm index]
  (let [new-connection (extract socket)
        new-index (either (< (inc index) 250000)
        new-hm (assoc hm index (allocate-buffer 89))]
    (send new-connection "200 OK\n")
    (handle-connection socket new-hm new-index)))

(main [_]
  (handle-connection (tcp/listen 8080)

This is just an infinite loop that does the deed for each connection that comes in and a main to kick things off.

Under the covers

The only interesing bit is what happens in that call to assoc. Hash maps are trees of internal data structures (with 32-way branching). The key value, index, gets hashed (but since it’s already a number, it’s just used as is) and then 5 bits are used at each level to determine which branch to take. When a leaf position is reached, the tree node containing that position is copied with tne new value being added (or replacing an existing leaf). And then all the nodes back up to the root get copied with new entries for the sub trees.

Except. Each tree node and each leaf value has a refernce count in it. And since this hash map value is only used once in this function, it’s ref count and all the ref counts of the internal tree nodes and leaves are always going to be 1. The runtime knows this and so instead of copying any of the structures, it just mutates the existing structures by writing the memory address of the new value in to the leaf position. If there’s an old value there, it frees that value first. So there’s no copying.

So for each connection, there’s basically an allocation, 4 bit-shift operations, a handful of array accesses, and a deallocation that’s needed. And maybe some other constant overhead. The point being, that there’s very little work to do and very little variation from worst case to best case for the latency.

It would be hard to find a benchmark that plays to Toccata’s strengths better than this one. Like all ref counting schemes, you might get a long pause if you build up a very large data structure and then it gets completely discarded. But that is going to be exceedingly rare given how the Toccata data structures work. And it would only tie up one thread if it did happen.

That’s about all the time I want to take for internal stuff, though I’m happy to explain other aspects if people post questions in the comments. Next time, I’d like to get a lot more practical.