To The Core

Clojure has gotten a lot of mileage out of the Sequence Abstraction. This blog post does a pretty good job of introducing the concept. Also, Zach Tellman gave a great talk, "On Abstraction", that lays out why a good abstraction is valuable.

As I used Clojure over the years, I came to realize there were opportunities for abstractions other than sequences. So Toccata's core library defines a handful of protocols that most of the primitive data types implement to varying degrees. Not every data type implements every protocol function, only those that make sense for that type.

Primitive Types

Before looking at the core protocols, we'll take a quick look at a few of the primitive types that implement them. This is not a complete list, but it's enough to get us started.

Integer

Currently, the only numeric type is integer. Floating point numbers will be added later. All value types in Toccata are 'boxed'. That is, internally, the actual 64-bit number is wrapped in a structure that contains data about the number (so-called 'meta-data'). This meta-data is not accessible from Toccata code, so it's invisible to the programmer, but it does have an impact on performance. Eventually, I'd like to use static analysis to eliminate that overhead.

String

A string is just a sequence of characters interpreted as text.

Symbol

A symbol is basically a string value that implements fewer protocol functions.

Maybe

A Maybe value is the simplest kind of value that may, or may not, contain another value of any type. It can hold at most one value. There is a single, unique empty Maybe value: the nothing value.

List

A list may contain 0 or more values of any type. In general, the members of the list do not all have to be the same type. It's a heterogeneous data structure, as are most of the core data structures. When an item is added to a list, it appears at the head of the new list. You can only access a value inside a list by walking through all the items ahead of it.

Vector

A vector is very similar to a list in that it can contain 0 or more items of any type. However, when you add an item to a vector, it goes at the end of the new vector. Also, you can access any item in a vector by providing an integer index.
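Here's a rough Python sketch of the head-versus-end difference. It is not Toccata's implementation. An immutable cons list adds new items at the head and shares the old list as its tail, so reaching an item means walking past everything ahead of it. A tuple stands in for a vector, where adding returns a new vector with the item at the end and any item is reachable by index. The names `Cons`, `list_add`, `list_nth`, and `vector_add` are hypothetical. The tuple copies itself on every add, so it only models the behavior of a real persistent vector, not its performance.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Cons:
    """One node of an immutable singly linked list (hypothetical helper)."""
    head: Any
    tail: Optional[Cons]  # None marks the empty list


def list_add(lst: Optional[Cons], item: Any) -> Cons:
    # Adding to a list puts the item at the head; the old list is shared, not copied.
    return Cons(item, lst)


def list_nth(lst: Optional[Cons], n: int) -> Any:
    # Reaching item n means walking past the n items ahead of it.
    node = lst
    for _ in range(n):
        if node is None:
            raise IndexError(n)
        node = node.tail
    if node is None:
        raise IndexError(n)
    return node.head


def vector_add(vec: tuple, item: Any) -> tuple:
    # Adding to a vector puts the item at the end of a new vector.
    return vec + (item,)


lst = list_add(list_add(list_add(None, 1), 2), 3)
vec = vector_add(vector_add(vector_add((), 1), 2), 3)
print(list_nth(lst, 0), vec[0])  # prints: 3 1
```

Adding 1, 2, 3 in that order leaves 3 at the front of the list but at the back of the vector.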

Function

Since Toccata is a functional programming language, functions themselves are first-class values.

Hash Map

Your good ole associative collection that maps keys to values. Keys can be anything that is hashable; values can be anything.

The sharp-eyed Clojure programmer might have noticed there's no keyword type mentioned. I've come to believe that while keywords were pretty cool in Clojure, their use is actually an anti-pattern. So I chose not to include them in Toccata. If you feel like you absolutely cannot live without them, symbols give you all the same functionality. But you'll just give yourself headaches if you use them like that.

Protocols

Now, let's turn our attention to the core abstractions. The concept of 'protocols' was introduced in Clojure to address the "Expression Problem". Chris Houser gave a great talk on this.

Protocols let you define the names and arities (the number of arguments a function takes) of one or more functions. Data types implement those functions, and when a protocol function is called, the type of its first argument determines which implementation actually runs. The fancy term for this is 'polymorphic dispatch'.
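Python's `functools.singledispatch` works on the same principle: it picks an implementation based on the type of the first argument. The sketch below uses it to imitate a single protocol function. `describe` is a hypothetical name chosen for illustration, not something in Toccata's core.

```python
from functools import singledispatch


@singledispatch
def describe(value) -> str:
    # Fallback used when no implementation is registered for the argument's type.
    raise NotImplementedError(f"describe not implemented for {type(value).__name__}")


@describe.register
def _(value: int) -> str:
    return f"integer {value}"


@describe.register
def _(value: str) -> str:
    return f"string of length {len(value)}"


@describe.register
def _(value: list) -> str:
    return f"list of {len(value)} items"


print(describe(42))      # integer 42
print(describe("abc"))   # string of length 3
print(describe([1, 2]))  # list of 2 items
```

Calling `describe(1.5)` raises `NotImplementedError`, since no implementation is registered for floats. This is loosely analogous to calling a protocol function on a type that doesn't implement it.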

In Toccata, the plan is to go a step further and allow you to specify constraints that the protocol implementations must satisfy and then check those constraints at compile time, where possible.

While there are others, there are 6 main protocols defined in core.toc. I'll go into details on each of these in future posts.

Composition

The simplest of the core protocols. It allows two or more values of a type to be combined to produce a new value of that type. An example is list concatenation.
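As a rough analogy, here's a Python sketch of a Composition-style operation. A hypothetical `comp` combines one or more values of the same type into a new value of that type. Pairwise combining dispatches on the first argument's type through `singledispatch`. The names `comp` and `comp2`, and the choice of lists and strings, are assumptions for illustration; Toccata's actual protocol function names may differ.

```python
from functools import reduce, singledispatch


@singledispatch
def comp2(a, b):
    # Implementation chosen by the type of the first argument (hypothetical).
    raise NotImplementedError(f"no composition for {type(a).__name__}")


@comp2.register
def _(a: list, b: list) -> list:
    return a + b  # list concatenation


@comp2.register
def _(a: str, b: str) -> str:
    return a + b  # string concatenation


def comp(first, *rest):
    # Combine any number of values pairwise, left to right.
    return reduce(comp2, rest, first)


print(comp([1, 2], [3], [4, 5]))  # [1, 2, 3, 4, 5]
print(comp("to", "cc", "ata"))    # toccata
```

Each type only has to say how to combine two of its values; combining three or more follows from that.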

Container

The most complicated of the core protocols and also the most useful. These functions all deal with types whose values contain other values, with the focus on working with the containing value itself rather than its contents.
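This post doesn't list the Container functions. One operation that fits the description is mapping a function over whatever a container holds while keeping the container's shape: a Maybe stays a Maybe, and a list stays a list of the same length. The Python sketch below assumes such a map-style operation, by analogy with Haskell's `Functor`, which is a guess, not a statement of what Toccata's Container protocol defines. `Maybe`, `NOTHING`, and `fmap` are hypothetical names.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Any


@dataclass(frozen=True)
class Maybe:
    """Holds at most one value; empty=True marks the nothing value (hypothetical)."""
    value: Any = None
    empty: bool = False


NOTHING = Maybe(empty=True)  # the single empty Maybe


@singledispatch
def fmap(container, f):
    raise NotImplementedError(f"fmap not implemented for {type(container).__name__}")


@fmap.register
def _(container: Maybe, f):
    # Nothing stays nothing; otherwise apply f to the single contained value.
    return NOTHING if container.empty else Maybe(f(container.value))


@fmap.register
def _(container: list, f):
    # A list stays a list of the same length.
    return [f(x) for x in container]


def inc(x):
    return x + 1


print(fmap(Maybe(41), inc))           # Maybe(value=42, empty=False)
print(fmap(NOTHING, inc) is NOTHING)  # True
print(fmap([1, 2, 3], inc))           # [2, 3, 4]
```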

Collection

A set of functions for working with values that contain other values, focused mostly on the contents rather than the enclosing value.

Seqable

This is the parallel to the Sequence Abstraction from Clojure. These functions are for working with Collections that also have a concept of an order to their contents.
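To show what an order on the contents buys you, here's a Python sketch borrowing Clojure's `first` and `rest` names; Toccata's Seqable function names may differ. Once a type implements those two, a single generic function can walk its contents in order. `walk` is a hypothetical helper. It stops on an empty value using Python truthiness, which works for the list and string types registered here.

```python
from functools import singledispatch


@singledispatch
def first(coll):
    raise NotImplementedError(f"first not implemented for {type(coll).__name__}")


@singledispatch
def rest(coll):
    raise NotImplementedError(f"rest not implemented for {type(coll).__name__}")


@first.register(list)
@first.register(str)
def _(coll):
    # The first item in order, or None when empty (like Clojure's nil).
    return coll[0] if coll else None


@rest.register(list)
@rest.register(str)
def _(coll):
    # Everything after the first item, as a value of the same type.
    return coll[1:]


def walk(coll) -> list:
    """Collect items in order using only first/rest (hypothetical helper)."""
    out = []
    while coll:  # empty lists and strings are falsy
        out.append(first(coll))
        coll = rest(coll)
    return out


print(walk([3, 1, 2]))  # [3, 1, 2]
print(walk("abc"))      # ['a', 'b', 'c']
```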

Indexed

For collections whose contents can be accessed with an integer-valued index.

Associative

For Collections that have a concept of associations between keys and values.
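A minimal Python sketch of the two basic associative operations: look up the value for a key, and associate a key with a value to get a new collection. It assumes, as in Clojure, that maps are immutable values, so associating leaves the original map unchanged. `get` and `assoc` are borrowed from Clojure for illustration, and only Python's `dict` is registered here.

```python
from functools import singledispatch


@singledispatch
def get(coll, key, default=None):
    raise NotImplementedError(f"get not implemented for {type(coll).__name__}")


@singledispatch
def assoc(coll, key, value):
    raise NotImplementedError(f"assoc not implemented for {type(coll).__name__}")


@get.register
def _(coll: dict, key, default=None):
    # Value associated with key, or default when the key is absent.
    return coll.get(key, default)


@assoc.register
def _(coll: dict, key, value):
    # Return a new map with the association added; the original is untouched.
    updated = dict(coll)
    updated[key] = value
    return updated


m = {"a": 1}
m2 = assoc(m, "b", 2)
print(m)               # {'a': 1}
print(get(m2, "b"))    # 2
print(get(m, "b", 0))  # 0
```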

Next

As I said, I'll go into detail about each of those protocols very soon. But before I do, there's one other aspect of Toccata I want to cover. It was one of the most speculative parts of the design, and I'm very pleased with how it turned out.