Toccata

It’s time to show off a new aspect of Toccata: scripting.

Following the Script

Toccata was originally envisioned to be a compiled language that produced executables to run as native apps. That process looks like this:

Parse Toccata source code to generate an Abstract Syntax Tree (AST)
Use the AST to do as much static type checking as possible
Walk the AST to generate C code
Compile the C code using LLVM or GCC to produce a native executable
Profit

As I was implementing all this, I realized I could do something else with the AST. I could interpret it directly without generating and compiling C code. Obviously, performance is much worse. But depending on the script, it could start executing immediately. For little utilities that do tasks on “human scale” time, this could be very useful. I already have the AST, so how hard could it be to add the interpreter?
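To make "interpret the AST directly" concrete, here is a minimal sketch of a tree-walking evaluator. It is a Python analogy, not Toccata's interpreter: the tuple-based AST shape, the OPS table, and the evaluate function are all assumptions made up for illustration.

```python
import operator

# Hypothetical operator table; a real language would have far more node types.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(node):
    """Walk the tree and compute a value, with no code-generation step."""
    if isinstance(node, (int, float)):
        return node                      # a literal evaluates to itself
    op, left, right = node               # otherwise: (operator, left, right)
    return OPS[op](evaluate(left), evaluate(right))

ast = ("+", 1, ("*", 2, 3))              # stands for 1 + (2 * 3)
print(evaluate(ast))
```

The point of the sketch is only that evaluation can begin the moment the tree exists. There is nothing to compile first.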

Moving mountains

Yeah, it was that hard. A basic LISP interpreter is pretty easy. But once you start adding things you need for real life, the complexity jumps by leaps and bounds. And I’m pretty sure there are still loose ends to clean up. But it’s close enough I can share it.

Since the compiler already had the code to produce the AST, I just added a module to interpret it, so the same executable does double duty. You just have to add the --script option before the Toccata file name. So let’s look at a script I put together.

 1  #! /home/jim/toccata/toccata --script

 2  (defprotocol Write
 3    (write [_]
 4      ""))

 5  (defn extract-doc [ast]
 6    (-> ast
 7        .doc
 8        .lines
 9        (remove empty?)
10        (interpose "\n")
11        to-str))

12  (extend-type ast/fn-ast
13    Write
14    (write [ast]
15      (for [sym (.fn-sym ast)
16            doc-str (-> ast
17                        .arities
18                        first
19                        (map extract-doc))]
20        (file/stdout ["\n" (str sym) "\n" doc-str "\n"]))))

21  (extend-type ast/definition-ast
22    Write
23    (write [ast]
24      (map (.value-exprs ast) write)))

25  (extend-type ast/prototype-ast
26    Write
27    (write [ast]
28      (let [lines (-> ast
29                      .doc
30                      .lines
31                      (remove empty?))]
32        (file/stdout [(str "\n" (.fn-name ast) "\n")
33                      (extract-doc ast)
34                      "\n"]))))

35  (extend-type ast/protocol-ast
36    Write
37    (write [ast]
38      (map (.prototypes ast)
39           (fn [prototype]
40             (write (.fn-name prototype (str (.protocol-sym ast) "/"
41                                             (.fn-name prototype))))))))

42  (def extract-asts (parse/parser (grammar/none-or-more
43                                   (map reader/top-level write))))

44  (main [[_ file-name]]
45    (extract-asts {'file-name file-name
46                   'line-number 0}
47                  (file/slurp file-name)))

This script reads a Toccata source file and extracts the names and docstrings for top-level functions from their ASTs. Let’s go through it line by line.

As with any shell script, the file starts with a shebang specifying the path to the compiler executable and any options. The name of the file gets appended to this command line and passed to the script.

Lines 2 - 4 are a standard Toccata protocol definition. The write protocol function is what will extract the needed info and output it. The default implementation does nothing, so calling write on any value whose type doesn’t explicitly implement it is a no-op.
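For readers who haven’t met protocols with default implementations, a rough Python analogy is functools.singledispatch: a default that does nothing, plus per-type overrides. This is only a sketch of the idea. The FnNode class and the strings returned are hypothetical, and Toccata protocols are not implemented this way.

```python
from functools import singledispatch

class FnNode:
    """Hypothetical stand-in for an AST node type."""
    def __init__(self, name):
        self.name = name

@singledispatch
def write(value):
    # Default: any type without its own implementation produces nothing.
    return ""

@write.register
def _(value: FnNode):
    # Type-specific implementation, chosen by the argument's type.
    return "fn " + value.name

print(repr(write(42)))           # falls through to the default
print(write(FnNode("some-fn")))  # uses the FnNode implementation
```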

Lines 5 - 11 define a function that will extract the doc string from a given AST node. Any symbol preceded by a “.” is a data type field getter/setter. So

(.doc ast)

is how you get the value of the doc field of the ast value. Doc strings are the first comment block in the body of a function, like this:

(defn some-fn [x y]
  ;; This is the doc string
  ;; and it might span multiple lines
  (do-something-amazing-with x y))

The -> expression is the same as in Clojure. It threads the result of each step into the first argument slot of the next, adding parentheses where needed. So

 (-> ast
     .doc
     .lines
     (remove empty?)
     (interpose "\n")
     to-str)

is equivalent to

(to-str (interpose (remove (.lines (.doc ast)) empty?) "\n"))

but it’s easier to follow what’s going on. For the AST nodes we’ll call extract-doc with, the doc field contains an ast/block-comment-ast value which has a lines field. This field is a Vector of strings. So this function extracts that vector, removes any empty strings, sticks a newline between the lines and constructs a single string value.
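The threading behavior can be sketched outside of Toccata too. Below is a small Python analogy in which thread_first feeds each result into the first argument slot of the next step. The helper names (thread_first, remove, interpose, to_str, is_empty) are written here for illustration and only mimic the argument order used in the script.

```python
from functools import reduce

def thread_first(value, *steps):
    """Pass value through steps; each result becomes the next first argument."""
    def apply(acc, step):
        if callable(step):               # bare function, like to-str
            return step(acc)
        fn, *args = step                 # (fn, extra args), like (remove empty?)
        return fn(acc, *args)
    return reduce(apply, steps, value)

def remove(coll, pred):
    return [x for x in coll if not pred(x)]

def interpose(coll, sep):
    out = []
    for i, x in enumerate(coll):
        if i:
            out.append(sep)              # separator goes between items only
        out.append(x)
    return out

def to_str(parts):
    return "".join(parts)

def is_empty(s):
    return s == ""

lines = ["first line", "", "second line"]
doc = thread_first(lines, (remove, is_empty), (interpose, "\n"), to_str)
print(doc)
```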

Included namespaces

Lines 12 - 20 implement write for a function AST node. This requires a side track.

One of my main goals in Toccata was to make the compiler modular and make those modules accessible to Toccata programmers so they could easily write tooling. These modules are brought together in the compiler. Since the compiler already has these available, it would be absurd to require the interpreter to load new copies of them when interpreting scripts. Though it turns out, this would have been much, much easier.

There are a number of these modules provided to scripts as ‘pre-loaded’ namespaces that can be referenced using prefixed symbols. The ones used in this script (with their prefixes) are:

file: Reading and writing files
grammar: DSL for building grammars for parsers
reader: The parser grammar for the Toccata language
parse: Library for converting a parser grammar into a simple recursive descent parser
ast: Namespace that specifies the various nodes in a Toccata AST

Back on the trail

With that in mind, lines 12 - 20 extend the ast/fn-ast AST node type so that it extracts the information from the node and sends it to file/stdout. Breaking this down, line 15 is the beginning of a for comprehension. This is similar to Clojure’s ‘for’ macro for sequence comprehensions, except in Toccata any data type that implements the flat-map and wrap core protocol functions can be used in a comprehension. In this case, it’s the Maybe data type.

Looking at the type definition of the fn-ast type, you’ll see it has a field named fn-sym and another named arities. The fn-sym field is required to be of a Maybe type, and line 15 extracts this value. If it’s not nothing, that is, there’s a value inside it, this inner value is extracted and bound to sym. Otherwise the comprehension quits early. Then on line 16, the ‘->’ expression gets the arities field, which is a vector of ast/fn-arity-ast values, and gets the first one. The first function always returns a Maybe value, which contains the first arity if the vector isn’t empty. The call to map then applies extract-doc to that arity value, returning a string inside a Maybe. This inner string is then extracted by the rules of the comprehension and bound to doc-str.
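If the early-exit behavior of a Maybe comprehension is unfamiliar, here is a minimal Python sketch of the same shape. Everything in it is an assumption for illustration: the Maybe class, the first helper, and describe are invented here, and str.strip stands in for extract-doc. It is not how Toccata implements flat-map or wrap.

```python
class Maybe:
    """Holds zero or one value; an empty Maybe plays the role of 'nothing'."""
    def __init__(self, *value):
        self._value = value              # empty tuple means nothing inside

    def flat_map(self, f):
        # Run f on the inner value, or short-circuit when there is none.
        return f(self._value[0]) if self._value else self

    def map(self, f):
        return self.flat_map(lambda v: Maybe(f(v)))

    def value_or(self, default):
        return self._value[0] if self._value else default

nothing = Maybe()

def first(items):
    """Like the script's 'first': always returns a Maybe."""
    return Maybe(items[0]) if items else nothing

def describe(fn_sym, arities):
    # Mirrors the comprehension: bind sym, then doc_str, then build the output.
    return fn_sym.flat_map(
        lambda sym: first(arities).map(str.strip).flat_map(
            lambda doc_str: Maybe(sym + ": " + doc_str)))

print(describe(Maybe("some-fn"), ["  the docs  "]).value_or("<no output>"))
print(describe(nothing, ["the docs"]).value_or("<no output>"))
```

If either the symbol or the first arity is missing, the later lambdas never run, which is the "quits early" behavior described above.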

Finally, on line 20, the various strings are put into a vector and fed to file/stdout which writes them out. I’ll have more to say about STDIN and STDOUT in a future post.

And whew, that’s a lot of verbiage for a short function. The rest will go quicker.

Depends on the definition

Lines 21 - 24 implement write for the definition AST node. An ast/definition-ast node has two fields: sym and value-exprs. The value expressions might be any number of comment blocks with one expression that actually produces a value in there somewhere. And we’re only interested in the case where an ast/fn-ast node is in there, so we map over the vector of expressions and apply write to each one.

Hopefully, you can see how protocol definitions are handled in lines 25 - 41.

Parsing your words carefully

And now we come to actually parsing a source file. Lines 42 and 43 build the parser. Starting from the inside, reader/top-level is the grammar that specifies all the possible expressions that can appear at the top level of a Toccata source file. In Toccata, map can be implemented for any data type, not just collections. Ordinarily, reader/top-level specifies that an AST node is the result of parsing a top level expression. However, calling map on it creates a new grammar that applies the write function to the parsed AST node and produces the result when a top level expression is parsed. But in this case, write sends strings to STDOUT, so we don’t really care about the result. Yes, I know it’s not pure, so sue me.

Then grammar/none-or-more takes the grammar to parse (and write) a single expression and produces a new grammar that says “parse until you can’t parse no more”. Finally, parse/parser produces an actual parsing function and assigns it to extract-asts.
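The combination of map over a grammar and none-or-more can be sketched with plain functions. This is a Python analogy under loud assumptions: parsers here are functions from (text, position) to (value, new position) or None, and word, map_parser and none_or_more are invented for illustration. Toccata’s grammar and parse modules are not claimed to work this way internally.

```python
def word(text, pos):
    """Parse one space-delimited word; return (value, new_pos) or None."""
    while pos < len(text) and text[pos] == " ":
        pos += 1                         # skip leading spaces
    end = pos
    while end < len(text) and text[end] != " ":
        end += 1
    return (text[pos:end], end) if end > pos else None

def map_parser(parser, f):
    """New parser that applies f to whatever the wrapped parser produces."""
    def mapped(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return f(value), new_pos
    return mapped

def none_or_more(parser):
    """New parser that repeats the wrapped parser until it fails."""
    def repeated(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos       # zero matches is still a success
            value, pos = result
            values.append(value)
    return repeated

shout_all = none_or_more(map_parser(word, str.upper))
values, _ = shout_all("parse until done", 0)
print(values)
```

In the script, the mapped function is write, which has a side effect, so the collected values are ignored. Here str.upper is pure, so the results are worth keeping.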

Mainly

And now, the top level, the main event, where it all comes together and the work gets done. And it’s very anticlimactic. The parameter to the main function is a list of strings that come from the command line. The first value is always the name of the file being interpreted. In this case, this list is destructured and the second value, which should be the name of a Toccata file, is bound to file-name. This file is slurped into a string which is passed to the parser along with a hash-map of values the parser uses to track progress. We don’t really use that info in this application, but might later.

And that’s it. Next time, I intend to show how to extend this script to produce an HTML file of documentation.
