Changing Direction

I hope you all had a very Merry Christmas.

For a long time, I've been writing posts about Toccata. I wanted a body of material that explained the philosophy and capabilities of Toccata. But now we change direction: instead of talking about Toccata, we're going to start using it, starting with the basics.

When I was young ...

and just learning to program, I remember reading an article/program in a computer magazine explaining something called 'parsing'. I didn't even know what 'parsing' was, but it was an interesting program to play around with. There has been a massive amount of research on parsing in the years since, and many advances. Parsing a sequence of bytes into structured data is still one of the fundamental programming tasks today. And yet a hand-rolled parser in imperative code is hard to write, debug and verify. Libraries exist in almost every language to help programmers, and Toccata has its own. Here's a short program to parse a hard-coded text string.

(add-ns rd (git-dependency "https://github.com/Toccata-Lang/recursive-descent.git"
                           "recursive-descent.toc"
                           :sha "882b014"))

(def parser (rd/parser "one"))

(main [_]
      (println "1:" (parser "one"))
      (println "2:" (parser "two")))

This is a significant chunk of code, but we'll be building on it, so I'm going to explain it all line by line. You can copy and paste the above into a file and use the run script to run it.

(add-ns rd (git-dependency "https://github.com/Toccata-Lang/recursive-descent.git"
                           "recursive-descent.toc"
                           :sha "882b014"))

This expression is how to import a Git repo as a dependency. There is no central package repository. Instead, any Git repo can be a dependency. In this case, recursive-descent.toc is the file imported from the recursive-descent repo. This expression also states that any symbol imported from recursive-descent.toc must be prefixed by rd/ when referenced in this file.

(def parser (rd/parser "one"))

This is how a recursive descent parser is created from a grammar. In this case the grammar is the simplest possible one, a literal string. Couldn't be easier. This dependency has the definitions to produce a recursive descent parser from a grammar. I intend to provide other kinds of parsers eventually.

(main [_]
      (println "1:" (parser "one"))
      (println "2:" (parser "two")))

And here we have two examples of using that parser on two different strings. The output generated from this short little program is:

1: <maybe one>
2: <nothing>

If the parser successfully matches the string passed to it, the default is to just return that string wrapped in a Maybe value. But if the string doesn't match, the parser returns nothing, which prints as <nothing>. That's enough to get us started.
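To make that return convention concrete, here is a minimal sketch in Python rather than Toccata. It is not how the recursive-descent library is implemented. The name literal_parser is hypothetical, Python's None stands in for nothing, and the sketch assumes the whole input must equal the literal, which the post does not state.

```python
from typing import Callable, Optional


def literal_parser(expected: str) -> Callable[[str], Optional[str]]:
    """Build a parser for a single literal string (hypothetical helper)."""

    def parse(text: str) -> Optional[str]:
        # Assumption: the whole input must equal the literal.
        # A match returns the matched string; None plays the role of <nothing>.
        return text if text == expected else None

    return parse


parser = literal_parser("one")
print("1:", parser("one"))
print("2:", parser("two"))
```

The point of the convention is that a failed parse is an ordinary return value the caller can check, not an exception.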

(add-ns rd (git-dependency "https://github.com/Toccata-Lang/recursive-descent.git"
                           "recursive-descent.toc"
                           :sha "882b014"))
(add-ns grmr (git-dependency "https://github.com/Toccata-Lang/grammar.git"
                             "grammar.toc"
                             :sha "7690cd3"))

(def grammar (grmr/any "one"
                       "two"
                       "three"))

(def parser (rd/parser grammar))

(main [_]
      (println "1:" (parser "one"))
      (println "2:" (parser "two"))
      (println "3:" (parser "three"))
      (println "4:" (parser "four")))

Literal strings are just one of several 'simplest' grammars possible. The grammar.toc file has more as well as some functions for composing simpler grammars into more complex ones.

(def grammar (grmr/any "one"
                       "two"
                       "three"))

And here we have a simple grammar. In this case, 3 terminal grammars are composed into a single grammar with the grmr/any function. This composed grammar will match any of the three strings, but nothing else. We'll see this in action in a second.
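The idea of composing alternatives can be sketched in Python as well. This is an illustration of the concept only, not the grammar library's code. The names literal and any_of are hypothetical, None stands in for nothing, and trying the alternatives in order with the first match winning is an assumption.

```python
from typing import Callable, Optional

# A parser takes the input text and returns the match, or None for no match.
Parser = Callable[[str], Optional[str]]


def literal(expected: str) -> Parser:
    """Terminal parser: succeeds only on exactly this string (hypothetical)."""
    return lambda text: text if text == expected else None


def any_of(*parsers: Parser) -> Parser:
    """Compose parsers into one that succeeds if any alternative succeeds."""

    def parse(text: str) -> Optional[str]:
        # Assumption: alternatives are tried in order, first match wins.
        for candidate in parsers:
            result = candidate(text)
            if result is not None:
                return result
        return None

    return parse


number_word = any_of(literal("one"), literal("two"), literal("three"))
for word in ["one", "two", "three", "four"]:
    print(word, "->", number_word(word))
```

Because any_of returns a value of the same type as the parsers it takes, a composed parser can itself be passed to another any_of, which is what makes this style of composition scale to larger grammars.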

(def parser (rd/parser grammar))

And we create a parser from the given grammar.

(main [_]
      (println "1:" (parser "one"))
      (println "2:" (parser "two"))
      (println "3:" (parser "three"))
      (println "4:" (parser "four")))

And here we have four examples of using that parser on four different strings. The output generated from this short little program is:

1: <maybe one>
2: <maybe two>
3: <maybe three>
4: <nothing>

No big deal.