For Argument's Sake

In a previous post, I very briefly dipped into passing arguments to a Toccata program on the command line. But we need to be able to do much more than just grab a string from the command arguments. We need to convert said arguments (which are strings) to actual data we can use to customize how our programs run. Many languages have custom libraries to handle command line arguments, but I prefer to have a single 'canonical' way of handling parsing in Toccata. So I've tried to make the parsing library I've been discussing easy enough to use for command line arguments as well.

The first thing to know is that the single argument passed to the main function is actually a list of strings. The first string is the command that was run (regardless of what program is actually executing) and the rest are the command line arguments. Because the shell splits the command line on whitespace before the program starts, none of the whitespace that separated the arguments appears in this list.
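The shape of that list is not specific to Toccata. As a rough analogue in Python (not Toccata code), shlex.split can stand in for the shell to show what the program ends up seeing. The command line string here is made up for illustration.

```python
import shlex

# Hypothetical command line, typed with uneven spacing between the arguments.
command_line = "./m   3     Pop"

# shlex.split breaks the line into words the way a POSIX shell would.
args = shlex.split(command_line)

print(args)       # the command comes first, then the arguments
print(args[1:])   # only the arguments; the separating whitespace is gone
```

However many spaces were typed between the arguments, the program only ever sees the individual words.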

This leads to the second thing. The parser generated by rd/parser will happily parse a list of strings just as easily as a single string. So parsing command line arguments is just a matter of specifying a grammar for the arguments, generating a parser from it, and then passing the arguments list to said parser.

But there's a wrinkle. Depending on how you want to parse your arguments, you may need some way of separating one from the next. This is what spaces are used for. But they've all been removed by the shell, so they will need to be put back in. I prefer to have spaces in there by default, so that's what I'll show below.

Nuts and Bolts

We (or at least I) need a program that will take an integer and a name and print a greeting the desired number of times. Here's about the simplest way I could think of to do this in about 2 minutes of work.

(defn print-msg [n msg]
  (either (= 0 n)
          (do
            (println msg)
            (print-msg (dec n) msg))))

(main [args]
      (let [[_ n name] args]
        (print-msg (str-to-int n)
                   (str "Hello, " name))))

The function print-msg is a classic example of how to do explicit loops in Toccata (as opposed to implicit loops using map, reduce, etc.) using recursion and an either expression.

The main function is where things are brittle. args is the list of command line arguments. For this to work, we assume a number of things: that it has at least 3 elements, that the second element is a string containing only decimal digits, and so on.

This does exactly what we wanted and is perfectly fine for a one-off task or something that I alone would use. It works if you give it only the correct inputs, but stray from that happy path and things blow up, often with incomprehensible error messages. So let's make this more resilient using the parsing libraries we've already seen in action.
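Before moving on, here is a rough Python analogue of that unchecked approach, just to make the failure modes concrete. The function name read_args is made up for this sketch, and Python's exceptions only stand in for whatever Toccata reports; the point is the shape of the problem, not the exact messages.

```python
def read_args(args):
    # Mirrors the unchecked main: skip the command, then trust that
    # a count and a name follow it.
    n, name = args[1], args[2]
    return int(n), "Hello, " + name

print(read_args(["./m", "3", "Pop"]))   # the happy path

# Two ways off the happy path: a missing name and a non-numeric count.
failures = []
for bad in (["./m", "3"], ["./m", "three", "Pop"]):
    try:
        read_args(bad)
    except (IndexError, ValueError) as err:
        failures.append(type(err).__name__)
        print(bad, "->", type(err).__name__, err)
```

Neither error says anything about what the user should have typed instead, which is the gap a grammar for the arguments is meant to close.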

Parsing the command line

First, the boilerplate.

(add-ns rd (git-dependency "https://github.com/Toccata-Lang/recursive-descent.git"
                           "recursive-descent.toc"
                           :sha "882b014"))
(add-ns grmr (git-dependency "https://github.com/Toccata-Lang/grammar.git"
                             "grammar.toc"
                             :sha "7690cd3"))

And the function that actually does the work, again.

(defn print-msg [n msg]
  (either (= 0 n)
          (do
            (println msg)
            (print-msg (dec n) msg))))

Like most software, the actual functionality is quite small; it's all the supporting code that takes most of the time and effort. First up is a grammar rule that parses a string of digits and produces a non-negative integer.

(def integer (map (grmr/one-or-more grmr/digit)
                  (fn [digits]
                    (str-to-int (to-str digits)))))

Hopefully this is understandable at this point. If one or more digits are parsed from the input, the list of digit characters is converted to a single string and that string is then passed to str-to-int, producing the integer from the command line parameter. Anything else will fail.

Next, we need to parse the name parameter.

(def name (map (grmr/one-or-more grmr/alpha)
               to-str))

This restricts the name to only alpha characters. (My apologies to anyone with apostrophes or other non-alpha characters in their name.) The list of characters parsed is then converted to a string by to-str.

And now, putting the grammar rules together into the full grammar.

(def parse-and-print (grmr/apply-fn (fn [n name]
                                      (print-msg n (str "Hello, " name)))
                                    integer
                                    (grmr/ignore " ")
                                    name))

This introduces a new concept, grmr/apply-fn. This grammar combinator takes a function and a number of grammars. In this case, there are 3 grammars: integer, an unnamed one for a single space, and name. These 3 grammars are each applied to the input, one after another. If all 3 successfully parse their piece of the input, the results of each of them are passed to the anonymous function which is the first value passed to grmr/apply-fn. Except the grammar that parses a single space is created with the grmr/ignore combinator, so its result is discarded, which is why the anonymous function only takes 2 parameters and not 3.

Of those 2 parameters, n is the result of the integer grammar rule and name is the result of the name grammar rule. The anonymous function then builds the greeting string and calls print-msg.
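If the way these combinators fit together is still fuzzy, a toy model may help. The following is a minimal sketch in Python, not the grammar library's implementation. Every name in it (one_or_more, fmap, ignore, apply_fn, IGNORED) is invented for the sketch, and it makes two simplifying assumptions: a parser is a plain function from a string to either None or a (value, remaining input) pair, and the function returns the greetings as a list instead of printing them.

```python
IGNORED = object()  # marker for results that should be thrown away


def one_or_more(pred):
    """Match one or more leading characters that satisfy pred."""
    def parse(text):
        i = 0
        while i < len(text) and pred(text[i]):
            i += 1
        return (text[:i], text[i:]) if i > 0 else None
    return parse


def fmap(parser, fn):
    """Transform a successful result with fn (the role map plays above)."""
    def parse(text):
        result = parser(text)
        if result is None:
            return None
        value, rest = result
        return fn(value), rest
    return parse


def ignore(literal):
    """Match literal exactly, but mark its result to be discarded."""
    def parse(text):
        if text.startswith(literal):
            return IGNORED, text[len(literal):]
        return None
    return parse


def apply_fn(fn, *parsers):
    """Run the parsers in sequence; call fn with the results that were kept."""
    def parse(text):
        values = []
        for parser in parsers:
            result = parser(text)
            if result is None:
                return None          # any failure fails the whole rule
            value, text = result
            if value is not IGNORED:
                values.append(value)
        return fn(*values), text
    return parse


integer = fmap(one_or_more(lambda c: c in "0123456789"), int)
name = fmap(one_or_more(str.isalpha), str)
greeting = apply_fn(lambda n, who: ["Hello, " + who] * n,
                    integer, ignore(" "), name)

print(greeting("3 Pop"))   # three greetings plus the unconsumed input
print(greeting("Pop 3"))   # None: the integer rule fails on the first character
```

The function handed to apply_fn takes two parameters even though three parsers run, because the space's result is dropped before the call.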

But parse-and-print isn't a function that can be called. It's just a grammar, a data structure that describes the command line arguments and what to do with them. We need to create a parser from it as we've seen before.

(def parser (rd/parser parse-and-print))

And now parser is a function that will take a command line, parse the arguments and call print-msg if all arguments are of the correct form. Let's wrap this up with the main entry point.

(main [cmd-args]
      (either (-> cmd-args
                  rest
                  (interpose " ")
                  parser)
              (print-err "Invalid arguments.")))

Quite a few things happen in these few lines of code. First off, we take the rest of cmd-args to get only the command line arguments. Then we put a single space between them to match the grammar we declared above. This list of strings is then passed as input to the parser. If everything parses correctly and the greetings are printed, the value returned is a Maybe value containing the results of the parser (which we're ignoring for now). But if the parsing fails, the either expression falls through to the print-err expression to print a simple error message.
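To see what the parser is actually handed, here is a small Python sketch of the first two steps of that pipeline. The interpose function below is a hand-written stand-in that assumes Toccata's version simply places the separator between neighbouring elements, and the final join is only there to show the text the grammar has to match.

```python
def interpose(items, sep):
    """Return a new list with sep placed between neighbouring items."""
    out = []
    for i, item in enumerate(items):
        if i > 0:
            out.append(sep)
        out.append(item)
    return out


cmd_args = ["./m", "3", "Pop"]

# Drop the command, then put the spaces back between the arguments.
pieces = interpose(cmd_args[1:], " ")

print(pieces)            # ['3', ' ', 'Pop']
print("".join(pieces))   # 3 Pop

# With a single argument there is nothing to separate, so no space is added.
print(interpose(["./m", "4"][1:], " "))   # ['4']
```

With only one argument, the input has no space and no name, so a grammar that expects integer, space, name cannot succeed and the error branch runs.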

And that's it. This program is in the examples directory here.

You can run it using the run script in the project root.

bash-3.2$ ./run examples/number-option.toc 4

*** Invalid arguments.

The run script puts the executable into a file named m. The name doesn't mean anything; I just picked it at random. You can run the executable directly once it's been compiled.

bash-3.2$ ./m 3 Pop
Hello, Pop
Hello, Pop
Hello, Pop