We're rolling now
I don't know about you, but I'm really pleased with how this blog series is developing. Each post seems to build on the last by adding a couple of new pieces. I would love to say I intended it that way, but I can't.
In this post, we're going to begin looking at file I/O in addition to parsing.
More than command lines
So far, we've seen how to parse the command line arguments passed to our main function. This could be extended to create a full-blown argument-parsing library with long and short options, etc. But at some point, we're going to need more than what can be conveniently passed in on the command line. We're going to want to keep a file full of configuration information that will be read at each invocation and its contents used to inform our program's execution.
This turns out to be just a tiny step beyond what we've already seen. I'm of the opinion that config files should be relatively simple. If you need to use JSON or XML for nesting complex data structures, you're getting close to needing a full-blown DSL, and what you're writing is more of an interpreter than just a program with a config file. So for our purposes, we're going to limit the lines of a config file to
- empty lines (only containing spaces and tabs)
- comment lines beginning with '#'
- value lines that have a name (alphabetic characters and '-') and a value (integers or strings delimited by '"')
and that's it. So let's do this.
First, import the needed libraries
We've added the file I/O library to our list of imports. This lib contains some functions to do basic reads and writes of files.
Now let's start describing our grammar for the config file lines. Here's an empty line.
We start by declaring rules for white space and a new line character. Then we define the empty line rule in terms of those.
Now, we declare a comment
The grmr/not-char combinator does what it says. It matches any character other than the one given. Now, let's bring over integer from the last post.
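For readers following along without the grammar library, here is a rough Python sketch of the three rules so far, using regular expressions as stand-ins. The names EMPTY_LINE, COMMENT, INTEGER and classify are made up for this illustration, and treating an integer as unsigned digits only is an assumption. The `[^\n]` character class plays the same role as a "not this character" combinator.

```python
import re

# Hypothetical regex stand-ins for the grammar rules described above.
EMPTY_LINE = re.compile(r"[ \t]*")   # zero or more spaces or tabs
COMMENT = re.compile(r"#[^\n]*")     # '#', then any characters that are not a newline
INTEGER = re.compile(r"[0-9]+")      # one or more decimal digits (no sign: an assumption)


def classify(line):
    """Classify one line, given without its trailing newline.

    Returns 'empty', 'comment', 'integer', or None when nothing matches.
    """
    if EMPTY_LINE.fullmatch(line):
        return "empty"
    if COMMENT.fullmatch(line):
        return "comment"
    if INTEGER.fullmatch(line):
        return "integer"
    return None
```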
A new requirement is reading in strings delimited by double quotes. But we have all the tools to do this easily.
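As an illustration of the idea (not the grammar library's code), a quoted string is just an opening quote, any run of characters that are not a quote, and a closing quote. The Python sketch below uses hypothetical names, QUOTED and parse_string, and assumes there are no escape sequences inside the string.

```python
import re

# '"', then any run of characters that are not '"', then the closing '"'.
# Escape sequences such as \" are not handled; that is an assumption of this sketch.
QUOTED = re.compile(r'"([^"]*)"')


def parse_string(text):
    """Return (value, rest) if text starts with a quoted string, else None."""
    m = QUOTED.match(text)
    if m is None:
        return None
    return m.group(1), text[m.end():]
```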
And now, the rubber meets the road. We need to combine these pieces to parse a config file line and create a data structure of the name string and config value. A moment's thought surfaces some requirements.
- We don't know which names will be defined in the file
- We don't know what order they will be defined in
- We want to end up with a HashMap of all names to values in the file
There are multiple ways to do this. My preferred way is to put each name/value pair into a HashMap and then just compose all the maps after parsing all the lines.
(side note: I left that anonymous function in to show the param and value. But replacing it with hash-map works just fine.)
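To make the "one small map per line" idea concrete, here is a Python sketch that turns a single value line into a one-entry dict. The text doesn't say what separates a name from its value, so this sketch assumes one or more spaces or tabs. PAIR and parse_pair are hypothetical names.

```python
import re

# name: alphabetic characters and '-'
# separator: one or more spaces or tabs (an assumption of this sketch)
# value: an unsigned integer, or a string delimited by double quotes
PAIR = re.compile(r'([A-Za-z-]+)[ \t]+(?:([0-9]+)|"([^"]*)")[ \t]*')


def parse_pair(line):
    """Return a one-entry dict {name: value}, or None if the line is not a value line."""
    m = PAIR.fullmatch(line)
    if m is None:
        return None
    name, number, text = m.groups()
    value = int(number) if number is not None else text
    return {name: value}
```

Returning a map with a single entry may look odd, but it means every line produces the same kind of result, which keeps the final combining step simple.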
The final piece is to pull it all together to declare a grammar that will parse an entire config file
Each line gets converted to a HashMap. But we need to ignore the comment and empty lines. So we define a 'higher order' grammar rule. (Really, it's just a function that takes a grammar rule and returns a modified version.) In this case, ignore takes a grammar rule and returns a rule that always returns an empty HashMap upon a successful match. This is different from and does not replace grmr/ignore from the grammar library.
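The same "function that takes a rule and returns a modified rule" pattern can be sketched in Python. Here a rule is assumed to be a plain function that returns None on failure and anything else on success. That convention, and the comment rule below, are inventions for this example and not how the grammar library represents rules.

```python
def ignore(rule):
    """Wrap rule so that any successful match yields an empty dict instead."""
    def wrapped(line):
        # Failure (None) passes through unchanged; success is replaced by {}.
        return {} if rule(line) is not None else None
    return wrapped


def comment(line):
    """Hypothetical rule: succeeds on lines that begin with '#'."""
    return line if line.startswith("#") else None


ignored_comment = ignore(comment)
```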
And then, it's straightforward to define our config file grammar. It's just zero or more lines (zero, since the file may be empty). Each line may be either a comment or an empty line, which produces an empty HashMap, or a configuration name/value pair.
This list of HashMaps is passed to the anonymous function as the parameter config-lines. Then we use the parametric form of the comp* function to squash all the HashMaps into a single HashMap. This also means that if a config name is defined multiple times in a file, only the last one will appear in the final HashMap.
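The "last one wins" behavior falls straight out of merging the maps in order. Here is a small Python sketch of that step, with merge_all as a hypothetical name.

```python
def merge_all(maps):
    """Combine a list of dicts into one; later dicts overwrite earlier ones."""
    merged = {}
    for m in maps:
        merged.update(m)  # empty dicts from comments and blank lines add nothing
    return merged
```

For instance, merging `{"port": 1}`, `{}` and `{"port": 2}` leaves `port` set to 2.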
All that's left is to read in the config file and parse it.
I'm going to leave the explanation of the main function out. You should be able to read this if you've read the previous posts. I will say that fio/slurp just pulls the entire contents of a file into a string in one shot.
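For comparison, here is roughly what the whole pipeline looks like as a Python sketch: read the file into one string, match each line against the three line kinds, and collect the name/value pairs. The names LINE, parse_config and load_config are hypothetical, the whitespace separator between name and value is an assumption, and raising an error on an unparseable line is a choice made for this sketch.

```python
import re
from pathlib import Path

# One alternative per kind of line: empty, comment, or name/value pair.
LINE = re.compile(r'[ \t]*|#.*|([A-Za-z-]+)[ \t]+(?:([0-9]+)|"([^"]*)")[ \t]*')


def parse_config(text):
    """Parse config text into a dict; later definitions of a name win."""
    config = {}
    for number, line in enumerate(text.splitlines(), start=1):
        m = LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"line {number}: cannot parse {line!r}")
        name, integer, string = m.groups()
        if name is not None:  # empty and comment lines leave name as None
            config[name] = int(integer) if integer is not None else string
    return config


def load_config(path):
    """Read the whole file into a string in one shot, then parse it."""
    return parse_config(Path(path).read_text())
```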
So, if you have a config file that looks like this
When you run this program, you should get
Though the order of the lines may be different.
Better than a library
In other languages, there might be a library or package that you would import to handle config files. Open source is great, but when you pull in a package like that, you take on the responsibility for keeping it integrated with your application and that comes at a cost. What I've tried to show here is how easy it is to have custom software to do exactly what you need.
For example, it would be trivial to only allow certain names to be in the config file. Or add different kinds of values. I think this is much superior to importing a bunch of libraries to do relatively simple tasks.
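As a sketch of the first idea, restricting names can be a short check on the parsed map. The names in ALLOWED_NAMES are invented for this example, and check_names is a hypothetical helper.

```python
ALLOWED_NAMES = {"port", "log-file"}  # hypothetical names, purely for illustration


def check_names(config, allowed=ALLOWED_NAMES):
    """Return config unchanged, or raise ValueError if it has names outside allowed."""
    unknown = sorted(set(config) - set(allowed))
    if unknown:
        raise ValueError("unknown config names: " + ", ".join(unknown))
    return config
```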
What if you really do want to parse JSON?
That's up next