New Terminology: Static Parsing vs. Dynamic Parsing

2016-10-22

I realized that there's an obvious term for what I wrote about in the last post -- static parsing, as opposed to dynamic parsing.

This is analogous to static typing vs. dynamic typing. This distinction describes whether you know the types of variables before you run the program (at compile-time), or only while you run it (at runtime).

In a statically parsed language, you know the parse tree up front, as well as the presence of parse errors. In a dynamically parsed one, you don't know until runtime.

Java is statically parsed and statically typed. Python and JavaScript are statically parsed but dynamically typed.
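
As a rough illustration of what "statically parsed" means for Python, here is a minimal sketch using the standard ast module. The helper name has_parse_error and the two sample strings are made up for this example. The point is that the syntax error on the second line is reported before the first line ever runs.

```python
import ast

def has_parse_error(src):
    """Return True if src has a syntax error. Never executes src."""
    try:
        ast.parse(src)
        return False
    except SyntaxError:
        return True

GOOD = 'log.append("ran")\ny = 2\n'
BAD = 'log.append("ran")\ny = = 2\n'   # syntax error on the second line

print(has_parse_error(GOOD))  # False
print(has_parse_error(BAD))   # True

# Even exec() parses the whole string before running any of it,
# so the first line of BAD never executes.
log = []
try:
    exec(BAD, {"log": log})
except SyntaxError:
    pass
print(log)  # []
```

A dynamically parsed language would not behave this way: the first statement would run, and the error would only surface when execution reached the bad line.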

I suppose it's impossible to be statically typed and dynamically parsed, because you need a parse tree to check types. But I think this term is fairly obvious and useful, at least for the purposes of talking about the shell language. I will rephrase my previous posts in these terms:

[ is dynamically parsed, while [[ is statically parsed.
The shell language is mostly statically parseable, but most implementations don't statically parse it.
The bash extension of associative arrays can't be statically parsed. What's inside array[] must be dynamically parsed.

A few more useful applications of this term:

Perl 5 and Make can only be dynamically parsed. (Whether they have a useful statically parseable subset is an interesting question.) According to Larry Wall, an explicit goal of Perl 6 is to make it statically parseable, in contrast to Perl 5.
We said that Python is statically parseable, but there is a subtlety. You can statically parse a single .py file, but a program generally consists of many modules and packages, and which ones it uses can't be statically determined. So you can't produce a parse tree for a whole Python program up front, because you don't even know what files it's composed of!

This has practical consequences for packaging tools. (At Google, Python dependencies are repeated inside "static" BUILD files, basically for this reason.)

I plan to add a static version of "source" or "import" to oil, also for this reason.
C seems like it's statically parseable, but real C code makes extensive use of the preprocessor. Most preprocessor usage "fits into" the parse tree, but some of it definitely changes the structure of the program -- in particular, macros without balanced braces or parens. This kind of problem comes up if you've ever tried to parse headers with a lot of #ifdefs, e.g. libc headers.
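
Going back to the Python point, here is a small sketch of why the set of modules can't be determined statically. The function static_imports and the sample source are made up for illustration. It walks the parse tree of one file and collects the modules named in literal import statements, which is roughly what a static packaging tool could do.

```python
import ast

def static_imports(src):
    """Collect module names from literal import statements, without running src."""
    names = set()
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names

SRC = '''
import os
from collections import OrderedDict
import importlib

# The module name is computed at runtime, so it is invisible in the parse tree.
plugin = importlib.import_module("js" + "on")
'''

print(sorted(static_imports(SRC)))  # ['collections', 'importlib', 'os']
```

The json module is loaded when the program runs, but it never shows up in the static scan. That gap is the kind of thing a separately maintained dependency list has to paper over.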

Back to the main point: Intuitively, I would equate "dynamically parsed" with "70's style macro language", and "statically parsed" with "language implemented using a real parser".

What I think is surprising is that, except for an extension introduced with bash 4.0 in 2009, shell is statically parseable. I think many people are under the impression that it's an ugly and unparseable macro language. The syntax is definitely unusual, but it is relatively well defined and statically parseable.

In the next few posts I will show some of the ugly corners of shell syntax that I've hit while developing my parser.