Why Sponsor Oils? | blog | oilshell.org
I realized that there's an obvious term for what I wrote about in the last post -- static parsing, as opposed to dynamic parsing.
This is analogous to static typing vs. dynamic typing. This distinction describes whether you know the types of variables before you run the program (at compile-time), or only while you run it (at runtime).
In a statically parsed language, you know the parse tree up front, as well as the presence of parse errors. In a dynamically parsed one, you don't know until runtime.
Java is statically parsed and statically typed. Python and JavaScript are statically parsed but dynamically typed.
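To make the distinction concrete, here is a minimal Python sketch. The helper name `parses_statically` and the snippet strings are made up for illustration. `compile()` parses source without running it, so a syntax error in ordinary code surfaces up front, while code hidden inside a string passed to `eval()` is only parsed when that call executes.

```python
def parses_statically(source: str) -> bool:
    """Return True if `source` parses, without executing any of it."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# A syntax error in ordinary code is found before anything runs.
broken = "print('start')\nx = (1 +\n"

# The same error hidden inside a string: the outer program parses fine.
hidden = "print('start')\nx = eval('(1 +')\n"

static_error_found = not parses_statically(broken)
hidden_parses = parses_statically(hidden)

# The hidden error only shows up at runtime, when eval() parses the string.
try:
    eval("(1 +")
    runtime_error = False
except SyntaxError:
    runtime_error = True
```

In these terms, the string handed to `eval()` is a small dynamically parsed island inside an otherwise statically parsed program.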
I suppose it's impossible to be statically typed and dynamically parsed, because you need a parse tree to check types. But I think this term is fairly obvious and useful, at least for the purposes of talking about the shell language. I will rephrase my previous posts in these terms:
[ is dynamically parsed, while [[ is statically parsed.
The shell language is mostly statically parseable, but most implementations don't statically parse it.
The bash extension of associative arrays can't be statically parsed. What's inside array[] must be dynamically parsed.
A few more useful applications of this term:
Perl 5 and Make can only be dynamically parsed. (Whether they have a useful statically parseable subset is an interesting question.) According to Larry Wall, an explicit goal of Perl 6 is to make it statically parseable, in contrast to Perl 5.
We said that Python is statically parseable, but there is a subtlety. You can statically parse a single .py file, but a program generally consists of many modules and packages, and which ones those are can't be statically determined. So you can't produce a parse tree for a whole Python program up front, because you don't even know what files it's composed of!
This has practical consequences for packaging tools. (At Google, Python dependencies are repeated inside "static" BUILD files, basically for this reason.)
I plan to add a static version of "source" or "import" to oil, also for this reason.
C seems like it's statically parseable, but real C code makes extensive use of the preprocessor. Most preprocessor usage "fits into" the parse tree, but some of it definitely changes the structure of the program -- in particular, macros without balanced braces or parens. This kind of problem comes up if you've ever tried to parse headers with a lot of #ifdefs, e.g. libc headers.
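The Python subtlety in the list above can be sketched with the standard `ast` module. The sample source and the helper name `static_imports` are hypothetical, chosen only for illustration. A single file parses statically and its literal `import` statements are visible in the tree, but a module loaded by a name computed at runtime is not.

```python
import ast

# Hypothetical module source: two literal imports plus one dynamic import.
source = '''
import json
from os import path
import importlib

plugin = importlib.import_module(input())  # name known only at runtime
'''


def static_imports(src: str) -> set:
    """Collect module names from literal import statements in `src`."""
    tree = ast.parse(src)  # parses the file; nothing is executed
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


found = static_imports(source)
```

Here `found` contains `json`, `os`, and `importlib`, but whatever module `import_module()` ends up loading never appears. A tool that walks the parse tree cannot recover the full set of files in the program.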
Back to the main point: Intuitively, I would have equated "dynamically parsed" with "70's style macro language", and "statically parsed" with "language implemented using a real parser".
What I think is surprising is that, except for an extension introduced with bash 4.0 in 2009, shell is statically parseable. I think many people are under the impression that it's an ugly and unparseable macro language. The syntax is definitely unusual, but it is relatively well defined and statically parseable.
In the next few posts I will show some of the ugly corners of shell syntax that I've hit while developing my parser.