Why Sponsor Oils? | blog | oilshell.org
Here is how you print the AST for a shell script using the current prototype:
$ sketch/pysh.py --no-exec --print-ast -c 'ls ~/src' (Com {[LIT_CHARS ls]} {[TildeSub ''] [LIT_CHARS /src]})
The AST contains a single command node: (Com ...)
. It has two words, both of
which are enclosed in { }
. The first word has a literal token ls
. The
second one has a TildeSub
part and then a literal /src
part. LIT_CHARS
is a token type.
I can test the parser out on any shell script, and the first thing I did was parse ~2500 lines of my own scripts. But what's more interesting is to run it on scripts written by someone else. Shell is a little like C++ -- everybody writes in their own subset of it.
I tried parsing debootstrap and Aboriginal Linux, two shell codebases that are thousands of lines long (and teach you about the foundations of a Linux system.)
They both revealed bugs and missing features in my parser. In the last few days I've fixed all the issues Aboriginal Linux hit, including:
ksh
-style function defs: Aboriginal uses function foo { }
in addition to
the POSIX foo() { }
$[1 + 2]
in addition to $((1 + 2))
<(echo hi)
expands to /dev/fd/63
echo "${foo#"$i"}"
. This syntax doesn't make much sense to me, but real
programs use it.$(( $# + 1 ))
&&
.For example:
ls / && ls /bin || echo ERROR
Another tricky issue I dealt with was with expressions like $((1 * (2+3)))
.
I would write this as $((1 * (2+3) ))
, to make it clear that the first right
paren is for expression grouping, and the last two right parens close the
substitution.
But this isn't required, so tokenizer has to take care not to interpret the
first two of three ))
as closing the arithmetic substitution.
RESULTS: This directory contains the source code and abstract syntax trees for Aboriginal Linux, totalling 3,770 lines of code.
It works! But I also claim it's better than existing shell parsers. Tomorrow I will talk about why this is.