Why Sponsor Oils? | blog | oilshell.org
I fixed the bugs that prevented debootstrap from parsing. Here are the ASTs and
line counts. So now I have three projects parsing: Aboriginal Linux,
/etc/init.d and debootstrap.
The issue that was blocking it was this syntax:
echo >out.txt 1 2 3
I always thought the redirect had to go last:
echo 1 2 3 >out.txt
But actually all of these are valid ways of printing 3 numbers to a file:
>out.txt echo 1 2 3
echo >out.txt 1 2 3
echo 1 >out.txt 2 3
echo 1 2 >out.txt 3
echo 1 2 3 >out.txt
This is spelled out in the POSIX grammar, but I misread the recursive rules. Fixed now!
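A quick way to check the claim above is to run all five forms and compare the results. This is only a minimal sketch: the temporary directory and the file names a.txt through e.txt are made up for illustration, and it assumes mktemp -d is available (it is common but not required by POSIX). In the POSIX grammar, io_redirect can appear in cmd_prefix (before the command name) as well as in cmd_suffix (mixed in with the arguments), which is why every position is accepted.

```shell
#!/bin/sh
# Scratch directory for the output files (hypothetical names, illustration only).
tmpdir=$(mktemp -d)

# The same command with the redirect in five different positions.
>"$tmpdir/a.txt" echo 1 2 3
echo >"$tmpdir/b.txt" 1 2 3
echo 1 >"$tmpdir/c.txt" 2 3
echo 1 2 >"$tmpdir/d.txt" 3
echo 1 2 3 >"$tmpdir/e.txt"

# Count how many files ended up with exactly the three numbers.
count=0
for f in a b c d e; do
  if [ "$(cat "$tmpdir/$f.txt")" = "1 2 3" ]; then
    count=$((count + 1))
  fi
done

echo "$count of 5 files contain '1 2 3'"
rm -rf "$tmpdir"
```

If the shell follows the grammar, the final line should report 5 of 5.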
Another lightbulb: this is where Python 2's "print to file" syntax comes from:
print >>sys.stderr, '1 2 3'
I hacked on debootstrap a few years ago when trying to build a package manager
and bootstrap it with Debian tools. What I remember is that it's quite slow
for what it does — for example, parsing all the Debian package metadata
in shell/Perl seems to take forever, and is possibly done in an algorithmically
inefficient way. The code is not very nice along multiple dimensions, even
for sh code.
At some point, I want to build some profiling tools and hooks into my shell, which will help with monsters like this. But parsing everything is the first step.
I'm actually working on parsing git now! I knew that some of git was written
in shell, but I didn't realize how much. I'm parsing 125K lines right now, and
that's just a subset of an old copy of the git source tree! There are 10 or so
errors to fix, and then I'll try parsing the entire tree.
I'll have to write the post on lexical states a little later. In the last few days, I've gotten into a good rhythm fixing bugs in the parser, which is more important. The error messages with the column number are really helping, e.g.
Line 33 of '/home/andy/git/other/chef-bcpc/zap-ceph-disks.sh'
if ! echo "$disk" | egrep -q "${mounted_disk_regex:0:-1}"; then
                                                      ^
Unexpected token after slice: <AS_NUM_LITERAL 1>
(This error is because I'm not correctly implementing unary minus in the arithmetic language.)
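For context, here is a small sketch of what that slice is meant to do. The offset and length in ${var:offset:length} are arithmetic expressions, so the -1 is a unary minus applied to the literal 1, and a negative length means "stop that many characters before the end". The sample value of mounted_disk_regex is hypothetical, and the negative length assumes bash 4.2 or newer. The slice is run through bash explicitly because this syntax is not part of POSIX sh.

```shell
#!/bin/sh
# Hypothetical value chosen for illustration: a regex with a trailing "|".
mounted_disk_regex='sda|sdb|'

# bash form from the error message: offset 0, length -1 (drop the last char).
# Assumes bash 4.2+, where a negative length counts back from the end.
trimmed_bash=$(bash -c 'r=$1; printf "%s" "${r:0:-1}"' _ "$mounted_disk_regex")

# Portable POSIX equivalent: strip exactly one trailing character.
trimmed_posix=${mounted_disk_regex%?}

echo "bash slice:  $trimmed_bash"
echo "POSIX strip: $trimmed_posix"
```

Both forms should yield the string without its final character, so a parser has to accept the unary minus for scripts like this one to load.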