Why Sponsor Oils? | blog | oilshell.org
Prior to ASDL, oil used hand-written Python classes to represent the shell AST, and ad hoc print() statements to show their debug representation.
Now our ASDL implementation can print an instance of any type automatically. For example, compare:
The new format is more readable, consistent, and complete. We can also print ASTs in plain text or ANSI-colored text.
This program:
echo "hello $name"
is represented with this AST:
(SimpleCommand
  words: [
    (CompoundWord
      parts: [(LiteralPart token:(token id:Lit_Chars val:echo span_id:0))]
    )
    (CompoundWord
      parts: [
        (DoubleQuotedPart
          parts: [
            (LiteralPart token:(token id:Lit_Chars val:"hello " span_id:3))
            (SimpleVarSub token:(token id:VSub_Name val:"$name" span_id:4))
          ]
        )
      ]
    )
  ]
)
That is, we have a SimpleCommand node with two children:
- CompoundWord with a LiteralPart, which has a token echo
- CompoundWord with a DoubleQuotedPart, which has
  - LiteralPart with a token "hello "
  - SimpleVarSub with a token $name
A span is a subsequence of characters in a line. Spans are numbered with a span_id from 0 to n-1, and concatenating them reproduces the source file exactly. They are used for error messages and for automatic conversion of osh source to oil source.
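To make the round-trip property concrete, here is a minimal Python sketch. The Span record, make_spans(), and span_text() are hypothetical names invented for this illustration, and the whitespace-based splitting is only a stand-in for a real lexer, so the span boundaries will not match the span_id values in the AST above. The point is only that every character belongs to exactly one span, so joining the spans in span_id order gives back the file.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical span record: a slice of a single source line."""
    line_id: int  # index of the line in the file
    col: int      # starting column within that line
    length: int   # number of characters covered


def make_spans(lines):
    """Split each line into runs of whitespace / non-whitespace.

    A span_id is simply the index of a span in the returned list.
    (Assumption: a real lexer would choose different boundaries.)
    """
    spans = []
    for line_id, line in enumerate(lines):
        for m in re.finditer(r"\s+|\S+", line):
            spans.append(Span(line_id, m.start(), m.end() - m.start()))
    return spans


def span_text(lines, span):
    """Return the characters that a span covers."""
    return lines[span.line_id][span.col:span.col + span.length]


source = 'echo "hello $name"\nls /tmp\n'
lines = source.splitlines(keepends=True)
spans = make_spans(lines)

# Concatenating spans 0 .. n-1 reproduces the source exactly.
roundtrip = "".join(span_text(lines, s) for s in spans)
assert roundtrip == source
```

Because a node only has to store an integer span_id, an error message or a source translator can look up the exact characters later.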
This representation is complete but can be unwieldy, so ASDL has an application-specific hook to abbreviate nodes.
Except for these source locations (the span_id fields), the abbreviated format carries the same information and is more readable:
(C {(echo)} {(DQ ("hello ") ($ VSub_Name "$name"))})
A partial list of abbreviations:
- SimpleCommand → (C ...)
- CompoundWord → { ... }
- DoubleQuotedPart → (DQ ...)
- LiteralPart → (echo) or (Lit_Other "=")
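One way to picture such a hook is a table of per-type functions that the printer consults before falling back to the full format. The sketch below is an assumption about the shape of the idea, not oil's actual code: the tuple encoding of nodes and the names node(), abbreviate(), and fmt() are all made up for this example.

```python
import re

# Hypothetical node encoding for this sketch: (type_name, {field: value}).
def node(type_name, **fields):
    return (type_name, fields)

def is_node(obj):
    return isinstance(obj, tuple) and len(obj) == 2 and isinstance(obj[1], dict)

def quote(s):
    """Leave plain words bare; double-quote everything else."""
    return s if re.fullmatch(r"[A-Za-z0-9_]+", s) else '"%s"' % s

ABBREVIATIONS = {}  # type name -> function(fields) -> str

def abbreviate(type_name):
    """Decorator that registers an application-specific abbreviation."""
    def register(func):
        ABBREVIATIONS[type_name] = func
        return func
    return register

def fmt(obj, abbrev=True):
    if isinstance(obj, list):
        return "[" + " ".join(fmt(x, abbrev) for x in obj) + "]"
    if isinstance(obj, str):
        return quote(obj)
    if not is_node(obj):
        return str(obj)
    type_name, fields = obj
    hook = ABBREVIATIONS.get(type_name) if abbrev else None
    if hook is not None:
        return hook(fields)
    # No hook registered (or abbreviation disabled): print every field.
    body = " ".join("%s:%s" % (k, fmt(v, abbrev)) for k, v in fields.items())
    return "(%s %s)" % (type_name, body)

@abbreviate("SimpleCommand")
def _simple_command(fields):
    return "(C %s)" % " ".join(fmt(w) for w in fields["words"])

@abbreviate("CompoundWord")
def _compound_word(fields):
    return "{%s}" % " ".join(fmt(p) for p in fields["parts"])

@abbreviate("DoubleQuotedPart")
def _double_quoted(fields):
    return "(DQ %s)" % " ".join(fmt(p) for p in fields["parts"])

@abbreviate("LiteralPart")
def _literal(fields):
    return "(%s)" % quote(fields["token"][1]["val"])

@abbreviate("SimpleVarSub")
def _var_sub(fields):
    tok = fields["token"][1]
    return "($ %s %s)" % (tok["id"], quote(tok["val"]))

def token(id_, val, span_id):
    return node("token", id=id_, val=val, span_id=span_id)

tree = node("SimpleCommand", words=[
    node("CompoundWord", parts=[
        node("LiteralPart", token=token("Lit_Chars", "echo", 0)),
    ]),
    node("CompoundWord", parts=[
        node("DoubleQuotedPart", parts=[
            node("LiteralPart", token=token("Lit_Chars", "hello ", 3)),
            node("SimpleVarSub", token=token("VSub_Name", "$name", 4)),
        ]),
    ]),
])

print(fmt(tree))                # abbreviated form
print(fmt(tree, abbrev=False))  # full form, including span_id fields
```

The abbreviated call prints the same text as the short form shown earlier. Types without a registered hook still print in full, so the abbreviations can be added one at a time.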
I don't want to write print statements for every AST node type I define — I want "generic" pretty printing, which works for any type.
As shown above, oil's ASDL implementation now supports this. Notably, it requires treating types as objects. In other words, it requires reflection, a kind of metaprogramming.
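As a rough illustration of what "treating types as objects" buys you, here is a sketch that uses Python dataclasses as a stand-in for ASDL-generated classes. This is an assumption for the sake of the example, not how oil's ASDL implementation is written. The pretty() function never mentions a specific node type; it asks each object's type for its field names at runtime.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List


@dataclass
class Token:
    id: str
    val: str
    span_id: int


@dataclass
class LiteralPart:
    token: Token


@dataclass
class CompoundWord:
    parts: List[object]


@dataclass
class SimpleCommand:
    words: List[CompoundWord]


def pretty(obj):
    """Format any dataclass instance by inspecting its type at runtime."""
    if is_dataclass(obj) and not isinstance(obj, type):
        # Reflection: fields() returns the field descriptors of the type.
        body = " ".join(
            "%s:%s" % (f.name, pretty(getattr(obj, f.name))) for f in fields(obj)
        )
        return "(%s %s)" % (type(obj).__name__, body)
    if isinstance(obj, list):
        return "[" + " ".join(pretty(x) for x in obj) + "]"
    return str(obj)


cmd = SimpleCommand(
    words=[CompoundWord(parts=[LiteralPart(Token("Lit_Chars", "echo", 0))])]
)
print(pretty(cmd))


# A type defined later is printable with no changes to pretty().
@dataclass
class Point:
    x: int
    y: int


print(pretty(Point(1, 2)))  # prints: (Point x:1 y:2)
```

The same traversal could emit plain text or colored text, since the field names and values come from the type description rather than from hand-written print code.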
The end of the last post outlined some metaprogramming topics. I don't have a big conclusion now, but here are two related thoughts:
When Polymorphism Fails: Steve Yegge mentions OCaml's lack of "polymorphic print". I encountered this issue in oil. There isn't a straightforward way to "generically" import ASDL trees into OCaml. You have to write a code generator or use metaprogramming like Camlp4.
I think there is room for a dynamic language with algebraic data types, like Pyret.
printf() can print any type, but it's not type-safe. Using %s instead of %d can crash your program. This is an instance of the type checking vs. metaprogramming problem.
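Python's printf-style % operator gives a loose analogy for the format string problem. It is not the same failure as in C, where a mismatched argument is undefined behavior, but it shows the same gap: the format string is a small program that nothing checks before it runs. The render() helper below is a hypothetical wrapper written for this example.

```python
def render(fmt, *args):
    """Apply a printf-style format, reporting a mismatch instead of raising."""
    try:
        return fmt % args
    except (TypeError, ValueError) as e:
        return "format error: %s" % type(e).__name__


# The format string and the arguments agree.
print(render("%s has %d items", "cart", 3))        # prints: cart has 3 items

# %d is given a string. The mismatch is only detected when this line runs.
print(render("%s has %d items", "cart", "three"))  # prints: format error: TypeError
```

Python turns the mismatch into an exception at runtime. In C the equivalent mistake may be flagged by a compiler warning, if at all.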
So, in a statically-typed language, you have to "jump out of" the language into a meta-language to implement something as basic as printing.
printf() and scanf() are both little languages on top of C, and in fact printf is Turing complete! This is a mistake. I started a Language Design and Theory of Computation page which surveys similar issues. A lot of the hard work was done by Accidentally Turing Complete.
I'm trying to stay on the track I laid out in the last post, so the next post should be an update on the source code size.