Why Sponsor Oils? | blog | oilshell.org
In the last 2 days, I landed a few weeks' worth of big changes on the master branch. After writing my ASDL implementation, I replaced the backbone of the interpreter with dynamically-generated ASDL classes.
Recall that ASDL is a way of using ML's data model of algebraic data types inside another language, roughly analogous to how JSON lets you use JavaScript's data model in another language.
I'm happy with ASDL, so I plan to write a few posts about it. In this post, I'll review the progress on oil so far, and then show an example ASDL schema and data structure.
In subsequent posts, I'll go into detail on oil's ASDL implementation, describe how it concretely moves the project forward, and share some more abstract thoughts.
Shortly after I released the code in November, I listed two top priorities, gave a detailed roadmap, and mentioned a third use case for the AST.
Looking back, we're pretty much on track. Replacing the backbone of the interpreter with ASDL took a while, requiring changes to essentially every source file, but it was necessary for all three priorities:
Why do the same work in both Python and C++? The first reason is that discovering the semantics of shell is the hard part, and we want to do that in an agile language. Once that's done, writing the code is easy.
The second reason is that the C++ executor will operate on a lower-level representation of code. I'll explain this in a future post.
At 129 lines, osh.asdl compactly describes the osh language (which is nearly identical to bash). An excerpt:
    token = (id id, string val, line_span? loc)

    word =
        TokenWord(token token)
      | CompoundWord(word_part* parts)
This ASDL schema syntax should be readable to programmers who know Haskell or ML. For others, it only involves a few concepts and can be read like this:
- token is a product type, with two required fields id and val, and an optional field loc.
- word is a sum type with two alternatives: TokenWord, which has a single token field, and CompoundWord, which has an array of word_part (which is itself a sum type with nine alternatives; not shown).
For those with a C background, it's helpful to remember that a product type can be represented by a struct, and a sum type can be represented by a tagged union. However, structs and unions fall short of algebraic data types, because C's static type system does not check that the tag matches the union member being read.
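To make the product/sum distinction concrete, here is a minimal Python sketch of the two types from the excerpt. It is not the code that oil's ASDL implementation generates: the class names, the WordPart stand-in, the describe() helper, and the use of a plain string for the id field are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class LineSpan:
    # Field names borrowed from the pretty-printed output in this post.
    pool_index: int
    col: int
    length: int


@dataclass
class Token:
    # Product type: all fields exist together in one value.
    id: str                         # assumption: the real schema uses an `id` type
    val: str
    loc: Optional[LineSpan] = None  # `line_span? loc` is optional


@dataclass
class WordPart:
    # Hypothetical stand-in for the nine-alternative word_part sum type.
    text: str


@dataclass
class TokenWord:
    token: Token


@dataclass
class CompoundWord:
    parts: List[WordPart]           # `word_part* parts` is an array


# Sum type: a word is exactly one of these alternatives.
Word = Union[TokenWord, CompoundWord]


def describe(word: Word) -> str:
    # Dispatch on the alternative, much as C code would switch on a union tag.
    if isinstance(word, TokenWord):
        return "token word %r" % word.token.val
    if isinstance(word, CompoundWord):
        return "compound word with %d part(s)" % len(word.parts)
    raise TypeError("not a word: %r" % (word,))
```

Each alternative of the sum type gets its own class, and the class of a value plays the role of the tag.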
Consider this statement:
ls >> ~/git/$repo/listing.txt
It consists of three words:
- A CompoundWord representing ls
- A TokenWord with the token (Id.Redir_DGreat, ">>")
- A CompoundWord with 4 parts:
  - TildeSubPart: to substitute $HOME
  - LiteralPart("/git/")
  - VarSubPart(repo): to substitute the repo variable
  - LiteralPart("/listing.txt")
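The four parts of the last word can be sketched as a small Python data structure, together with a toy evaluator that joins them into a path. This is only an illustration under simplifying assumptions: the classes are hand-written rather than generated from the schema, LiteralPart holds a bare string instead of a token, and eval_parts() is a hypothetical helper, not oil's word evaluator.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class LiteralPart:
    text: str      # simplified: the real node holds a token


@dataclass
class TildeSubPart:
    prefix: str    # "" stands for a bare ~ with no user name after it


@dataclass
class VarSubPart:
    name: str


Part = Union[LiteralPart, TildeSubPart, VarSubPart]


def eval_parts(parts: List[Part], env: Dict[str, str]) -> str:
    """Hypothetical evaluator: concatenate the value of each part."""
    out = []
    for part in parts:
        if isinstance(part, LiteralPart):
            out.append(part.text)
        elif isinstance(part, TildeSubPart):
            out.append(env["HOME"])             # only the bare ~ case is handled
        elif isinstance(part, VarSubPart):
            out.append(env.get(part.name, ""))  # unset variables expand to ""
        else:
            raise TypeError("unknown part: %r" % (part,))
    return "".join(out)


# The third word of: ls >> ~/git/$repo/listing.txt
parts = [
    TildeSubPart(prefix=""),
    LiteralPart("/git/"),
    VarSubPart("repo"),
    LiteralPart("/listing.txt"),
]
```

With an environment such as {"HOME": "/home/andy", "repo": "oil"} (made-up values), eval_parts(parts, env) yields /home/andy/git/oil/listing.txt.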
These three words are further parsed into a command node, which our ASDL implementation pretty-prints like this:
    (SimpleCommand
      words: [
        (CompoundWord
          parts: [
            (LiteralPart
              token: (token id:Lit_Chars val:ls loc:(line_span pool_index:0 col:0 length:2))
            )
          ]
        )
      ]
      redirects: [
        (Redirect
          op_id: Redir_DGreat
          arg_word: (CompoundWord
            parts: [
              (TildeSubPart prefix:"")
              (LiteralPart token:(token id:Lit_Chars val:/git/))
              (VarSubPart name:repo)
              (LiteralPart
                token: (token
                  id: Lit_Chars
                  val: /listing.txt
                  loc: (line_span pool_index:0 col:17 length:12)
                )
              )
            ]
          )
          fd: 1
        )
      ]
    )
This may look like a Lisp S-expression, but two features give it more structure:
[ ... ] above
In addition to this textual representation, which is useful for debugging, there's also a binary representation. I will describe that in tomorrow's post.
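As a rough illustration of the textual representation, the sketch below prints a tree of Python dataclasses in a similar style, with field names before values and square brackets around arrays. It is an assumption-laden toy: pretty() is a hypothetical function, the output is on a single line, and it does not reproduce the layout or quoting rules of oil's actual pretty-printer.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List


def pretty(node) -> str:
    """Render dataclass nodes as (TypeName field:value ...) and lists as [...]."""
    if is_dataclass(node):
        pairs = ["%s:%s" % (f.name, pretty(getattr(node, f.name)))
                 for f in fields(node)]
        return "(%s)" % " ".join([type(node).__name__] + pairs)
    if isinstance(node, list):
        return "[%s]" % " ".join(pretty(item) for item in node)
    return str(node)  # leaf values such as strings and ints


@dataclass
class VarSubPart:
    name: str


@dataclass
class CompoundWord:
    parts: List[VarSubPart]


print(pretty(CompoundWord([VarSubPart("repo")])))
# prints: (CompoundWord parts:[(VarSubPart name:repo)])
```

Because the printer walks the declared fields of each node, the field names come for free, which is one way a schema-driven tree differs from a bare S-expression.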