blog | oilshell.org
If you haven't used Google's protocol buffer serialization technology, this analogy may be helpful:
JavaScript Data Model : JSON :: C Data Model : Protocol Buffers
Just as JSON is a language-independent serialization format extracted from JavaScript's data model (objects, heterogeneous arrays, strings, numbers, booleans), protocol buffers are a mostly language-independent serialization format extracted from C's data model:
double and float
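A small Python sketch can make the first analogy concrete. JSON serializes JavaScript-style values without any schema, while C's data model is about typed, fixed-width fields. The record layout below is a hypothetical example. Packing it with Python's struct module is not the protocol buffer wire format, which uses tagged, variable-length fields; it only illustrates the data model.

```python
import json
import struct

# JavaScript's data model: objects, heterogeneous arrays, strings, numbers,
# booleans. JSON serializes such values directly; no schema is needed.
js_value = {"name": "oil", "tags": ["shell", 1, True], "count": 3}
text = json.dumps(js_value)
print(text)  # {"name": "oil", "tags": ["shell", 1, true], "count": 3}

# C's data model: a record whose fields have fixed, declared types.
# Hypothetical record: struct { int32_t id; double score; float ratio; }
# "<" means little-endian with standard sizes and no padding.
RECORD = struct.Struct("<idf")
packed = RECORD.pack(42, 3.5, 0.25)
print(RECORD.size, RECORD.unpack(packed))  # 16 (42, 3.5, 0.25)
```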
A similar analogy explains Zephyr ASDL, which I explained from a few other angles in the last post:
C data model : Protocol Buffers :: ML data model : ASDL
ML is the language that introduced algebraic data types, or ADTs. ADTs are a characteristic feature of strongly-typed functional languages like Standard ML, OCaml, and Haskell.
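ASDL, like protocol buffers, is a domain-specific language that describes a language-independent serialization format for a particular data model -- in this case, the ML data model. It has the following constructs:
product types, which are like C structs
sum types, which are tagged unions of named constructors
the * and ? modifiers, for sequence and optional fields
primitive types like identifier, string, and int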
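Here is a rough sketch of how those constructs map onto Python. The schema in the comment is made up for illustration (it is not from osh.asdl), and the classes are hand-written with dataclasses rather than generated: each constructor of a sum type becomes its own class, a product type becomes a record, and * and ? become a list and an Optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical ASDL schema (made up for illustration):
#   arith_expr = Const(int value)
#              | Var(identifier name)
#              | Binary(string op, arith_expr left, arith_expr right)
#   program = (arith_expr* exprs, string? comment)

# Sum type: each constructor becomes its own class.
@dataclass
class Const:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Binary:
    op: str
    left: "ArithExpr"
    right: "ArithExpr"

ArithExpr = Union[Const, Var, Binary]

# Product type: a record with named fields.
@dataclass
class Program:
    exprs: List[ArithExpr] = field(default_factory=list)  # arith_expr*
    comment: Optional[str] = None                          # string?


def evaluate(e: ArithExpr, env: dict) -> int:
    """Dispatch on the constructor, like a match expression in ML."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Binary):
        left, right = evaluate(e.left, env), evaluate(e.right, env)
        if e.op == "+":
            return left + right
        if e.op == "*":
            return left * right
        raise ValueError(f"unknown operator {e.op!r}")
    raise TypeError(f"not an arith_expr: {e!r}")


prog = Program([Binary("+", Const(1), Binary("*", Var("x"), Const(3)))])
print([evaluate(e, {"x": 2}) for e in prog.exprs])  # [7]
```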
Oil will use ASDL with a custom serialization format I developed. Python, on the other hand, doesn't serialize the data structures it describes with ASDL. Instead, it uses ASDL to share the AST between languages, bridging the parser written in C and the ast module in Python.
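You can see this from inside Python: the classes in CPython's ast module are generated from its ASDL schema (Parser/Python.asdl in the CPython source tree). This sketch uses only the standard library; the exact ast.dump() output varies between Python versions.

```python
import ast

tree = ast.parse("1 + 2 * x", mode="eval")
expr = tree.body

# Sum types: each constructor in the schema is a subclass of its type.
print(issubclass(ast.BinOp, ast.expr), issubclass(ast.Add, ast.operator))  # True True

# Constructor fields, in schema order: BinOp(expr left, operator op, expr right)
print(ast.BinOp._fields)  # ('left', 'op', 'right')

# The parsed tree is built from those generated classes.
print(type(expr).__name__, type(expr.op).__name__, type(expr.right.op).__name__)  # BinOp Add Mult
print(ast.dump(expr))  # exact format differs across Python versions
```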
Taking that into account, this analogy is also valid:
ASDL : Python :: WebIDL : Web Browser
WebIDL is an interface definition language that bridges C++ and JavaScript in the browser. It's similar to Microsoft's COM, but it's part of a single application rather than an OS-wide construct.
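Yesterday, I committed the first pass of oil's ASDL implementation. The schema parser is taken from Python, but these three features are new:
Python classes generated at runtime with metaprogramming (py_meta.py)
C++ code generation (gen_cpp.py)
encoding to the oheap format (encode.py), which I'll describe later.
Fortunately, not much code is required to implement these features: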
~/git/oil$ asdl/run.sh count
 417 asdl/asdl.py
 249 asdl/py_meta.py
 462 asdl/gen_cpp.py
 268 asdl/encode.py
1396 total
The py_meta.py file uses metaprogramming all over: Python metaclasses, but also things like dynamic kwargs and setattr().
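For readers who haven't seen this style of Python, here is a minimal sketch of the same three techniques: a metaclass, dynamic kwargs, and setattr(). It assumes a hypothetical _fields convention and a hypothetical make_class() helper; it is not the code in py_meta.py.

```python
class NodeMeta(type):
    """Hypothetical metaclass: gives each class an __init__ and __repr__
    derived from its _fields tuple, as a schema-driven library might."""

    def __new__(mcls, name, bases, ns):
        fields = tuple(ns.get("_fields", ()))

        def __init__(self, *args, **kwargs):
            if len(args) > len(fields):
                raise TypeError(f"{name} takes at most {len(fields)} positional arguments")
            values = dict(zip(fields, args))
            for key, val in kwargs.items():  # dynamic kwargs, checked against the schema
                if key not in fields:
                    raise TypeError(f"{name} has no field {key!r}")
                if key in values:
                    raise TypeError(f"{name} got multiple values for {key!r}")
                values[key] = val
            for f in fields:
                # setattr() creates the attributes at runtime, so a static
                # checker never sees them declared anywhere.
                setattr(self, f, values.get(f))

        def __repr__(self):
            args = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
            return f"{name}({args})"

        ns["__init__"] = __init__
        ns["__repr__"] = __repr__
        return super().__new__(mcls, name, bases, ns)


def make_class(name, fields, base=object):
    """Build a class at runtime from one schema entry (e.g. one constructor)."""
    return NodeMeta(name, (base,), {"_fields": tuple(fields)})


# Example: a sum type and one constructor, created from data, not source code.
word_part = make_class("word_part", [])
LiteralPart = make_class("LiteralPart", ["token", "quoted"], base=word_part)

p = LiteralPart("echo", quoted=False)
print(p)  # LiteralPart(token='echo', quoted=False)
```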
I believe that type checking oil with mypy is now hopeless. It was already thwarted by very simple metaprogramming, and this addition won't help. However, I believe that ASDL is more valuable than mypy for ensuring the structural integrity of the program.
Another thing to ponder: you could say this means I value Lisp over ML, though paradoxically the purpose of the metaprogramming is to use ML's data model in C++ and Python.
I haven't used ASDL in oil yet -- that's the next step. Since I'm obsessed with the line count, let me snapshot the tree now:
$ ./count.sh parser
Lexer/Parser
  77 osh/parse_lib.py
 196 osh/arith_parse.py
 291 osh/bool_parse.py
 334 osh/lex.py
1144 osh/word_parse.py
1455 osh/cmd_parse.py
3497 total
AST and IDs
  80 core/tokens.py
  99 core/expr_node.py
 441 core/id_kind.py
 491 core/cmd_node.py
 777 core/word_node.py
1888 total
Common Algorithms
 228 core/lexer.py
 338 core/tdop.py
 566 total
Using ASDL will affect the middle section (AST and IDs) the most, but I'm not sure whether it will get bigger or smaller. On the one hand, ASDL provides impressive code compression. I mentioned in the last post that 123 lines of ASDL turn into ~8100 lines of C code in Python. (However, the oheap format needs just 907 lines of C++ generated from 107 lines of ASDL, an order of magnitude less code. More on that later.)
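To get a feel for why a schema compresses code so well, here is a toy generator in Python. It is an assumption-laden sketch: the schema is a hypothetical dict rather than parsed ASDL, and the C++ it emits is not what gen_cpp.py produces and has nothing to do with the oheap layout.

```python
# Hypothetical schema for one sum type, as a parsed ASDL schema might look:
#   arith_expr = Const(int value) | Binary(string op, arith_expr left, arith_expr right)
SCHEMA = {
    "arith_expr": {
        "Const": [("int", "value")],
        "Binary": [("string", "op"), ("arith_expr", "left"), ("arith_expr", "right")],
    },
}

# Primitive ASDL types map to C++ value types; any other type is a pointer
# to another generated node type.
PRIMITIVES = {"int": "int", "string": "std::string", "identifier": "std::string"}


def cpp_type(asdl_type):
    return PRIMITIVES.get(asdl_type, asdl_type + "*")


def gen_cpp(schema):
    """Emit C++ declarations: a tag enum, a base struct, one struct per constructor."""
    out = ["#include <string>", ""]
    for sum_name, constructors in schema.items():
        tags = ", ".join(constructors)
        out.append(f"enum class {sum_name}_e {{ {tags} }};")
        out.append("")
        out.append(f"struct {sum_name} {{")
        out.append(f"  explicit {sum_name}({sum_name}_e t) : tag(t) {{}}")
        out.append(f"  virtual ~{sum_name}() = default;")
        out.append(f"  {sum_name}_e tag;")
        out.append("};")
        for ctor, fields in constructors.items():
            out.append("")
            out.append(f"struct {ctor} : public {sum_name} {{")
            out.append(f"  {ctor}() : {sum_name}({sum_name}_e::{ctor}) {{}}")
            for asdl_type, field_name in fields:
                out.append(f"  {cpp_type(asdl_type)} {field_name};")
            out.append("};")
    return "\n".join(out)


print(gen_cpp(SCHEMA))
```

A one-line sum type already expands into about 20 lines of C++, before adding anything like printers or serializers for each node type.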
On the other hand, the Word and WordPart classes in word_node.py have nontrivial methods, which I need to attach to the classes generated from osh.asdl. Also, the tree will be more heterogeneous, because I'm representing osh very faithfully and then "lowering" it into what I'm calling ovm in my head. ovm is more homogeneous.
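One way to attach hand-written methods to generated classes is to copy them onto each class after it is created. This is a minimal sketch with hypothetical names: CompoundWord, LiteralPart, and VarSubPart stand in for generated classes (written by hand here so the example is self-contained), and WordMethods is a hypothetical mixin, not the actual methods in word_node.py.

```python
# Stand-ins for generated classes (hypothetical; written by hand here so the
# example is self-contained).
class LiteralPart:
    def __init__(self, text):
        self.text = text

class VarSubPart:
    def __init__(self, name):
        self.name = name

class CompoundWord:
    def __init__(self, parts):
        self.parts = parts


class WordMethods:
    """Hypothetical hand-written methods to graft onto a generated class."""

    def is_literal(self):
        return all(isinstance(p, LiteralPart) for p in self.parts)

    def literal_value(self):
        # Returns the word's text if it has no substitutions, else None.
        if not self.is_literal():
            return None
        return "".join(p.text for p in self.parts)


def attach_methods(cls, mixin):
    """Copy the public functions defined on `mixin` onto `cls`."""
    for attr, value in vars(mixin).items():
        if callable(value) and not attr.startswith("_"):
            setattr(cls, attr, value)


attach_methods(CompoundWord, WordMethods)

w1 = CompoundWord([LiteralPart("ec"), LiteralPart("ho")])
w2 = CompoundWord([LiteralPart("dir="), VarSubPart("HOME")])
print(w1.literal_value(), w2.literal_value())  # echo None
```

Subclassing each generated class is the other obvious option, but then every construction site has to name the subclass; grafting the methods keeps a single class per node type.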
But whether it gets bigger or smaller, the new AST representation brings us closer to the top priorities. It forms the backbone of both the interpreter and the tools that convert osh/bash to oil.
This conversion is, of course, the main reason I expect anyone to actually use oil!