As promised, I published the oil repo. The README tells you how to run it, run tests, and get an overview of the code.
The count.sh script shows various line counts:

    $ ./count.sh all

    BUILD/TEST AUTOMATION
      ...
        305 spec.sh
        394 wild.sh
       1053 total

    SHELL TEST FRAMEWORK
        604 sh_spec.py

    SHELL SPEC TESTS
        260 tests/var-sub.test.sh
        285 tests/array.test.sh
       3509 total

    OIL UNIT TESTS
      ...
        451 osh/word_parse_test.py
       1135 osh/cmd_parse_test.py
       3058 total

    OIL
      ...
       1148 osh/word_parse.py
       1505 osh/cmd_parse.py
      11497 total
So there are ~11,500 lines of code, ~7000 lines of tests, and ~1000 lines of developer scripts.
The tests contain a lot of the value — I expect them to last longer than the code, which will be ported to C++, hopefully with a fair amount of bespoke code generation.
sh_spec.py is a test framework that runs shell snippets against multiple shells and makes assertions on stdout, stderr, and the exit code.
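To give a flavor of the idea, here is a minimal sketch of that kind of check (this is not the actual sh_spec.py code or its test file format; the shell list and the check_snippet helper are made up for illustration):

    # Illustrative sketch only -- not the real sh_spec.py. It runs one shell
    # snippet under several shells and checks stdout and the exit status.
    import subprocess

    SHELLS = ['bash', 'dash', 'mksh']  # hypothetical list of shells to compare

    def check_snippet(snippet, expected_stdout, expected_status=0):
        """Run `snippet` under each shell and report which ones match."""
        results = {}
        for sh in SHELLS:
            proc = subprocess.run([sh, '-c', snippet],
                                  capture_output=True, text=True)
            ok = (proc.stdout == expected_stdout and
                  proc.returncode == expected_status)
            results[sh] = 'PASS' if ok else 'FAIL'
        return results

    if __name__ == '__main__':
        # ${#s} of an unset variable should print 0 in all three shells.
        print(check_snippet('echo "${#s}"', expected_stdout='0\n'))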
Here is another way to view the code:
    $ ./count.sh parser

         78 osh/parse_lib.py
        197 osh/arith_parse.py
        319 osh/bool_parse.py
        410 osh/lex.py
       1148 osh/word_parse.py
       1505 osh/cmd_parse.py
       3657 total
This first chunk of ~3600 lines is the algorithm for parsing the osh language. As I wrote two days ago, it's three recursive descent parsers and a Pratt parser.
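For readers who haven't seen the technique, here is a generic Pratt-parsing sketch (it's not the code in osh/arith_parse.py or core/tdop.py, just an illustration of the binding-power loop at the heart of the technique):

    # Generic Pratt (top-down operator precedence) sketch -- not osh code.
    # Parses expressions like '1 + 2 * 3' into nested tuples.
    import re

    TOKEN_RE = re.compile(r'\s*(?:(\d+)|(.))')

    def tokenize(s):
        for num, op in TOKEN_RE.findall(s):
            yield ('NUM', int(num)) if num else ('OP', op)
        yield ('EOF', None)

    BINDING_POWER = {'+': 10, '-': 10, '*': 20, '/': 20}

    class PrattParser:
        def __init__(self, tokens):
            self.tokens = tokens
            self.token = next(tokens)      # one token of lookahead

        def pop(self):
            t, self.token = self.token, next(self.tokens)
            return t

        def parse(self, rbp=0):
            kind, value = self.pop()
            assert kind == 'NUM', 'expected a number, got %r' % kind
            left = value
            # Consume operators that bind more tightly than the caller's rbp.
            while self.token[0] == 'OP' and BINDING_POWER[self.token[1]] > rbp:
                _, op = self.pop()
                left = (op, left, self.parse(BINDING_POWER[op]))
            return left

    print(PrattParser(tokenize('1 + 2 * 3')).parse())  # ('+', 1, ('*', 2, 3))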
        80 core/bool_node.py
       101 core/arith_node.py
       472 core/tokens.py
       557 core/cmd_node.py
       997 core/word_node.py
      2207 total
This chunk of ~2200 lines is the AST representation. word_node.py is big because it contains "smart" polymorphic methods, not just "dumb" AST nodes.
       229 core/lexer.py
       313 core/tdop.py
       542 total
And these two files are the lexing and parsing infrastructure. I wrote about tdop.py in the Pratt parsing post, but I need to write more about the lexer. In particular, a lexical state is now called a lexer mode, because there's another lexer hint mechanism that's also stateful. osh ended up requiring a whopping thirteen lexer modes (up from eight a month ago).
And I need to explain what the little-known tool re2c does -- in some sense it's the foundation of the parser and what enabled me to write it.
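Here is a rough sketch of what mode-based lexing means, as one way to structure it (the mode names, token kinds, and regexes below are made up, and Python's re module stands in for whatever does the actual matching):

    # Illustrative sketch of a lexer with modes -- not the real core/lexer.py.
    # Each mode has its own list of (regex, token kind) rules, and the parser
    # tells the lexer which mode to read the next token in.
    import re

    LEXER_RULES = {
        'OUTER': [  # reading ordinary words
            (r'[a-zA-Z0-9_/.-]+', 'LIT_CHARS'),
            (r'\$\{',             'VAR_SUB_BEGIN'),
            (r'\s+',              'WHITESPACE'),
        ],
        'DQUOTE': [  # inside double quotes: fewer characters are special
            (r'[^"$]+',           'LIT_CHARS'),
            (r'\$\{',             'VAR_SUB_BEGIN'),
            (r'"',                'DQUOTE_END'),
        ],
    }

    def read_token(line, pos, mode):
        """Match the next token at `pos` using the rules for `mode`."""
        for pattern, kind in LEXER_RULES[mode]:
            m = re.match(pattern, line[pos:])
            if m:
                return kind, m.group(0), pos + m.end()
        return 'UNKNOWN', line[pos], pos + 1

    print(read_token('echo "hi"', 0, 'OUTER'))   # ('LIT_CHARS', 'echo', 4)
    print(read_token('hi"', 0, 'DQUOTE'))        # ('LIT_CHARS', 'hi', 2)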
Although my parser is more compact than the bash parser (details here), ~5800 lines is still big. I hope to express the oil language in a more compact way, but that remains to be figured out.
    $ ./count.sh runtime

         73 core/arith_eval.py
        122 core/value.py
        230 core/bool_eval.py
        370 core/builtin.py
        545 core/process.py
        827 core/cmd_exec.py
        851 core/word_eval.py
       3018 total
Parallel to the four parsers are four evaluators: arithmetic, boolean, command, and word. They interpret the AST in the obvious way, taking care to give good runtime error messages.
They make use of a runtime that handles processes, file descriptors, builtin commands, variables, and data types.
The evaluators and runtime are less complete than the parser, but they can execute some real scripts. I hope that open source contributions will help fill out the evaluators and runtime. It seems like parsing is about 60% of the work of a shell, and execution is 40%.
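As a rough sketch of that shape (hypothetical classes, not the real core/word_eval.py or core/cmd_exec.py), an evaluator walks its part of the AST and consults shared runtime state such as the variable store:

    # Hypothetical sketch of an evaluator backed by shared runtime state --
    # not the real osh evaluators.

    class Mem:
        """Shared runtime state: shell variables.  (The real runtime also
        handles processes, file descriptors, builtins, and data types.)"""
        def __init__(self):
            self.vars = {}

        def Get(self, name):
            return self.vars.get(name, '')   # unset variables are empty

        def Set(self, name, value):
            self.vars[name] = value

    class WordEvaluator:
        """Turns word AST nodes into strings."""
        def __init__(self, mem):
            self.mem = mem

        def Eval(self, parts):
            out = []
            for kind, value in parts:        # e.g. ('Literal', 'hello ')
                if kind == 'Literal':
                    out.append(value)
                elif kind == 'VarSub':       # e.g. ('VarSub', 'name')
                    out.append(self.mem.Get(value))
            return ''.join(out)

    mem = Mem()
    mem.Set('name', 'world')
    ev = WordEvaluator(mem)
    print(ev.Eval([('Literal', 'hello '), ('VarSub', 'name')]))  # hello world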
The component missing from these counts is completion.py, which is the beginning of a pretty decent completion engine. The completion engine is interesting because it makes use of the parsers and evaluators as libraries: when you hit TAB, it parses the command line without disrupting the interactive parser currently in progress (i.e. think about typing echo one; f() { echo <TAB>).

Overall, the architecture is an AST interpreter, where the AST is heterogeneous. That is, it has four interleaved sublanguages. Every shell I've looked at is also an AST interpreter written in C, but that's only true for the command sublanguage; the other three languages are implemented in an ad hoc manner, interleaving parsing and execution.
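To make "heterogeneous AST" concrete, here is an invented miniature (not the osh node classes) in which a command node contains word nodes and a word part contains an arithmetic node:

    # Invented miniature of a heterogeneous AST -- not the osh node classes.
    # Command nodes contain word nodes, and a word part can contain an
    # arithmetic node, so the sublanguages interleave in one tree.

    # Arithmetic sublanguage: 1 + 2
    arith = ('ArithBinary', '+', ('ArithConst', 1), ('ArithConst', 2))

    # Word sublanguage: a word is a list of parts; this one is $((1 + 2))
    word = ('CompoundWord', [('ArithSubPart', arith)])

    # Command sublanguage: the simple command `echo $((1 + 2))`
    command = ('SimpleCommand',
               [('CompoundWord', [('LiteralPart', 'echo')]), word])

    def node_kinds(node):
        """Walk the tree and collect node kinds, to show the nesting."""
        if isinstance(node, tuple):
            yield node[0]
            for child in node[1:]:
                yield from node_kinds(child)
        elif isinstance(node, list):
            for child in node:
                yield from node_kinds(child)

    print(list(node_kinds(command)))
    # ['SimpleCommand', 'CompoundWord', 'LiteralPart', 'CompoundWord',
    #  'ArithSubPart', 'ArithBinary', 'ArithConst', 'ArithConst']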
Now that the code is public, tomorrow I will write about project priorities for the next few months.