Oil 0.8.pre5 - Progress in C++

As of this release, we run spec tests against the oil-native binary! In other words, we're measuring how well the semi-automatic translation to C++ works.
- Here are the results. The Python version of OSH passes 1560 tests (+), while the C++ version passes 420 tests. This is significant progress, but there's more to do, which I discuss below.
Koichi Murase made over a dozen fixes to OSH, motivated by running ble.sh (full changelog).
I made a few fixes to run the ShellSpec project. Notably, shopt -s extglob is now respected.
Internal: we have proper C++ unit tests and run them on our continuous build. I started using the greatest.h test framework, and it's simple and effective (Zulip thread).

I'd still like more bug reports! See How To Test OSH.

(+) Because of a test harness bug that will be fixed, the results page shows 1539 passing tests. The correct number is 1560.

Closed Issues

#758	Incorrect fnmatch due to extended glob syntax
#754	Implement test -u and test -g
#753	${var+foo} shouldn't cause error when 'set -o nounset'
#727	1 ? (a=42) : b shouldn't require parentheses
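Issue #754 concerns test -u and test -g, which check whether a file has its set-user-ID or set-group-ID permission bit. Here is a minimal C++ sketch of that check. It assumes the check is done with POSIX stat(), and the function names are hypothetical, not OSH's actual code:

```cpp
#include <sys/stat.h>

// Hypothetical helper: true if `path` exists and has the given mode bit set.
// It uses stat(), which follows symlinks, and treats a missing file as false
// rather than as an error, which is how `test` behaves.
static bool HasModeBit(const char* path, mode_t bit) {
  struct stat st;
  if (stat(path, &st) != 0) {
    return false;  // e.g. ENOENT: `test -u /nonexistent` is simply false
  }
  return (st.st_mode & bit) != 0;
}

bool TestSetuid(const char* path) { return HasModeBit(path, S_ISUID); }  // test -u
bool TestSetgid(const char* path) { return HasModeBit(path, S_ISGID); }  // test -g
```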

Semi-Automatic Translation to C++

Two Analogies: Go Compiler and TeX

What's all this about C++? Here are two analogies to help explain what's going on.

GopherCon 2014: Go from C to Go by Russ Cox (YouTube, 31 minutes). It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code. (c2go is the one-off tool that helped with translation, analogous to mycpp.)

The flavor of the work is similar to what I'm doing with Oil, but there's a key difference: Oil's source will remain in statically typed Python and DSLs like Zephyr ASDL for the foreseeable future. We won't be writing C++ by hand.

Static types play an important role in both translations.
How to compile the source code of TeX. Knuth wrote TeX in a dialect of Pascal, but it's not compiled with a Pascal compiler. Instead, it's translated to C and compiled with a C compiler.

The common thread is that we want to preserve the correctness of an existing codebase. Oil runs thousands of lines of existing bash scripts, including some of the biggest shell programs in the world.

Rewriting by hand would introduce a lot of bugs, so instead we write a custom translator and apply it to the codebase. In Oil's case, there are additional code generators that remove dynamic typing and reflection, as discussed below.

Recap

In addition to the new spec test metrics, these line counts give a feel for recent progress:

The 0.7.pre9 release in December.
- osh_parse.cc has 9,867 lines of code (raw data). I showed that the OSH parser can be gradually refactored and translated to C++. Notably, the result is as fast as hand-written C code.
The 0.8.pre2 release in March.
- osh_eval.cc has 16,491 lines of code. In addition to the parser, we translate the word and arithmetic evaluators.
This release, 0.8.pre5.
- osh_eval.cc has 20,875 lines of code. We translate the command evaluator, including assignments. So the resulting C++ interpreter can run code like readonly x=y; echo $x. Details below.

For comparison, the slow OSH interpreter consists of about 30K lines of Python code. This doesn't include the Oil language, which I haven't started translating.

The translation isn't going as quickly as I'd like it to, but it's working, and I'm solving interesting technical problems along the way.

As far as I can tell, this unusual process is the shortest path to a fast shell. (As mentioned in January, I encourage parallel efforts. Feel free to ask me about this.)

Details

I keep a log of the translation process on Zulip.

Static typing of flag parsing was a big deal (Zulip thread). A common theme of translation is turning Python reflection into textual code generation, and this was another instance of it.
- Assignment builtins like declare -g foo=bar now work, so we have a path to translate more shell builtins to C++.
Zephyr ASDL is turning into half of a programming language (Zulip thread). Specifically, it's a language for describing typed data, which Python is missing. It now supports dicts/maps with the syntax map[string, int].
The interpreter is still "pure", which is why only 420 tests pass. The nascent osh_eval.cc doesn't even run ls, because it's an external process! But it understands the hairy details of word evaluation ${}, arithmetic evaluation $(( )), brace expansion {a,b}, and more.
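The idea of replacing reflection with generated code can be sketched in C++. This is a hypothetical illustration, not the output of Oil's actual flag-parsing generator: the DeclareArgs struct and ParseDeclare() function are invented names, and only the -g, -r, and -x flags of declare are handled. Instead of looking up flags by name at runtime, the generator would emit a struct with one typed field per flag, plus a parser that sets those fields directly:

```cpp
#include <string>
#include <vector>

// Hypothetical generated struct for `declare -g -r -x NAME=VALUE ...`.
// Each flag is a statically typed field, so the translated interpreter
// reads `args.g` instead of doing a dynamic attribute lookup.
struct DeclareArgs {
  bool g = false;                 // -g: global scope
  bool r = false;                 // -r: readonly
  bool x = false;                 // -x: export
  std::vector<std::string> rest;  // NAME=VALUE operands
};

// Hypothetical generated parser. Returns false on an unknown flag.
bool ParseDeclare(const std::vector<std::string>& argv, DeclareArgs* out) {
  size_t i = 0;
  for (; i < argv.size(); ++i) {
    const std::string& a = argv[i];
    if (a.size() < 2 || a[0] != '-') break;  // first operand ends the flags
    if (a == "--") {                         // explicit end of flags
      ++i;
      break;
    }
    for (size_t j = 1; j < a.size(); ++j) {  // allow combined flags like -gr
      switch (a[j]) {
        case 'g': out->g = true; break;
        case 'r': out->r = true; break;
        case 'x': out->x = true; break;
        default: return false;
      }
    }
  }
  out->rest.assign(argv.begin() + i, argv.end());
  return true;
}
```

For example, parsing `-gr foo=bar` sets `g` and `r`, leaves `x` false, and puts `foo=bar` in `rest`.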

More background: the March recap had a similar section with Zulip threads: mycpp: The Good, the Bad, and the Ugly.

TODO on Translation

Even though about two-thirds of OSH translates to C++ and compiles, and much of it runs correctly, there's still a lot of work left.

Oil is simply a big project: recall that bash consists of over 140K lines of code. I estimate that OSH implements 80% of bash, with significant fixes. And Oil is a new language with many features on top.

DSLs and Code Generation

Oil's source code will remain in high-level languages for the foreseeable future, so we need to enhance the code generators to produce correct and fast C++.

mycpp
- The OSH interpreter uses Python's try / finally for scoped destruction, but C++ doesn't have finally. We should probably use Python's context managers, and have mycpp translate such blocks into constructors and destructors.
Zephyr ASDL
- The translation process deals with exceptions in a messy way, using something approximating #ifdef. Exceptions are more like structs than classes, so they could be naturally expressed in ASDL.
The pgen2 parser generator
- The syntax of the Oil language is expressed with pgen2, and we don't have a C++ code generator for it yet. After discussion with Jason Miller, I think we should borrow the original code generator and runtime from CPython rather than try to translate the slow Python implementation.
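The context manager idea for mycpp can be sketched as plain C++ RAII. This is a hypothetical illustration, not mycpp's actual output: the ctx_PushFrame class and the toy Frame stack are invented stand-ins for the interpreter's state. The setup that precedes a Python `try` goes in the constructor, and the body of `finally` goes in the destructor:

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy variable stack standing in for the interpreter's memory (hypothetical).
using Frame = std::map<std::string, std::string>;

// Python:  with ctx_PushFrame(stack): body()
// C++:     { ctx_PushFrame ctx(&stack); body(); }
// The destructor runs when the scope exits normally, and also during stack
// unwinding when an exception propagates to a handler further up.
class ctx_PushFrame {
 public:
  explicit ctx_PushFrame(std::vector<Frame>* stack) : stack_(stack) {
    stack_->push_back(Frame());
  }
  ~ctx_PushFrame() { stack_->pop_back(); }

  // Copying would pop the frame twice, so forbid it.
  ctx_PushFrame(const ctx_PushFrame&) = delete;
  ctx_PushFrame& operator=(const ctx_PushFrame&) = delete;

 private:
  std::vector<Frame>* stack_;
};

// Runs a "function body" in a new frame; the frame is popped even if it throws.
void CallFunction(std::vector<Frame>* stack, bool fail) {
  ctx_PushFrame ctx(stack);
  stack->back()["x"] = "local";
  if (fail) {
    throw std::runtime_error("error in function body");
  }
}
```

Because cleanup is tied to scope rather than to a `finally` clause, the translator only has to emit a local variable declaration for each `with` block.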

Wrapping Shell Dependencies

In the January blog roadmap, I mentioned that there are two technical problems with translation.

One of them was wrapping native C code, which I no longer see as a risk. It's just work. The shell has three main dependencies:

libc. I've wrapped pure functions like fnmatch() in C++, and this is straightforward.
The Unix kernel. Wrapping functions like execve() is similar to wrapping libc, but errno handling is an issue I want to revisit. (These Unix comics are relevant.)
GNU readline for interactive features. To be honest, I'd rather punt interactive features to Oil code, analogous to ble.sh. But Oil should have basic readline support.
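The difference between wrapping a pure libc function and wrapping a kernel call can be sketched in C++. This assumes plain POSIX fnmatch() and execve(); the Fnmatch(), Execve(), and OSError names are hypothetical, not Oil's actual code. The libc wrapper is just a type conversion, while the execve() wrapper has to capture errno right away and turn it into an exception:

```cpp
#include <fnmatch.h>
#include <unistd.h>

#include <cerrno>
#include <string>
#include <vector>

// Pure libc function: returns true if `s` matches the glob `pat`.
// No flags are passed; extended globs would need more work.
bool Fnmatch(const std::string& pat, const std::string& s) {
  return fnmatch(pat.c_str(), s.c_str(), 0) == 0;
}

// Hypothetical exception type carrying errno, analogous to Python's OSError.
struct OSError {
  int err_num;
};

// Kernel call: execve() returns only on failure, with the reason in errno.
// The wrapper reads errno immediately, before any other call can overwrite it.
[[noreturn]] void Execve(const std::string& path,
                         const std::vector<std::string>& argv) {
  std::vector<char*> c_argv;
  for (const std::string& a : argv) {
    c_argv.push_back(const_cast<char*>(a.c_str()));
  }
  c_argv.push_back(nullptr);
  char* envp[] = {nullptr};  // empty environment, for simplicity
  execve(path.c_str(), c_argv.data(), envp);
  throw OSError{errno};  // reached only if execve() failed
}
```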

Open Problems

The interpreter's memory management is probably the biggest open issue. I have ideas, but I haven't tested them with an implementation.
The autocompletion code makes good use of Python's yield, which I can't (or don't want to) use in C++. I might rewrite it with fork() and write() to a pipe.
- On a related note, Doug McIlroy, the co-inventor of pipes, thought of them as a mechanism for allowing coroutines (like Python's yield). A few weeks ago, I played with the shell and C code in his 2014 explanation of the coroutine prime number sieve (PDF).
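The fork()-and-pipe idea can be sketched in C++. This is a hypothetical illustration: the word list, ProduceCompletions(), and CompleteViaPipe() are invented, and OSH's real completion code is more involved. The child process plays the role of the Python generator, writing one candidate per line where it would have used `yield`, and the parent reads lines until EOF:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdio>
#include <string>
#include <vector>

// Hypothetical producer: each matching candidate becomes one line on `fd`.
void ProduceCompletions(const std::string& prefix, int fd) {
  const char* words[] = {"echo", "eval", "exec", "exit", "export"};
  for (const char* w : words) {
    std::string word(w);
    if (word.compare(0, prefix.size(), prefix) == 0) {
      std::string line = word + "\n";
      // Instead of `yield word`:
      if (write(fd, line.data(), line.size()) < 0) return;
    }
  }
}

// Runs the producer in a child process and reads its output through a pipe.
// The pipe buffer gives the coroutine-like interleaving: the child blocks
// when the pipe is full, and the parent blocks when it's empty.
std::vector<std::string> CompleteViaPipe(const std::string& prefix) {
  std::vector<std::string> results;
  int fds[2];
  if (pipe(fds) != 0) return results;

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return results;
  }
  if (pid == 0) {  // child: produce candidates, then exit
    close(fds[0]);
    ProduceCompletions(prefix, fds[1]);
    close(fds[1]);
    _exit(0);  // skip stdio flushing and atexit handlers inherited from the parent
  }

  close(fds[1]);  // parent: close the write end so the reader sees EOF
  FILE* in = fdopen(fds[0], "r");
  if (in == nullptr) {
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return results;
  }
  char buf[256];
  while (std::fgets(buf, sizeof buf, in) != nullptr) {
    std::string line(buf);
    if (!line.empty() && line.back() == '\n') line.pop_back();
    results.push_back(line);
  }
  std::fclose(in);  // also closes fds[0]
  waitpid(pid, nullptr, 0);
  return results;
}
```

With the prefix `ex`, the parent receives `exec`, `exit`, and `export`, in the order the child produced them.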

Plan for 2020

As mentioned in January, the bare minimum for "success" is when OSH can replace bash for my own use.

After reviewing all this work, I still feel like OSH can be "finished" in 2020. I won't be extremely surprised if it isn't, but it seems reasonable.

On the other hand, it seems clear that the Oil language will remain a prototype for the remainder of 2020. I haven't gotten much feedback on it, probably because there isn't much documentation.

This is disappointing, but I don't have a solution to this problem.

In short, the project's focus has necessarily narrowed. The only two goals on my radar are:

The OSH language should be translated to C++, tested, and optimized.
The Oil language should be divorced from the Python runtime and similarly translated. This will almost certainly bleed into 2021.

I should write a longer blog post about this, but almost everything else is cut. Oil will be more like a library than a shell. (As mentioned, I'll need basic GNU readline support for my own use.)

The docs are another sore point. I've mostly been writing them "on demand" (whenever anyone asks). It seems like that pattern will continue, given all the other work that needs to be done.

What's Next?

Continue translating Oil to C++, guided by metrics.
- Increase the number of spec tests passing from 430, shown in spec.wwz/cpp/osh-summary.html.
- Increase the number of lines of code translating and compiling from 20,875.
Fix bugs reported by users. Bug reports really help! Again, see How to Test OSH.
Improve the OSH interpreter, especially with regard to errexit (issue 709). I'd also like to resume work on Running ble.sh With Oil.

Feel free to ask questions in the comments or on Zulip!

Appendix: Selected Metrics

Let's compare this release with the previous one, version 0.8.pre4.

Native Code Metrics

We have nearly 70K lines of C++ code, including over 20K translated by mycpp.

oil-cpp for 0.8.pre4: 64,459 lines, 17,158 in osh_eval.cc
oil-cpp for 0.8.pre5: 69,840 lines, 20,875 in osh_eval.cc

The size of the osh_eval.opt.stripped executable differs between GCC and Clang, and I don't yet know why. In any case, the increase is consistent with translating and compiling more lines of code.

ovm-build for 0.8.pre4: 676,144 bytes under GCC, 803,064 bytes under Clang.
ovm-build for 0.8.pre5: 759,072 bytes under GCC, 894,128 bytes under Clang.

Test Results

OSH spec tests:

OSH spec tests for 0.8.pre4: 1719 tests, 1529 passing, 82 failing
OSH spec tests for 0.8.pre5: 1762 tests, 1560 passing, 84 failing

There was no work on the Oil language! I'm a bit concerned by that, which is one reason for the scope reduction mentioned above.

Oil spec tests for 0.8.pre4: 253 tests, 231 passing, 22 failing
Oil spec tests for 0.8.pre5: No change.

Line Counts

We have ~300 new significant lines of code in OSH:

cloc for 0.8.pre4: 16,281 lines of Python and C, 299 lines of ASDL
cloc for 0.8.pre5: 16,526 lines of Python and C, 312 lines of ASDL

And ~500 new physical lines of code:

src for 0.8.pre4: 30,193 lines of Python
src for 0.8.pre5: 30,701 lines of Python

Benchmarks

The parsing benchmark didn't change much.

Nor did the runtime benchmark.