Project Roadmap #4

2017-03-01

In the last post, I reviewed November's Roadmap #3. Even though I'm the only one working on Oil, I'm happy with the progress so far.

But I think the project has reached a point where contributions will accelerate it. To orient contributors, this post describes what I hope will happen in the next month or two.

1. Testing Enhancements

Software projects move faster when they have automated tests. Tests should help contributors write code for Oil, and they'll help me review code.

As mentioned on the GitHub page, there are three kinds of tests:

Unit tests written in Python.
Spec tests which run against some subset of OSH, bash, dash, mksh, and zsh. The idea is to figure out the specification for OSH by testing what happens in practice. (In practice all shells are highly POSIX-compliant, but we need more detail than that.)
"Wild" tests which test the parser against source code found in the wild. These tests are more of a guideline, because we only test that there are no parse errors, as opposed to making assertions on the LST.
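To make the spec test idea concrete, here is a minimal Python sketch of running one snippet under several shells and comparing what they do. It is not the project's actual test harness: the helper names `run_case` and `agree` are hypothetical, and it only compares stdout and exit status.

```python
import shutil
import subprocess

# Hypothetical helper (not part of Oil): run one snippet under each shell
# that is installed and collect (stdout, exit status) for comparison.
def run_case(code, shells=("bash", "dash", "mksh", "zsh")):
    results = {}
    for name in shells:
        path = shutil.which(name)
        if path is None:
            continue  # skip shells that aren't installed on this machine
        proc = subprocess.run(
            [path, "-c", code],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
            universal_newlines=True, timeout=10)
        results[name] = (proc.stdout, proc.returncode)
    return results

def agree(results):
    """True if every shell that ran produced the same stdout and status."""
    return len(set(results.values())) <= 1
```

For example, `agree(run_case("echo hi"))` reports whether the installed shells behave identically on that snippet. A disagreement is exactly the kind of detail that POSIX compliance alone doesn't settle.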

I've published the spec test results as HTML. You can see the results of running the *.test.sh files, details about failures, and annotated source code.

The next task will be to publish the results of the unit tests and wild tests in a similar way.

The test coverage is fairly high, and I expect to keep it that way. We want the ability to make aggressive code changes and rely on tests to catch breakages.

2. Vertical Slice of the Shell Runtime in C++

It's conceivable that the current shell runtime in Python will work for many applications. The speed of most shell scripts should depend on the speed of the external programs they use, not the speed of the shell itself.

But I want to write the runtime in C++ for a few reasons:

Modern shells are weaker than Python at string manipulation, frequently requiring external programs like sed and tr. I want Oil to be at least as good as Python in this respect, and that will require a fast interpreter.
I want Oil to subsume the functionality of Awk. Awk has floating point numbers and hash tables, which are best implemented in a native language.
A shell shouldn't depend on Python or any other interpreter. For example, Python isn't installed on Android, but Android does have a shell (mksh).

So I'll be working on a vertical slice of the C++ software architecture. The first step is to run something like this:

$ test -d / && echo 'hello world'
hello world

This will require the following:

Write a slice of the OSH-to-OVM compiler, probably in Python. OVM is a shared runtime for OSH and Oil: both osh.asdl and oil.asdl will compile to a simpler language specified in ovm.asdl.
Serialize OVM trees in OHeap format. This code is already written as part of ASDL.
Generate C++ classes for the OVM AST nodes. They will use enum values in core/id_kind.py, mentioned in The Backbone of the Interpreter. Both of these code generators are already written.
Write a tree-walking interpreter. I'll adapt the C++ code mentioned in the very first blog post. I had the skeleton of a shell, but abandoned it because iterating in C++ is onerous.

So three pieces of relevant C++ code already exist, and I hope I can glue them together quickly. The main design issue will be the OVM code representation.
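To illustrate what a tree-walking interpreter has to do for the `test -d / && echo 'hello world'` example, here is a small sketch. It is written in Python for brevity, although the plan above is C++. The node types `SimpleCommand` and `AndOr` are hypothetical stand-ins rather than the real ovm.asdl schema, and `test` and `echo` are implemented as tiny in-process builtins, which is an assumption made only to keep the example self-contained.

```python
import os
from collections import namedtuple

# Hypothetical node types; the real schema would be defined in ovm.asdl.
SimpleCommand = namedtuple("SimpleCommand", "argv")
AndOr = namedtuple("AndOr", "op left right")  # op is "&&" or "||"

def builtin_test(argv, out):
    # Only 'test -d PATH' is handled in this sketch.
    if len(argv) == 3 and argv[1] == "-d":
        return 0 if os.path.isdir(argv[2]) else 1
    return 2  # usage error

def builtin_echo(argv, out):
    out.append(" ".join(argv[1:]) + "\n")
    return 0

BUILTINS = {"test": builtin_test, "echo": builtin_echo}

def execute(node, out):
    """Walk the tree and return an exit status, as a shell would."""
    if isinstance(node, SimpleCommand):
        builtin = BUILTINS.get(node.argv[0])
        if builtin is None:
            return 127  # command not found
        return builtin(node.argv, out)
    if isinstance(node, AndOr):
        status = execute(node.left, out)
        # Short-circuit: && runs the right side only on success,
        # || runs it only on failure.
        if (node.op == "&&") == (status == 0):
            return execute(node.right, out)
        return status
    raise TypeError("unknown node: %r" % (node,))

tree = AndOr("&&",
             SimpleCommand(["test", "-d", "/"]),
             SimpleCommand(["echo", "hello world"]))
out = []
status = execute(tree, out)
print("".join(out), end="")
```

On a Unix system, where `/` is a directory, this prints `hello world` and leaves `status` at 0. The design question raised above, how OVM code is represented, corresponds here to the choice of node types that `execute` dispatches on.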

3. Further Work

Oil Parser. I wrote a parser for OSH, and began a translator from OSH to Oil. But we still don't have a parser for Oil! It will likely follow the structure of the OSH parser, but with fewer lexer modes (formerly called lexical states).
Bootstrapping. After writing the runtime in C++, we'll still have an OSH parser written in Python. One way to break the dependency is to compile Python to OVM.

I'll save the details of these tasks for a future post. I'm not sure in what order I'll tackle them, and there is more than one possible approach for each one.

There's also the ongoing work and other unfinished tasks mentioned in the last post. The OSH-to-Oil translator also isn't done, although the approach has been proven.

There's a lot of work to do, and I think some of it can be parallelized. I hope that the automated test enhancements I've described here make it pleasant to work on the project. Please leave a comment if you want to work on it.

Although I think the HTML page I linked above is fairly easy to understand, I'll explain it in detail in an upcoming post. I also plan to add docs on contributing and testing in the Git repository.