Today's post is a status update on OVM, the virtual machine that OSH and Oil will run on.
Earlier this month, I listed six problems with using CPython to implement shell. The OSH interpreter is currently ~12K lines of Python.
In Cobbling Together a Python Interpreter, I described a potential solution to these problems. Inspired by tinypy, I would assemble a Python interpreter to run OSH and Oil. I would reuse 8K - 9K lines of Python for the front end, and write an estimated 3K - 5K lines of C++ from scratch for the runtime.
Yesterday, I described a successful integration of the front end components.
Writing the VM is achievable, but it's not clear how long it will take. The biggest risk is probably that I've never written a garbage collector. Garbage collectors are hard to debug, and I've observed that each one is a unique research project.
This post describes an alternative plan:
(Aside: A few people on Lobsters doubted the original OPy plan, and I concede that they had a point. But they didn't suggest a better solution. Neither rewriting all the code in C/C++ nor shipping OSH as a Python program is a good solution. Leave a comment if you're unclear about this.)
In oil/cpython-slice on GitHub, there are shell scripts that build a stripped-down CPython 2.7 VM. It's messy, but it proves the concept:
I removed the CPython tokenizer, parser, and bytecode compiler. This computation can be done offline with the bootstrapped OPy Front End.
The Lua VM can be shipped without its parser, and this is the same idea.
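To make the "compile offline, ship only bytecode" idea concrete, here is a minimal sketch. It uses the stock compile() builtin and the marshal module as stand-ins for the OPy front end, which is an assumption for illustration only; the names compile_offline and run_precompiled are hypothetical. Real .pyc files also put a small header in front of the marshalled code object, which this sketch omits.

```python
import marshal

SOURCE = "def greet(name):\n    return 'hello ' + name\nresult = greet('osh')\n"


def compile_offline(source, filename="<osh-module>"):
    """Build step: needs the tokenizer, parser, and bytecode compiler."""
    code = compile(source, filename, "exec")
    return marshal.dumps(code)  # serialized code object, as bytes


def run_precompiled(blob):
    """Run step: only unmarshals and executes bytecode; no source is parsed."""
    code = marshal.loads(blob)
    namespace = {}
    exec(code, namespace)
    return namespace


blob = compile_offline(SOURCE)
ns = run_precompiled(blob)
print(ns["result"])
```

Only run_precompiled would need to exist in the shipped binary; compile_offline can live in the build tooling.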
I statically linked the fcntl module, which we need to implement redirections, and removed all the dynamic modules from the build process. I disabled dynamic loading of .so files, because a shell doesn't require that.
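As a rough illustration of why a shell wants fcntl for redirections, the sketch below saves a descriptor with F_DUPFD, points it at a file, runs an action, and then restores it. This is not OSH's actual code; run_redirected is a hypothetical helper, and the choice of 10 as the lowest "saved" descriptor number is an assumption borrowed from common shell practice.

```python
import fcntl
import os


def run_redirected(fd, path, action):
    """Run action() with descriptor fd pointing at path, then restore fd."""
    # Duplicate fd onto the lowest free descriptor >= 10, so the saved copy
    # stays out of the range that scripts normally use (0-9).
    saved = fcntl.fcntl(fd, fcntl.F_DUPFD, 10)
    try:
        target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.dup2(target, fd)  # fd now refers to the file at path
        finally:
            os.close(target)
        action()
    finally:
        os.dup2(saved, fd)  # put the original descriptor back
        os.close(saved)
```

Calling run_redirected(1, "out.txt", some_function) behaves roughly like running a command with > out.txt and leaves stdout untouched afterwards.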
CPython's build system can be configured to statically or dynamically link each standard library module via the Modules/Setup file.
Initially, I was modifying Makefile.pre.in (the autotools input).
After I gained familiarity with the build system, I wrote a shell script (slice.sh) to compile and link CPython/OVM with a single cc invocation.
A debug build only takes 5 seconds.
I plan to further strip down CPython, guided by these analysis tools:
Here I address the six problems in order of most-solved to least-solved:
Python 3's handling of unicode is awkward for a shell.
For both OSH and Oil, I plan to use Go's UTF-8-centric strategy, implemented in Python 2.
That is, only a handful of operations like ${#s} and ${s:0:8} need to know about bytes vs. characters. Otherwise, the shell can pass byte strings from input to output unmolested. For example, a common input is the file system, and a common output is the argv array passed to exec().
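Here is a small sketch of what "only a handful of operations know about characters" could look like. Strings stay as bytes everywhere, and only the length and slice helpers decode UTF-8 structure. The helper names utf8_len and utf8_slice are hypothetical, and the code assumes the input is valid UTF-8; handling invalid bytes is left out.

```python
def _starts(data):
    """Byte offsets where a code point begins, plus the end offset."""
    # In UTF-8, continuation bytes look like 10xxxxxx; every other byte
    # begins a new code point.
    offsets = [i for i, byte in enumerate(data) if byte & 0xC0 != 0x80]
    offsets.append(len(data))
    return offsets


def utf8_len(b):
    """Number of code points in a UTF-8 byte string, like ${#s}."""
    return len(_starts(bytearray(b))) - 1


def utf8_slice(b, start, length):
    """Slice by code points but return bytes, like ${s:start:length}."""
    data = bytearray(b)
    offsets = _starts(data)
    last = len(offsets) - 1
    begin = offsets[min(start, last)]
    end = offsets[min(start + length, last)]
    return bytes(data[begin:end])


s = b"na\xc3\xafve caf\xc3\xa9"  # 12 bytes, 10 code points
print(len(s), utf8_len(s))
print(utf8_slice(s, 6, 4))
```

Everything else, such as reading a file name from the file system and passing it to exec(), can treat the value as an opaque byte string.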
If you have a use case where this doesn't work, or you require an encoding other than UTF-8, leave a comment.
The Python interpreter does things with signals that we don't want.
I'm taking control of the Python VM, so this can be fixed with different calls to the embedding API or by patching the source. I noticed the Py_InitializeEx() function and its initsigs parameter while doing this experiment.
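One way to see what the stock interpreter does with signals is to start a fresh Python process and ask it how SIGINT and SIGPIPE are configured before any user code touches them. This is only a probe of an unmodified CPython, not of OVM; startup_signal_state is a hypothetical helper, and the SIGINT result depends on what the child inherits from its parent.

```python
import json
import subprocess
import sys

# Code run in a fresh interpreter; it reports the handler state at startup.
CHILD = r"""
import json, signal

def describe(signum):
    handler = signal.getsignal(signum)
    if handler is None:
        return "unknown"
    if handler == signal.SIG_IGN:
        return "ignored"
    if handler == signal.SIG_DFL:
        return "default"
    return "python handler"

print(json.dumps({
    "SIGINT": describe(signal.SIGINT),
    "SIGPIPE": describe(signal.SIGPIPE),
}))
"""


def startup_signal_state():
    """Return the signal dispositions a new interpreter starts with."""
    out = subprocess.check_output([sys.executable, "-c", CHILD])
    return json.loads(out.decode("utf-8"))


print(startup_signal_state())
```

On a stock CPython, SIGPIPE is reported as ignored, which is exactly the kind of default a shell wants to decide for itself.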
Shells are required on machines where Python isn't, like Android phones and other embedded devices.
Bundling a stripped-down interpreter with OSH solves this problem. It's important to think of it as an implementation detail. From the build perspective, it should be indistinguishable from any other C program: download a tarball and run ./configure && make.
The Python interpreter starts slowly.
Python's baroque import mechanism is the main reason for this. Everything related to site.py can be stripped out of OVM, and sys.path will have a single entry. Also, we no longer have to look for .py or .so files. Everything will be .pyc files or statically linked.
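A stock interpreter can give a rough feel for how much work site.py adds at startup. The sketch below compares a normal start with one that uses CPython's -S flag, which skips importing site. It is only an approximation of what OVM removes, and the probe helper is hypothetical.

```python
import subprocess
import sys

# Report how many modules are loaded at startup and whether site is one.
PROBE = "import sys; print(len(sys.modules)); print('site' in sys.modules)"


def probe(extra_flags):
    """Start a fresh interpreter with extra_flags; return (count, has_site)."""
    cmd = [sys.executable] + list(extra_flags) + ["-c", PROBE]
    out = subprocess.check_output(cmd)
    count, has_site = out.decode("utf-8").split()
    return int(count), has_site == "True"


default_count, default_has_site = probe([])
stripped_count, stripped_has_site = probe(["-S"])
print("default:", default_count, "modules, site loaded:", default_has_site)
print("with -S:", stripped_count, "modules, site loaded:", stripped_has_site)
```

Timing the two invocations, for example with time in a shell, shows the corresponding difference in startup latency on a given machine.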
I'm really excited by this. Not just because Oil has no need for the import machinery, but because startup time has been bugging me for over a decade while working with Python!
A shell should be simpler and smaller than Python. OSH should also be smaller than bash, which is ~150K lines.
Right now my slice of CPython has ~135K lines of C. However, the ~12K lines of Python in OSH, combined with their Python standard library dependencies, unfortunately make it bigger than bash.
This is where the quick solution loses compared to writing my own VM. But a key point is that the size can be reduced after the OSH 0.1 release, and after writing Oil. There are still many opportunities to strip out code.
As long as the startup time is low, there shouldn't be any noticeable downside of a big binary.
The Oil interpreter should be a library.
I won't be exposing the Python-C API to users. Eventually I'll be able to design my own Lua-like API, but that remains in the distant future.
Rather than writing my own VM, I'm forking the CPython interpreter, which is the minimum amount of work to address the six problems.
Essentially, my strategy is to ship the prototype. I mentioned in the first post that I wrote ~3K lines of C++ to start OSH, but realized I would never finish at that rate. The Python code turned into the whole project.
I'm not sad about that, because I need leverage to complete not only OSH, but Oil with its Awk and Make functionality.
I believe there are no remaining obstacles to an OSH 0.1 release. There's just work! I don't know when it will be done, but I'll lay out the criteria for a release in a future post.
Potential future topics:
What to remove next. The code is still too big. The binary is ~1.3 MB on x86-64, built from ~135K lines of code. I want it to be smaller than bash on both counts.
We can remove Python's GIL because shell is single-threaded. It uses processes rather than threads for concurrency.
We can remove a lot of platform-specific code, like Windows support.
Simplifying the build system. Shells rely on the oldest and most compatible parts of Unix, so building them shouldn't require autotools.
Advantages of using CPython. For example, the dictionary object is highly optimized, and we can reuse it for Oil's dictionaries.
Having a Python interpreter also opens up the possibility of using some third-party libraries like python-prompt-toolkit for the interactive shell. I'll have to test these libraries for size, performance, and robustness.
Tools for working with native code. I was excited about writing my own VM, but I'm actually more excited to learn more about tools to optimize for size and space. For example: code coverage, speed profilers like perf, heap profilers, Bloaty McBloatface, OS tracing, -m32 builds, the Clang AST (or LST), and more.