Why Sponsor Oils? | blog | oilshell.org
Three months ago, in Roadmap #5, I wrote that OSH will be a better shell for building Linux distributions. It will run existing code, including bash scripts, but it's stricter and easier to debug.
In the last month, I've made significant progress toward this goal. I fixed dozens of bugs, implemented new features, and simplified the codebase.
OSH can now run thousands of lines of shell scripts that build three distros: Aboriginal Linux, Alpine Linux, and Debian. This post describes what I did, and the technical work that was involved.
I haven't written about Linux distributions in a while. What happened?
OSH was able to run abuild -h back in October, but its parsing speed made debugging sessions unpleasant. On a fast machine, it took more than 1600 milliseconds to parse abuild!
So I pushed two tasks onto the stack, for a total of three:
The two releases since October popped #3 and #2 off the stack:
Now OSH can parse abuild in about 250 milliseconds. That's still too slow, but it's not blocking progress.
I plan to release OSH 0.4 at the end of this month. It will be able to run not just abuild, but also shell scripts from Aboriginal Linux and Debian.
After that, the stack will be empty again. I had to shave some yaks, but I didn't lose sight of the goal!
I didn't understand how Linux distros worked until pretty recently. It's useful to think of them as having (at least) these four components:
apt; for Red Hat-derived systems (CentOS, Fedora, etc.), it's yum.
I'm pleased by the diversity of the three distros I worked with because it gives me confidence that OSH is working:
So not only am I testing shell scripts by different authors, I'm also testing OSH for compatibility with scripts written for different shell dialects.
Here is some more background on these projects and detail on what I did:
debootstrap assembles the Debian root file system from .deb packages. Roughly speaking, .debs are tarballs of binaries, scripts, and metadata. I parsed debootstrap with OSH back in October 2016.
It's ~2600 lines of shell (excerpt). I worked with this script a few years ago, and I remember it looking scary. There were weird incantations that I didn't understand. Now it's easy to read, which I think means I've spent too much time with shell :-)
What Now Works: I used OSH to build an Ubuntu Xenial image, chroot into it, and run commands. The sections below describe the fixes required to make this work.
Alpine Linux started out as a distro for embedded systems like routers, but it's also now used for containers in the cloud. Docker, Inc. sponsors it, and postmarketOS is based on it.
.apk packages and metadata.
(excerpt).
What Now Works:
.apk packages with abuild running under OSH-musl.
abuild verify to check that the packages looked reasonable.
Aboriginal Linux isn't a distro, per se. It's an educational project that looks like a distro. It answers the question: What is the smallest number of packages that will create a Linux system that can rebuild itself?
The project is now defunct. But the code still works, and I still find it interesting, e.g. from a security point-of-view.
It's ~3700 lines of bash (excerpt). It was the first project I parsed with OSH.
What Now Works:
i686 target using OSH. This builds a complete system image from source code. In contrast, debootstrap assembles an image from binary packages.
In summary, I tested OSH on a diverse set of shell scripts found in the wild, and fixed what was necessary to make them run.
I started this process after the last release, and I honestly didn't know how long it would take. There were more problems than I expected, but I was also able to fix them more quickly than expected.
What features were missing?
Some errors I ran into had obvious causes. For example, OSH would throw NotImplementedError when a program used ${s:1:2} (string slicing). Getting past this error by implementing slicing was simple.
Other errors required debugging thousands of lines of other people's shell scripts. So I needed to learn more about bash and debugging.
This tip on making xtrace useful helped me. In bash, you can set the $PS4 variable so that traces include the filename and line number.
So I mimicked these debugging features in OSH:
set -x / xtrace, with $PS4 support.
$SHELLOPTS, so you can inherit xtrace. Shell scripts often invoke other shell scripts, and this is bash's way to preserve -x across invocations.
PS4 string: $LINENO, and my own $SOURCE_NAME.
Note that bash actually has a debugger called bashdb! Describing the way it works would be another post. In short, it uses hooks specified with the trap builtin, as well as several $BASH_* variables.
A recurring theme was relaxing OSH's strict behavior in order to accommodate common shell usage. However, I added the ability to opt in to the strict behavior, with set -o strict-control-flow, strict-array, and strict-errexit.
I'll address this topic in another blog post, but feel free to leave comments if you're curious.
POSIX has quirky rules for the $IFS variable, which determines:
read builtin splits fields.
I rewrote the buggy regex-based IFS-splitting with an explicit state machine. This is an interesting piece of code which I may explain in another blog post. It's in core/legacy.py. It turned a lot of red tests green.
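To illustrate what splitting with an explicit state machine can look like, here is a minimal Python sketch. It is not the code in core/legacy.py: the function name ifs_split and the four state names are assumptions made for this example, and it ignores backslash escapes and quoting. It models only two rules: runs of IFS whitespace collapse, while each non-whitespace IFS character delimits exactly one field.

```python
def ifs_split(s, ifs=" \t\n"):
    """Split s into fields, roughly following POSIX IFS rules (simplified sketch)."""
    ws = {c for c in ifs if c in " \t\n"}   # IFS whitespace characters
    other = set(ifs) - ws                   # non-whitespace IFS characters
    fields = []
    cur = []
    # Hypothetical state names:
    #   start - nothing but leading IFS whitespace seen so far
    #   field - inside a field
    #   ws    - a field just ended with IFS whitespace
    #   delim - just saw a non-whitespace delimiter
    state = "start"
    for ch in s:
        if ch in ws:
            if state == "field":
                fields.append("".join(cur))
                cur = []
                state = "ws"
            # In start/ws/delim, extra whitespace is simply skipped.
        elif ch in other:
            if state == "field":
                fields.append("".join(cur))
                cur = []
            elif state in ("start", "delim"):
                # Two delimiters in a row, or a leading one: empty field.
                fields.append("")
            # In state ws, the delimiter merges with the preceding whitespace.
            state = "delim"
        else:
            cur.append(ch)
            state = "field"
    if state == "field":
        fields.append("".join(cur))
    return fields
```

With IFS set to space and colon, ifs_split("a : b", " :") yields two fields, while ifs_split("a::b", ":") yields three, the middle one empty. That asymmetry between whitespace and non-whitespace delimiters is the kind of quirk that is awkward to capture in a single regex.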
echo -e 'foo\n' and $'foo\n' are both ways to write C-escaped strings. Their relationship is the same as the relationship between [ and [[: the former is dynamically parsed, and the latter is statically parsed. (For example, dynamic parsing allows this: char=n; echo -e "1\\${char}2", but static parsing doesn't.)
I implemented these with similar, but not identical, lexers, using the style described in my posts on lexing. I again found that metaprogramming is useful for avoiding code duplication.
This is another feature that touches some computer science. I discovered that semantics that originate with ksh can't be efficiently expressed with POSIX APIs:
fnmatch() does glob-style string matching, but it doesn't return the position of the match.
regexec() does return match positions, but it doesn't support non-greedy matching like Python's regex API does.
In theory, Python's API should be able to efficiently express the semantics of ${s%suffix} vs. ${s%%suffix}, so OSH used the strategy of translating globs to Python regexes. For example, the expression ${s%%*suffix} could be implemented with the regex .*?(suffix).
However, abuild uses character classes in globs, e.g. ${i%%[<>=]*}, which aren't straightforward to translate.
So I reimplemented these operators using the conventional, inefficient algorithm: a linear number of calls to fnmatch(), one for each position in the string! (in the worst case)
This makes the overall algorithm quadratic. If fnmatch() isn't linear, which it often isn't, then stripping glob prefixes and suffixes will be even slower than quadratic.
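The position-by-position approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OSH's actual code: strip_suffix is a hypothetical name, and Python's fnmatch.fnmatchcase() stands in for libc's fnmatch().

```python
import fnmatch


def strip_suffix(s, pat, longest=False):
    """Emulate ${s%pat} (shortest match) or ${s%%pat} (longest match).

    Tries one candidate suffix per position in s, so it makes up to
    len(s) + 1 calls to the matcher.
    """
    if longest:
        # Longest suffix first: candidate suffixes start at 0, 1, 2, ...
        starts = range(len(s) + 1)
    else:
        # Shortest suffix first: candidate suffixes start at len(s), len(s) - 1, ...
        starts = range(len(s), -1, -1)
    for i in starts:
        if fnmatch.fnmatchcase(s[i:], pat):
            return s[:i]
    return s  # no suffix matched: the value is unchanged
```

For the abuild-style pattern, strip_suffix("foo>=1.2", "[<>=]*", longest=True) returns "foo", while the shortest-match variant returns "foo>". Each loop iteration rescans a suffix of the string, which is where the quadratic behavior comes from.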
However, this issue doesn't appear to arise in practice, as all shells use the slow algorithm. Of course, Oil will provide string manipulation functions that aren't slow in theory. I want the language to be safe to use in adversarial contexts.
Running the distro scripts required several other shell features. In most cases, I had already done the hard part: representing code with the lossless syntax tree. The implementation often "falls out" after choosing a good representation.
${s:1:2} and ${a[@]:1:2}.
diff <(sort left.txt) <(sort right.txt). This feature is inherently flaky because it doesn't wait() on the forked process, and it didn't set $! until bash 4.4.
type builtin without -t. abuild unfortunately matches the output of type with a regex.
test builtin:
-L and -h are aliases to check if a path is a symlink.
[ -t 1 ] to check if stdout is a TTY. There is no color in abuild without this!
-nt and -ot to compare timestamps on files.
Reimplementing these shell quirks was both fun and depressing. As penance, I've been maintaining a wiki page of Shell WTFs (which is not well-organized).
I could blog every day about one of these and not be done for months. But I remind myself that my goal is to improve shell with the Oil language, not dwell on the past. Legacy behavior is only useful as far as it gives users an upgrade path to Oil.
In addition to implementing features, I also found and fixed bugs in OSH.
As far as I know, a shell must handle file descriptors differently than any other Unix program. It can't open any files in the descriptor range 3-9, because shell scripts may use them directly.
source'd scripts are now moved out of the way immediately after open(), with dup2().
echo hi 6>&1, which debootstrap uses.
To debug these issues, I used the /proc/$$/fd/ mechanism mentioned in OSH Runs Real Shell Programs. It's a nice way of showing the file descriptor state of a process.
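As a rough illustration of moving a freshly opened descriptor out of the range that scripts may use, here is a small Python sketch. The post says OSH does this with dup2(); the sketch swaps in fcntl() with F_DUPFD instead, because that call lets the kernel pick the first free descriptor at or above a floor. The helper name move_fd_high and the floor of 10 are assumptions for this example.

```python
import fcntl
import os


def move_fd_high(fd, min_fd=10):
    """Duplicate fd onto the lowest free descriptor >= min_fd, then close fd.

    Afterwards the low descriptor number is free again for the script's
    own redirects, and the shell keeps using the returned descriptor.
    """
    new_fd = fcntl.fcntl(fd, fcntl.F_DUPFD, min_fd)
    os.close(fd)
    return new_fd
```

A shell-like program could call this right after opening a script file, e.g. fd = move_fd_high(os.open(path, os.O_RDONLY)), and then read the script from the returned descriptor.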
In The Riskiest Part of the Project, I mentioned several difficulties with using CPython to write a Unix shell.
I encountered another problem: Python does its own buffering of file I/O. I believe this is on top of libc's buffering, although I haven't looked into it deeply.
sys.stdout.flush() is required after type prints its output; otherwise $() may be incorrectly evaluated. Hat tip to timetoplatypus for mentioning this with respect to the dirs builtin.
read builtin can't use Python's f.readline(). The descriptor that underlies the sys.stdin file object changes when you redirect, which interacts badly with buffering.
Instead, I have to read a byte at a time from file descriptor 0. This seems inefficient, but I noticed that dash, mksh, and zsh all do the same thing (in C). For example, try:
$ strace zsh -c 'read x <<< "hello world"'
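The byte-at-a-time approach can be sketched in Python with os.read(), which bypasses Python's file-object buffering. This is a simplified illustration rather than OSH's implementation; read_line_unbuffered is a hypothetical name, and the sketch ignores backslash handling and IFS splitting.

```python
import os


def read_line_unbuffered(fd=0):
    """Read bytes from fd up to (not including) a newline or EOF.

    Reading one byte per call means nothing past the newline is consumed,
    so whatever reads from the descriptor next sees the rest of the input.
    """
    chunks = []
    while True:
        b = os.read(fd, 1)
        if not b or b == b"\n":  # EOF or end of line
            break
        chunks.append(b)
    return b"".join(chunks)
```

The point of the design is what is left behind: after one call, the remaining input is still sitting in the descriptor, whereas a buffered readline() may have already pulled it into the process's memory.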
&& and ||. Confusingly, they have equal precedence in the command language, but the normal unequal precedence in the [[ expression language.
FOO=bar myfunc. Shells differ in behavior here!
${x/pat/replace} when x is undefined. (This case revealed a bug in mksh.)
cd-ing away from a directory that's been removed.
readonly R; unset R should return 1 and respect errexit, not unconditionally fail. Although I think programming errors are different than runtime errors, even in dynamically-typed languages, errexit will be on by default in Oil. (It would also be nice to make this a statically-detected error.)
I punted on a few things that weren't strictly necessary to build the distros, or which had easy workarounds:
trap builtin is unimplemented; warnings are printed on stderr.
alias is also unimplemented. I changed a couple aliases in alpine-chroot-install to functions. Trivia: bash is the only shell that doesn't expand aliases by default; it requires shopt -s expand_aliases.
set -h / hashall is a stub that does nothing. This option is used by Aboriginal and affects bash's $PATH cache, which I don't yet understand.
Also note that these OSH builds are in a sense "shallow". I changed the shebang lines of the top-level scripts, which are thousands of lines long, but they often invoke more shell scripts with a #!/bin/bash or #!/bin/sh shebang line.
For example, building any Linux distro will require running dozens of configure scripts. Fortunately, OSH can already run those.
As mentioned, the upcoming OSH 0.4 release will include all this work.
After concentrating so much on the code, I now have several writing tasks backed up:
nickpsecurity brought an interesting paper to my attention, and I followed the citations and read two more papers. I responded in comments on lobste.rs and reddit. There is more to say about them!
It would also be nice to get oil-dev@ going again. If you're interested in contributing, e-mail me or leave a comment.