Oil 0.14.2 - Interactive Shell, and Conceding to autoconf

2023-03-16

This is the latest version of Oil, a Unix shell that's our upgrade path from bash:

Oil version 0.14.2 - Source tarballs and documentation.

We're moving toward the fast C++ shell (formerly oil-native), so there are two tarballs:

The C++ version doesn't exactly match Python, but it's getting close. We're also starting to use the "Oils for Unix" name, which I'll explain.

The wiki has tips on How To Test OSH. If you're new to the project, see Why Create a New Shell? and posts tagged #FAQ.

Table of Contents
Messages to Take Away
Review
7 Releases Since October
7 Parts of the Project
Interactive Shell in C++
Contributor Credits
Shell Arithmetic: Conceding to Reality
Static vs. Dynamic Parsing
A Hidden eval / Arbitrary Shell Execution
Code, Data, and Security
Can OSH Be Done in 2023?
Is It Hard to Contribute?
Open Questions / Risks
Summary
Please Donate
Appendix: Metrics for the 0.14.2 Release
Spec Tests
Benchmarks
Code Size

Messages to Take Away

Readers have been asking about Oil, so let's start with the important info.

If you appreciate this work, please sponsor us:

We're using the donations to "on board" new contributors, before they're added to our NLnet grant.

Review

This release has two highlights: the interactive shell in C++, and OSH changes for autoconf.

But let's review the project first, since I've only written 2 posts in the last 6 months. This is mainly because I've been working with contributors under the grant. I'm talking to them, rather than "talking" on the blog!

7 Releases Since October

So despite few release announcements, there have been steady releases this whole time, with hundreds of changes.

It's hard to remember everything that happened. The short story is that we've been working to fulfill the promise of the OSH part of the project, described in 2020's Four Features That Justify a New Unix Shell. To recap, those are:

  1. Reliable error handling
  2. Safe processing of user-supplied data, like filenames
  3. Eliminate "quoting hell"
  4. Static Parsing for better error messages and tools

7 Parts of the Project

In 2021, I explained in several posts how the scope has always been a problem, and it's been changing. There are 7 parts to the project, each large:

  1. The compatible OSH language
  2. The Oil language, with Python- and JavaScript-like data structures, and Ruby-like blocks.
  3. The interactive shell, which I cut from the project in 2021, for lack of time.
  4. Translation of the "executable spec" from Python to C++.
  5. Documentation has fallen behind.
  6. I've also let this blog fall behind.
  7. Our own dev tools "lifted" into applications.

Interactive Shell in C++

Let's move on to release highlights. The thing that most users will care about is that the interactive shell is working in C++! I'm using it on my machine now, running:

This is due almost entirely to Melvin, which is good news for people who have been wondering about Oil!

In addition to crediting his great work in that reply, I clear up a couple of misconceptions. One is that OSH is in fact a POSIX- and bash-compatible shell. The commenter was confused about OSH vs. Oil, which isn't uncommon.

So I plan to slightly rename the "Oil shell" project to "Oils for Unix", and the Oil language to YSH. OSH remains the same. I'll officially announce this in the next post, and elaborate on the motivation.


For more background on the interactive shell, see the FAQ, in particular:

It would have been a shame to drop this part of the project, so I'm very glad that Melvin revived it. A great thing about shell is that the user interface and the language are intertwined, and support each other!

(Related: Unix Shell: Philosophy, Design, and FAQs).

To make this more concrete, see the informative README in the rtx project:

In particular, it links to a good article on ASDF performance.

What I take away is that shells are powerful and universally-used interfaces for managing project dependencies, and the shell language itself should support this. Right now, these tools are slow, and have composition problems due to ordering, and can step on each other. They rely on bash hacks like mutating $PROMPT_COMMAND and messing with your startup files.
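To illustrate the ordering problem, here is a minimal sketch of two tools hooking the prompt. The hook names `_tool_a_hook` and `_tool_b_hook` are hypothetical and not taken from asdf or rtx; the point is only that each tool edits the same global string.

```shell
# Start from an empty value so the demo is deterministic.
PROMPT_COMMAND=''

# Hypothetical hooks, standing in for what two version managers might install.
_tool_a_hook() { export TOOL_A_SEEN=1; }
_tool_b_hook() { export TOOL_B_SEEN=1; }

# Each tool's init snippet prepends itself to whatever is already there.
PROMPT_COMMAND="_tool_a_hook${PROMPT_COMMAND:+;$PROMPT_COMMAND}"
PROMPT_COMMAND="_tool_b_hook${PROMPT_COMMAND:+;$PROMPT_COMMAND}"

# The final order depends only on the order of lines in the startup file.
echo "$PROMPT_COMMAND"
```

Swapping the two init lines swaps the order in which the hooks run before every prompt, and neither tool can see or control that.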

Just like Nix, asdf and rtx are pushing the boundaries of what our current shells are capable of.

If you have any concrete suggestions for OSH — or, even better, want to work on them — please get in touch.

Contributor Credits

The next release highlight is hard to explain, so let's take a break and credit more contributors. There have been hundreds of changes in the last few months, and it's easier for me to remember specific people than all the changes.

More people who tried Oil and reported bugs:

The $_ variable contains the last word of the last command. I had never used it before working on Oil, but it's very handy with Ninja:

$ ninja _bin/cxx-dbg/osh && $_ -c 'echo hi'
ninja: no work to do.
hi

Also:

Some notes on performance: We're still allocating too much, which is a well-known peril of writing software like mathematics! I've fixed some low-hanging fruit, and my experience confirms that the two container optimizations will be important.

I also spent a lot of time measuring the parser and interpreter with uftrace. Surprisingly, lists/vectors are more common than strings.


The shell arithmetic issue below also reminded me that Koichi Murase, author of ble.sh, originally implemented much of shopt --set eval_unsafe_arith! We're still using that code, but we've relaxed it slightly. Thank you!

I probably omitted some contributions, so please feel free to ping me with yours, and I'll update this section. And let me know if you'd like to be credited in a different way.

Shell Arithmetic: Conceding to Reality

The other highlight in this release is that shell arithmetic is more compatible with POSIX, due to autoconf's usage.

Thanks to Zack Weinberg for testing autoconf with OSH. Also see his great article:

This arithmetic issue goes back to 2019, and is hard to explain. Bear with me, or feel free to skip to the next section.

Static vs. Dynamic Parsing

Long-time readers may recall that I wanted OSH to be "statically parsed" like Python or JavaScript, for usability and speed.

But, as of this release, we allow dynamic parsing in arithmetic. For example:

$ x='1 + 2'      # var that looks like math

$ echo $(( x ))  # shells parse and evaluate strings as code
3                # there's no explicit 'eval'!

POSIX requires this in theory, and autoconf requires it in practice.
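To see that the value is parsed as an expression rather than pasted in as text, compare a bare variable name with `$x` inside `$(( ))`. This is a small sketch that assumes bash; the names `dynamic` and `textual` are only for illustration, and other shells may behave differently.

```shell
x='1 + 2'

# Bare name: bash parses the *value* of x as an expression (3), then multiplies.
dynamic=$(( x * 2 ))

# With $x: the text "1 + 2" is substituted first, so the shell sees "1 + 2 * 2".
textual=$(( $x * 2 ))

echo "dynamic=$dynamic textual=$textual"
```

The two results differ (6 versus 5 in bash), which shows that the bare-name form really does run a second parse on the data.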

I resisted this type of behavior for a long time — not just for usability, but also because OSH ended up being more secure than other shells due to its parsing philosophy.

A Hidden eval / Arbitrary Shell Execution

In particular, in 2019, I rediscovered a vulnerability in shells that have arrays. To be concrete, bash and zsh have arrays, but dash doesn't.

Even dash will evaluate your data as code, as in the example above. However, as long as it's confined to arithmetic, this is merely confusing, not dangerous. (Imagine if print('1 + 2') in Python showed 3, rather than the string 1 + 2.)

In contrast, if you use, say, bash, an attacker who controls x can execute arbitrary shell commands on your machine:

$ a=(1 2 3)  # shell array

$ x='a[$(echo 42 | tee PWNED)]=5'  # variable with code in it
                                   # looks like an array index
                                   # with a command sub

$ echo $(( x ))  # arbitrary shell execution in bash, zsh, mksh!
                 # not dash

$ cat PWNED  # 'echo 42' can also be 'rm -rf /' !
42

Details at https://github.com/oilshell/blog-code/tree/master/crazy-old-bug. Stephane Chazelas, who discovered ShellShock, and the Fedora security team also warn about this issue.

So OSH disallowed all dynamic parsing unless shopt --set eval_unsafe_arith. But that caused problems for autoconf. I believe ./configure scripts would fall back to the external expr command with "stock" OSH.

We've now relaxed that option so autoconf can run. But it still disallows arbitrary code execution:

osh$ echo $(( x ))
  a[$(echo 42 | tee PWNED)]=5
    ^~
[ var ? at line 7 of [ interactive ] ]:1: fatal: Command subs not allowed here because eval_unsafe_arith is off

Does that mean we're compromising on the design of the Oil language? No, I also added shopt --unset parse_sh_arith, which disallows shell arithmetic and thus dynamic parsing in Oil. So OSH now has dynamic parsing, but Oil still does not.

Instead of shell arithmetic, you can use Oil's expressions over typed data, which includes integers.

$ x=$(( 1 + 2 ))  # shell style, invalid in Oil

$ var x = 1 + 2   # Oil style

Code, Data, and Security

You might ask why I'm blogging about this hidden eval, rather than reporting it. Well, I reported it years ago to bash, OpenBSD ksh, and other shells. (OpenBSD was the only one that fixed it at the time. Others may have fixed it since then.)

Some people already knew about it, and some people had a hard time understanding the report. A common response was:

Well that's how shell is. It allows you to execute shell commands.

— not an exact quote :)

In response, I say that POSIX shell is not like that. Shells like dash don't have the bug, because they don't have arrays. Try it.


There's a huge difference between code and data, both in computer science and in practical network security. A good shell should respect this difference. Again, this is one of Four Features that Justify a New Unix Shell.

When there were 10 Unix machines in the world, it was OK to be loose about code versus data. Even in the 1980's, every file on a Unix machine may have been provided by the manufacturer, or created by your coworkers. You could reasonably treat filenames as trusted data.

But today, you may download hundreds of megabytes of git repos and package manager dependencies, written by thousands of people. So a shell should treat filenames and other external data as untrusted.
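For scripts that still have to run under shells with the hidden eval, one defensive pattern is to validate external data before it reaches an arithmetic context. The sketch below is only an illustration, not something OSH requires; `is_int` and `add_one` are hypothetical helper names.

```shell
# Hypothetical helper: succeed only if $1 is an optional '-' followed by digits.
is_int() {
  case ${1#-} in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    *) return 0 ;;
  esac
}

# Hypothetical helper: do arithmetic on untrusted input only after validation.
add_one() {
  if ! is_int "$1"; then
    echo "add_one: not an integer: $1" >&2
    return 1
  fi
  echo $(( $1 + 1 ))
}

add_one 41
add_one 'a[$(echo 42 | tee PWNED)]=5' || echo "rejected"
```

The second call is refused before any arithmetic happens, so the command substitution never runs. One caveat: a value with leading zeros, like 08, passes the check but may still be rejected by the shell as invalid octal.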

Can OSH Be Done in 2023?

I'm now itching to work on the Oil language, but I also want the compatible OSH to be polished and "done".

So here's the call to action: please test Oil 0.14.2, and report bugs. Both the Python and C++ versions are ready to test.

Generally speaking, "batch" shell scripts should run under OSH, but interactive plugins may be more difficult. They are more tightly coupled to a specific shell.


The C++ version still fails 16 spec tests that the Python version passes (out of ~1800), but otherwise it's in pretty good shape.

Now that we have a pure C++ tarball, it would be great for someone to revive the work on running Nix shell scripts.

I expect more "conceding to reality", as with the shell arithmetic issue. But not too much, because we've fixed bugs like this for years. The latest bug reports have been great, and I'd like to see more testing, and get more help.

Is It Hard to Contribute?

I've gotten feedback that it's hard to get started on the code. (Our Contributing wiki page describes how.)

Part of the problem is inherent in our metaprogramming approach. Again, Oil Is Being Implemented "Middle Out".

Another problem is that the codebase was something of an experiment for many years. In particular, the garbage collector was an "unknown unknown". (I didn't know what I didn't know about GC.)

But now that the shell works, the project feels "opened up" again. We are stabilizing and improving the tools. It didn't seem worth it to polish tools that didn't yet produce a working shell.

In particular, mycpp, ASDL, the build system, the test harnesses, and the CI are rapidly improving. I've collected Zulip threads that support this, like:

This long-running thread keeps track of problems:

I may elaborate later, but in the meantime, try building Oil, and ask me questions about the dev process!


I'll also repeat that recent contributions give me confidence that the codebase can have many hands in it, and will last a long time. In particular, Melvin has made large changes across Python and C++ code, wrapped native libraries like GNU readline, and fixed issues and design problems related to Unix signals and job control.

Open Questions / Risks

Last year, the C++ translation and the interactive shell were two big unknowns, but they no longer are.

Are there any more fundamental issues blocking the project? In the last 2 months, I've been "kicking up dust" all over the repo to figure this out. Here are some of the bigger ones:

Summary

What's next? I've kept a backlog here:

At the very least, I want to publish a post about renaming the project:

I'm not looking forward to the extra work and churn, but I think these names will reduce confusion, and are better in other ways.

Please Donate

Again, we're using the money to bring in new contributors.

On the flip side, if you can get through Contributing, run bin/osh -c 'echo hi', and test OSH, you might be a good person to work on Oil!

Appendix: Metrics for the 0.14.2 Release

We last reviewed metrics in Oil 0.12.7 in October, so let's use that as our baseline.

Spec Tests

The Python reference implementation is improving:

And the C++ translation is catching up:

Again, the majority of this was due to Melvin's work on the interactive shell.


On the other hand, work on the Oil language has stalled:

Benchmarks

The parsing metric had a bug as of release 0.12.7, so let's use 0.12.9 as a baseline.

What's notable is that we turned on the garbage collection in this time! I have more plans to optimize the parser. It's representative of user workloads, and it's also a good stress test for the GC.

The C++ shell got much faster, and it's approaching the speed of bash on this difficult workload:

Code Size

The executable spec remains small! Significant lines:

Code in the oils-for-unix C++ tarball, much of which is generated:

Compiled binary size: