A Retrospective on the Oils Project

2024-09-13

Every time I've released Oils in the last year, I've said I would write a project retrospective.

Readers have been interested, but I usually want to get back to coding and design after an announcement.

So here's one way of looking at the project. The last few posts were mostly positive, so let's now be critical!

Table of Contents
Background
Summary of 2021 Retrospective
Why is Oils Taking a Long Time?
It's Inherently Hard
We're Still on the "Bell Labs Timeline"
Bash is Big
Oils is Bigger
Technical Reasons
Short Code Takes More Effort to Write Than Long Code
Speed is Hard, and C is Fast
The Shell Runtime is Complex
Social Reasons
Small Pool of Contributors?
Disincentives for Software Interoperability
Oils Was a Personal Research Project
It's Now a Team Project
Caveat: Design Bottlenecks
Help Wanted
The Main Message
What's Next?
Appendix: More Viewpoints

Background

I published an overview of the project a few days ago:

We'll start by summarizing the last major reflection on the project:

Summary of 2021 Retrospective

I mentioned that Oils is a bunch of experiments that kept working. Some of them took two tries, like J8 Notation, but they eventually worked.

On the other hand, here are 2021's wrong turns and regrets:

  1. Translating OSH to YSH didn't work, technically.
  2. The "OPy" bytecode compiler would never be fast enough.
  3. I thought about bootstrapping too much, e.g. a "Tea dialect" to write Oils in.

(I confess I'm still thinking about bootstrapping, like exposing a well-defined IR for mycpp to users. This is the "Yaks" experiment, mentioned in the project overview.)

Why is Oils Taking a Long Time?

Taking into account what I wrote in 2021, let's make a fresh assessment. I have nine points to make, which fall in these categories:

  1. The goals are inherently hard
  2. Technical reasons
  3. Social reasons

It's Inherently Hard

We're Still on the "Bell Labs Timeline"

This is a way of saying that Unix and C were created in the early 1970s by Thompson and Ritchie at Bell Labs, and we are still building on them today.

So, to the extent that we want to get off the "Bell Labs timeline", we don't know how to do it!


This is why Oils has the unusual structure of both OSH and YSH — two shell languages that share the same runtime. We want to provide a smooth, gradual upgrade path — ideally without language wars.

Some readers have suggested: Why not just work on YSH, and de-emphasize OSH? A couple answers:

  1. We sort of did that, in that our NLnet grants were focused on mycpp and YSH. But contributors were excited about OSH too!
  2. Many users are interested only in OSH, just as others are interested only in YSH.

I'll return to the ideas of compatibility and a gradual upgrade path later. Slogans:

Bash is Big

This is the most predictable reason. The first post in 2016 underestimated the size of bash:

Given that my shell is closer to the bash language than the POSIX shell subset ...

We now have 44 K lines of code in OSH, so 10 K lines was not close!

Of course I can be wrong again, but with more experience, I expect there to be a "long tail" of OSH convergence, especially now that YSH is "figured out". This may be 5-10K more lines over a few years. Though it depends on how many people help.

For some color:

We also need more testing and demos to show off what OSH can do. Samuel and Aidan have done great work in this area recently.


All in all, it's not too surprising that I underestimated bash. Projects have to start with optimism!

Note that the project has "oscillated" between OSH and YSH over the years. When I get bored of working on OSH, I work on YSH, and vice versa. There's more parallelism now, but I'd like even more.

Oils is Bigger

The slogan I use is: Three Languages Takes More Effort Than One

That is, GNU bash is big, and it corresponds to OSH. But OSH is just one part of the project!

In particular, I didn't realize that mycpp was its own language! It has its own garbage collector.

 

Here's an analogy I made in the project FAQ:

bash : OSH :: GCC : Clang

That is, Clang is a modern re-implementation of GCC, and OSH is a modern re-implementation of POSIX shell and bash.

 

Let's use a table to extend the analogy to three languages:

Shell / Oils   Native Code World        Description
bash           GCC                      De-facto Standard
OSH            Clang                    Compatible, modern re-implementation
mycpp          LLVM                     Common Infrastructure
YSH            Swift or Rust or Cpp2    New Language, reusing the infrastructure

 

And let's call each part of the shell world 10x smaller than its counterpart. (bash is ~142K lines; I think GCC is 1M - 2M lines.)

Even taking that into account, Oils is still a huge project! There was less overlap between the parts than I expected, and it takes work to make everything fit together.


And that's not all. Recall that the project overview showed 13 parts of the project, not just 3 languages!

Technical Reasons

I gave 3 reasons why Oils is difficult by design, mostly due to "getting off the Bell Labs timeline". Here are some technical reasons that it's taking a long time.

Short Code Takes More Effort to Write Than Long Code

There's a well-known quote from mathematician Blaise Pascal:

I would have written a shorter letter, but I did not have the time.

After writing many blog posts, I feel this viscerally. The short ones take more time and effort to write. I start out with something long, learn what I'm writing about through many revisions, and write it again.


It also applies to code, which is what our middle-out style is all about. Short code takes longer to write.

But I'm glad we made this tradeoff, because: After 8 Years, Oils Is Still Small and Flexible. This paves the way for many more features, like coprocesses and R-like libraries.

That is, we can do whatever we want with the codebase. (I still want to write Comments On Writing Software Over Time, which argues that this isn't the case for most software.)

Speed is Hard, and C is Fast

That's the slogan I use for underestimating the problem of speed. As background, mycpp does two important things:

  1. It "erases" Python's bytecode interpreter, by translating Python to C++
  2. It "erases" dynamic dispatch on PyObject*, by using static types, via MyPy
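To make these two points concrete, here is a minimal sketch of the kind of statically typed Python that a translator in the spirit of mycpp could map to C++. It is not taken from the Oils codebase: the function `count_fields` and its signature are hypothetical, chosen only to show that every variable has a declared type, so no value needs to be boxed or dispatched on at runtime.

```python
from typing import Dict, List


def count_fields(lines: List[str], sep: str) -> Dict[int, int]:
    """Histogram of field counts per line (hypothetical example, not Oils code)."""
    histogram: Dict[int, int] = {}
    for line in lines:
        n = len(line.split(sep))  # n is always an int, known statically
        if n in histogram:
            histogram[n] += 1
        else:
            histogram[n] = 1
    return histogram


if __name__ == "__main__":
    print(count_fields(["a:b", "c:d:e", "f:g"], ":"))
```

Under CPython this runs through the bytecode interpreter with dynamic dispatch on every operation. A typed translation could instead emit an ordinary C++ function over concrete string and dictionary types.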

Those are both good things, but I also learned that:


Related issue: when designing the lossless syntax tree, I didn't know what the mycpp runtime would look like. We took a "pure computer science" approach of just designing the "right" logical data structure.

But some decisions are helped by knowledge of data layout, which in turn depends on the type system. For example, Aidan and I removed the "span ID" concept from the codebase, so we now have a single 24-byte Token object, with an 8-byte GC header.

It would have saved time to start with the latter design, but unfortunately there's no other GC runtime we could have started with.
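As a rough illustration of why data layout matters, the sketch below uses Python's `ctypes` to model one way a token could fit in 24 bytes, assuming the 24 bytes include the 8-byte GC header. The field names and widths are hypothetical, not the actual Oils `Token` definition; the point is only that once the header size is fixed, the budget left for fields is small and has to be planned.

```python
import ctypes


class Token(ctypes.Structure):
    """Hypothetical 24-byte layout; names and widths are illustrative only."""

    _fields_ = [
        ("gc_header", ctypes.c_uint64),  # 8-byte GC header (size from the text)
        ("id", ctypes.c_uint16),         # assumed: token kind
        ("length", ctypes.c_uint16),     # assumed: token length in bytes
        ("col", ctypes.c_uint32),        # assumed: starting column
        ("line", ctypes.c_uint64),       # assumed: stand-in for a pointer to the source line
    ]


if __name__ == "__main__":
    print("sizeof(Token) =", ctypes.sizeof(Token))
    for name, _ in Token._fields_:
        print(name, "at offset", getattr(Token, name).offset)
```

With this layout there are only 16 bytes for everything other than the header, which is the kind of constraint that is hard to see when designing a purely logical data structure.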

The Shell Runtime is Complex

I had also proposed: If you can figure out how to parse shell, you can write a shell.

This turned out to be false. Leaving aside GC, there are several more challenges.

For example, Oils 0.23.0 was delayed because I was wrestling with bugs in trap, which involve Unix signals and fork() optimizations.

Shells do different fork() optimizations, and this behavior doesn't appear to be documented anywhere. It's not OK to do zero optimizations, but doing too many causes bugs. Recent versions of bash are still fixing bugs in this area.
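Here is a minimal sketch of one such optimization and how it can interact with trap. The helper `run_external` and its `is_last_command` and `has_exit_trap` parameters are hypothetical, and this is not how Oils or bash is implemented. It only shows the general shape: when the shell is about to run its last command, it can exec in place and skip a fork(), but that is wrong if a trap handler still needs to run afterward.

```python
import os
from typing import List


def run_external(argv: List[str], is_last_command: bool, has_exit_trap: bool) -> int:
    """Run argv and return its exit status (hypothetical helper, not Oils code)."""
    if is_last_command and not has_exit_trap:
        # Optimization: replace the shell process itself, saving a fork().
        # Skipped when an EXIT trap is set, because the shell process would
        # be gone before the handler could run.
        os.execvp(argv[0], argv)  # does not return on success

    pid = os.fork()
    if pid == 0:
        # Child: become the command, or exit with the conventional
        # "command not found" status if exec fails.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)

    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # child was killed by a signal
```

Even in this toy version, the optimization needs a condition that depends on unrelated shell state, which hints at why doing too many of them causes bugs.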

Social Reasons

We've seen 3 reasons the project is inherently difficult, and 3 technical reasons that it's taken a long time. Now let's look at social reasons.

Small Pool of Contributors?

Compared to, say, Clang users and Clang contributors, there may be less overlap between shell users and shell contributors.

That's because a shell is not written in shell! In contrast, Clang is written in C++, and the Rust compiler is written in Rust.

I'm not sure if this is really an issue, but a few people on Zulip agreed.


Historically, it's also true that bash has had few contributors, compared with other GNU projects.

(Aside: Koichi Murase, author of ble.sh, has contributed to both bash and Oils. Koichi might also disagree that shells aren't written in shell :-) )

Disincentives for Software Interoperability

I was going to write that Nobody makes money from software interoperability. This is one reason it's hard to find people to work on Oils.

But I realized it's worse than that. From word processors to chat apps, it's clear that interoperability can mean losing money!

That is, interoperability is not just neutral — there's a disincentive for it.

 

But it's why I like Unix. Interoperability and composition let you build smaller systems more quickly.

Unix was unfortunately an anomaly: it was created by a telephone monopoly that was banned from competing in the computer industry. That is, they simply did not have the disincentive!

 

As another example of incentives, let's refer to this video from Van Jacobson, which I linked in The Internet Was Designed With a Narrow Waist:

He describes the pre-Internet networking situation:

There were a lot of vendor protocols

...

The intent of the protocols was to make them all different and unique and to suck in your customers and make sure that they couldn't leave you for some other vendor

That's not the world's greatest architectural principle

So, the Internet's design was another anomaly or outlier.

 

This is a good time to thank NLnet, which has funded many outlier projects, including Oils!

 

Counterpoint: Why does Linux have corporate sponsors? One reason is that many hardware vendors want to commoditize their complement. The natural complement to hardware is an operating system.

Unfortunately, that logic doesn't extend to either GNU bash, which has historically had few resources, or the Oils project.

Oils Was a Personal Research Project

On the other hand, this dynamic creates some space that I enjoy :-) It's a breath of fresh air, and a privilege, not to worry about money when writing software.

When I started the project, there were a bunch of things I wanted to figure out:

  1. Can we statically parse shell?
  2. Can we use Python as a metaprogramming language for C++? Can we write an executable spec for shell in Python-based DSLs?
  3. Can we design a nice language with an upgrade path from bash?
  4. Can we teach YSH to people who don't know any shell?

On the last point, I'm happy that several people have been able to write YSH, despite the fact it's still in progress, and the documentation is basic:

 

So I like to ask big and naive questions, which means it's not that surprising that Oils took a while.

It's Now a Team Project

I want to express my gratitude for two things:

  1. That NLnet has funded Oils, starting in 2022. I definitely didn't envision this when starting the project, but it turned out to be necessary.
  2. That other, skilled people actually wanted to work on Oils. It could have been that nobody wanted to work on my personal research project! They could be working on their own projects.

And I should have planned on it being a team project. Making three languages takes more effort than making one! Speed is hard, and C is fast.

Nonetheless, we now have a team. I hesitate to single out specific people, since dozens of people have helped over the years, and I appreciate them all. But let me thank these people in particular (in chronological order):

They got deep into the project, and deep into the codebase, in a way that we really needed.


Remember that I credit contributors near the beginning of each release announcement, like Oils 0.22.0 - Docs, Pretty Printing, Nix, and Zsh.

Caveat: Design Bottlenecks

But I still want to improve the project's structure. I've been "heads down" in coding and design, and there are design issues bottlenecked on me, like:

On the one hand, you could argue that I'm "controlling", and not delegating.

On the other hand, there's a rational reason for this. I say that programming languages are a special piece of software. They're inherently tightly-coupled: a change in one part affects many other parts.

For example, adding structured data affected for, while, if, and case; "functions"; garbage collection; serialization; pretty printing; and more.

It's an O(N^2) or worse design problem, where N is the number of features.
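One way to read the quadratic estimate, as a rough sketch: if any two features can interact, the number of pairs to think about is N(N-1)/2. The feature list below comes from the structured data example above. Treating every pair as a potential interaction is a simplifying assumption, and the helper name `pairwise_interactions` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# Features named in the structured data example above.
FEATURES: List[str] = [
    "for", "while", "if", "case", "functions",
    "garbage collection", "serialization", "pretty printing",
]


def pairwise_interactions(features: List[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of features, i.e. N*(N-1)/2 potential interactions."""
    return list(combinations(features, 2))


if __name__ == "__main__":
    before = len(pairwise_interactions(FEATURES))
    after = len(pairwise_interactions(FEATURES + ["structured data"]))
    print(before, "pairs before;", after, "pairs after adding one feature")
```

Adding one feature to a language with N features adds N new pairs to consider, which is why each new feature costs more design effort than the last.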

 

If language design is "parallelized" too much, you often get a bad result. For example:

"Bolted on" designs allow parallelism. They may be inevitable in big languages, but remember that shell is perhaps 10x smaller than C++. So why not lean toward idealism?

 

The next post will elaborate on this quote:

Language design is a curious mixture of grand ideas and fiddly details

 

I don't claim to have it all figured out. I'd like more feedback from existing contributors!

Help Wanted

I post many tasks that are not bottlenecked on me to #help-wanted on Zulip!

Please ask questions about them! Remember that I like to ask big and naive questions too :-)


I also mentioned a "catbrain" language design experiment in the project overview. If you want to write quick prototypes toward

A { Forth, Tcl, Lisp } that can express
  { Shell, Awk, Make, find, xargs } and
  { Python, node.js event loop, R data frames } and
  { YAML, Dockerfiles, HTML Templates } and
  { JSON, TSV, S-expressions, ... }

then let me know! Concrete ideas:

I'm just "teasing" this for now, and will publish a repo in the near future.

The Main Message

This post was a bit critical of the project. The rest of the series explains what we got right, and where the project is in 2024:

  1. What Oils Looks Like in 2024
  2. After 8 Years, Oils Is Still Small and Flexible
  3. Garbage Collection Makes YSH Different
  4. A Retrospective of the Oils Project
  5. Oils - Grand Ideas and Fiddly Details
  6. Oils 0.23.0 - User Feedback, Bug Bounty, and Writing YSH Code

It feels like the "high bits" of the project are right:

And this opens up many other parts of the project, which I'm excited about!

What's Next?

I'd like to write about the grants, since they caused a necessary transformation of the project:

In addition to building a team, we made Oils fast, and "figured out" YSH! That means OSH is free to be even more compatible, and work can be parallelized.

Appendix: More Viewpoints

I called this "a retrospective" because there are other viewpoints. The following sections comment on more technical issues.

C++ Type System

I updated this thread for years:

And I summarized it in a lobste.rs comment. Highlights:

Garbage Collection

Related comment on Garbage Collection for Systems Programmers

The Speed of C++ / Rust / Go is Necessary for Language Processors

As mentioned, I was a bit naive about this to start. In other words, the TypeScript compiler is too slow because it's written in TypeScript. TypeScript code runs on v8, which only sees JavaScript code.

It would be better if it were written in a language where static types speed up the code.

This also applies to mycpp itself, which is a bit slow to analyze our code. But it doesn't apply to Oils, because it uses mycpp!