Why Sponsor Oils? | blog | oilshell.org
Every time I've released Oils in the last year, I've said I would write a project retrospective.
Readers have been interested, but I usually want to get back to coding and design after an announcement.
So here's one way of looking at the project. The last few posts were mostly positive, so let's now be critical!
I published an overview of the project a few days ago:
We'll start by summarizing the last major reflection on the project:
I mentioned that Oils is a bunch of experiments that kept working. Some of them took two tries, like J8 Notation, but they eventually worked.
On the other hand, here are 2021's wrong turns and regrets:
(I confess I'm still thinking about bootstrapping, like exposing a well-defined IR for mycpp to users. This is the "Yaks" experiment, mentioned in the project overview.)
Taking into account what I wrote in 2021, let's make a fresh assessment. I have nine points to make, which fall in these categories:
This is a way of saying that Unix and C were created in the early 1970's by Thompson and Ritchie, at Bell Labs, and
So, to the extent that we want to get off the "Bell Labs timeline", we don't know how to do it!
This is why Oils has the unusual structure of both OSH and YSH — two shell languages that share the same runtime. We want to provide a smooth, gradual upgrade path — ideally without language wars.
Some readers have suggested: Why not just work on YSH, and de-emphasize OSH? A couple answers:
I'll return to the ideas of compatibility and a gradual upgrade path later. Slogans:
This is the most predictable reason. The first post in 2016 underestimated the size of bash:
Given that my shell is closer to the bash language than the POSIX shell subset ...
We now have 44 K lines of code in OSH, so 10 K lines was not close!
Of course I can be wrong again, but with more experience, I expect there to be a "long tail" of OSH convergence, especially now that YSH is "figured out". This may be 5-10K more lines over a few years. Though it depends on how many people help.
For some color:
bind builtin.
We also need more testing and demos to show off what OSH can do. Samuel and Aidan have done great work in this area recently.
All in all, it's not too surprising that I underestimated bash. Projects have to start with optimism!
Note that the project has "oscillated" between OSH and YSH over the years. When I get bored of working on OSH, I work on YSH, and vice versa. There's more parallelism now, but I'd like even more.
The slogan I use is: Three Languages Takes More Effort Than One
That is, GNU bash is big, and it corresponds to OSH. But OSH is just one part of the project!
In particular, I didn't realize that mycpp was its own language! It has its own garbage collector.
Here's an analogy I made in the project FAQ:
bash : OSH :: GCC : Clang
That is, Clang is a modern re-implementation of GCC, and OSH is a modern re-implementation of POSIX shell and bash.
Let's use a table to extend the analogy to three languages:
Shell / Oils | Native Code World | Description |
bash | GCC | De-facto Standard |
OSH | Clang | Compatible, modern re-implementation |
mycpp | LLVM | Common Infrastructure |
YSH | Swift or Rust or Cpp2 | New Language, reusing the infrastructure |
And let's call each part of the shell world 10x smaller than its counterpart. (bash is ~142K lines; I think GCC is 1M - 2M lines.)
Even taking that into account, Oils is still a huge project! There was less overlap between the parts than I expected, and it takes work to make everything fit together.
And that's not all. Recall that the project overview showed 13 parts of the project, not just 3 languages!
I gave 3 reasons why Oils is difficult by design, mostly due to "getting off the Bell Labs timeline". Here are some technical reasons that it's taking a long time.
There's a well-known quote from mathematician Blaise Pascal:
I would have written a shorter letter, but I did not have the time.
After writing many blog posts, I feel this viscerally. The short ones take more time and effort to write. I start out with something long, learn what I'm writing about through many revisions, and write it again.
It also applies to code, which is what our middle-out style is all about. Short code takes longer to write.
But I'm glad we made this tradeoff, because: After 8 Years, Oils Is Still Small and Flexible. This paves the way for many more features, like coprocesses and R-like libraries.
That is, we can do whatever we want with the codebase. (I still want to write Comments On Writing Software Over Time, which argues that this isn't the case for most software.)
That's the slogan I use for underestimating the problem of speed. As background, mycpp does two important things:
PyObject*, by using static types, via MyPy
Those are both good things, but I also learned that:
Related issue: when designing the lossless syntax tree, I didn't know what the mycpp runtime would look like. We took a "pure computer science" approach of just designing the "right" logical data structure.
But some decisions are helped by knowledge of data layout, which in turn depends on the type system. For example, Aidan and I removed the "span ID" concept from the codebase, so we now have a single 24-byte Token object, with an 8-byte GC header.
It would have saved time to start with the latter design, but unfortunately there's no other GC runtime we could have started with.
I had also proposed: If you can figure out how to parse shell, you can write a shell.
This turned out to be false. Leaving aside GC, there are several more challenges.
For example, Oils 0.23.0 was delayed because I was wrestling with bugs in trap, which involve Unix signals and fork() optimizations.
Shells do different fork() optimizations, and this behavior doesn't appear to be documented anywhere. It's not OK to do zero optimizations, but doing too many causes bugs. Recent versions of bash are still fixing bugs in this area.
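To make the tradeoff concrete, here is a minimal sketch of one common kind of fork() optimization: when a command is the last thing the shell will run, the shell can exec it directly instead of forking a child and waiting. The names `can_skip_fork` and `run_external` are hypothetical, and the rule shown (skip the fork only when no EXIT trap is pending) is an assumption chosen for illustration. It is not how Oils or bash actually decides.

```python
import os
import sys


def can_skip_fork(is_last_command: bool, has_exit_trap: bool) -> bool:
    """Hypothetical rule: exec in place only if nothing must run afterwards.

    If the shell replaces itself with the command, it is no longer around
    to run an EXIT trap, so a pending trap has to disable the optimization.
    """
    return is_last_command and not has_exit_trap


def run_external(argv, is_last_command=False, has_exit_trap=False):
    """Run an external command and return its exit status."""
    if can_skip_fork(is_last_command, has_exit_trap):
        # Optimized path: replace the current process. On success this
        # call never returns.
        os.execvp(argv[0], argv)

    # Unoptimized path: fork a child, exec in the child, wait in the parent.
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # conventional "command not found" status
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # killed by a signal


if __name__ == "__main__":
    # The child exits with status 3; the parent survives to report it.
    print(run_external([sys.executable, "-c", "raise SystemExit(3)"]))
```

Doing "zero optimizations" corresponds to always taking the fork path, which costs an extra process per command. Doing "too many" corresponds to a `can_skip_fork` that returns true when the shell still has work to do, such as running a trap.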
We've seen 3 reasons the project is inherently difficult, and 3 technical reasons that it's taken a long time. Now let's look at social reasons.
Compared to say Clang users and Clang contributors, there may be less overlap between shell users and shell contributors.
Because a shell is not written in shell! In contrast, Clang is written in C++, and the Rust compiler is written in Rust.
I'm not sure if this is really an issue, but a few people on Zulip agreed.
Historically, it's also true that bash has had few contributors, compared with other GNU projects.
(Aside: Koichi Murase, author of ble.sh, has contributed to both bash and Oils. Koichi might also disagree that shells aren't written in shell :-) )
I was going to write that Nobody makes money from software interoperability. This is one reason it's hard to find people to work on Oils.
But I realized it's worse than that. From word processors to chat apps, it's clear that interoperability can mean losing money!
That is, interoperability is not just neutral — there's a disincentive for it.
But it's why I like Unix. Interoperability and composition let you build smaller systems more quickly.
Unix was unfortunately an anomaly: it was created by a telephone monopoly that was banned from competing in the computer industry. That is, they simply did not have the disincentive!
As another example of incentives, let's refer to this video from Van Jacobson, which I linked in The Internet Was Designed With a Narrow Waist:
He describes the pre-Internet networking situation:
There were a lot of vendor protocols
...
The intent of the protocols was to make them all different and unique and to suck in your customers and make sure that they couldn't leave you for some other vendor
That's not the world's greatest architectural principle
So, the Internet's design was another anomaly or outlier.
This is a good time to thank NLnet, which has funded many outlier projects, including Oils!
Counterpoint: Why does Linux have corporate sponsors? One reason is that many hardware vendors want to commoditize their complement. The natural complement to hardware is an operating system.
Unfortunately, that logic doesn't extend to either GNU bash, which has historically had few resources, or the Oils project.
On the other hand, this dynamic creates some space that I enjoy :-) It's a breath of fresh air, and a privilege, not to worry about money when writing software.
When I started the project, there were a bunch of things I wanted to figure out:
On the last point, I'm happy that several people have been able to write YSH, despite the fact it's still in progress, and the documentation is basic:
So I like to ask big and naive questions, which means it's not that surprising that Oils took a while.
I want to express my gratitude for two things:
And I should have planned on it being a team project. Making three languages takes more effort than making one! Speed is hard, and C is fast.
Nonetheless, we now have a team. I hesitate to single out specific people, since dozens of people have helped over the years, and I appreciate them all. But let me thank these people in particular (in chronological order):
They got deep into the project, and deep into the codebase, in a way that we really needed.
Remember that I credit contributors near the beginning of each release announcement, like Oils 0.22.0 - Docs, Pretty Printing, Nix, and Zsh.
But I still want to improve the project's structure. I've been "heads down" in coding and design, and there are design issues bottlenecked on me, like:
On the one hand, you could argue that I'm "controlling", and not delegating.
On the other hand, there's a rational reason for this. I say that programming languages are a special piece of software. They're inherently tightly-coupled: a change in one part affects many other parts.
For example, adding structured data affected for while if case, "functions", garbage collection, serialization, pretty printing, and more.
It's an O(N²) or worse design problem, where N is the number of features.
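A small sketch of where the quadratic growth comes from, assuming the simplest model: every pair of features is a potential interaction that has to be designed. The feature list is taken from the example above; the function names are hypothetical, and real interactions can involve three or more features at once, which is the "or worse" case.

```python
from itertools import combinations

# Features named in the structured-data example above.
FEATURES = [
    "structured data", "for", "while", "if", "case",
    "functions", "garbage collection", "serialization", "pretty printing",
]


def pairwise_interactions(names):
    """Every unordered pair of features is a potential design interaction."""
    return list(combinations(names, 2))


def interaction_count(n: int) -> int:
    """Number of unordered pairs among n features: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(len(pairwise_interactions(FEATURES)))  # 36 pairs for 9 features
    print(interaction_count(2 * len(FEATURES)))  # 153 pairs for 18 features
```

Doubling the number of features from 9 to 18 takes the pair count from 36 to 153, roughly a 4x increase, which is why parallelizing design work across features gets harder as a language grows.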
If language design is "parallelized" too much, you often get a bad result. For example:
"Bolted on" designs allow parallelism. They may be inevitable in big languages, but remember that shell is perhaps 10x smaller than C++. So why not lean toward idealism?
The next post will elaborate on this quote:
Language design is a curious mixture of grand ideas and fiddly details
I don't claim to have it all figured out. I'd like more feedback from existing contributors!
I post many tasks that are not bottlenecked on me in #help-wanted on Zulip!
Please ask questions about them! Remember that I like to ask big and naive questions too :-)
I also mentioned a "catbrain" language design experiment in the project overview. If you want to write quick prototypes toward
A { Forth, Tcl, Lisp } that can express
{ Shell, Awk, Make, find, xargs } and
{ Python, node.js event loop, R data frames } and
{ YAML, Dockerfiles, HTML Templates } and
{ JSON, TSV, S-expressions, ... }
then let me know! Concrete ideas:
fork() and the like)
I'm just "teasing" this for now, and will publish a repo in the near future.
This post was a bit critical of the project. The rest of the series explains what we got right, and where the project is in 2024:
It feels like the "high bits" of the project are right:
And this opens up many other parts of the project, which I'm excited about!
I'd like to write about the grants, since they caused a necessary transformation of the project:
In addition to building a team, we made Oils fast, and "figured out" YSH! That means OSH is free to be even more compatible, and work can be parallelized.
I called this "a retrospective" because there are other viewpoints. The following sections comment on more technical issues.
I updated this thread for years:
And I summarized it in a lobste.rs comment. Highlights:
signed char, a choice borrowed from CPython.
fork() instead, and our GC is fork-friendly, unlike say CPython.
Related comment on Garbage Collection for Systems Programmers
As mentioned, I was a bit naive about this to start. In other words, the TypeScript compiler is too slow because it's written in TypeScript. TypeScript code runs on v8, which only sees JavaScript code.
It would be better if it were written in a language where static types speed up the code.
This also applies to mycpp itself, which is a bit slow to analyze our code. But it doesn't apply to Oils, because it uses mycpp!