Why Sponsor Oils? | blog | oilshell.org
This is the latest version of Oils, a Unix shell that's our upgrade path from bash to a better language and runtime:
Oils version 0.17.0 - Source tarballs and documentation.
We're moving toward the fast C++ implementation, so there are two tarballs:
- See INSTALL.txt in oil-*.tar.gz.
- See README-native.txt in oils-for-unix-*.tar.gz.
If you're new to the project, see the Oils 2023 FAQ and posts tagged #FAQ.
Quick reminder about naming:
What's new? The previous release was Breaking Renames and YSH, and it prepared the codebase to implement YSH. So this release is a checkpoint along the way.
Aidan Olsen implemented core YSH features:
- The case statement on typed data, as shown in Sketches of YSH Features.
- Method calls like mystr->strip().
- The func keyword and return (expr). This is still in progress.
Peter Debelak tested the C++ tarball on OS X and fixed several build issues. This is great work, and I hope to hear from people who have run Oils on OS X and BSDs! Please report bugs if it doesn't work.
The C++ tarball compiles in about 30 seconds, requiring only a C++ compiler and a shell (no Make tool). I plan to publish a screencast of this.
We translated the YSH expression evaluator to C++, which makes it more real. This led to bug fixes, and to tightening up language semantics (described below).
357 of 514 tests pass in C++, compared to 185 of 479 in the previous release.
We reduced the number of GC objects allocated by the interpreter, which made it faster (less CPU time) and smaller (less memory).
For example, running CPython's configure went from allocating 3.37M objects to 2.32M objects, a decrease of 31%. Compared to our December baseline, it's a decrease of 42%.
The biggest win was shortcutting the word evaluator in common cases like bare-word and 'single quoted'. We also introduced object pools for frequently created objects.
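To make these two optimizations concrete, here is a minimal Python sketch. It is not Oils' code: the word representation (Literal, SingleQuoted, VarSub), eval_word(), and ObjectPool are all hypothetical, and the real word evaluator also handles splitting, globbing, and many more part types. It only shows the shape of the ideas: return early for a word with a single constant part, and recycle objects instead of allocating new ones.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar, Union


@dataclass
class Literal:
    text: str  # e.g. bare-word


@dataclass
class SingleQuoted:
    text: str  # e.g. 'single quoted', with the quotes already removed


@dataclass
class VarSub:
    name: str  # e.g. $x, which needs the general path


WordPart = Union[Literal, SingleQuoted, VarSub]


def eval_word_general(parts: List[WordPart], env: dict) -> str:
    """General path: collects intermediate pieces, then joins them.
    (Word splitting and globbing are omitted from this sketch.)"""
    pieces = []
    for part in parts:
        if isinstance(part, VarSub):
            pieces.append(env.get(part.name, ""))
        else:
            pieces.append(part.text)
    return "".join(pieces)


def eval_word(parts: List[WordPart], env: dict) -> str:
    """Fast path: a word with one constant part returns its text directly,
    without building an intermediate list or a joined string."""
    if len(parts) == 1 and isinstance(parts[0], (Literal, SingleQuoted)):
        return parts[0].text
    return eval_word_general(parts, env)


T = TypeVar("T")


class ObjectPool(Generic[T]):
    """Hands out recycled objects instead of allocating fresh ones."""

    def __init__(self, factory: Callable[[], T], reset: Callable[[T], None]):
        self._factory = factory
        self._reset = reset
        self._free: List[T] = []
        self.num_allocated = 0  # how many objects were actually created

    def acquire(self) -> T:
        if self._free:
            return self._free.pop()
        self.num_allocated += 1
        return self._factory()

    def release(self, obj: T) -> None:
        self._reset(obj)  # clear old state before the object is reused
        self._free.append(obj)
```

For example, ObjectPool(list, list.clear) hands back the same list object after it's released, so a loop that acquires and releases a scratch list allocates it only once.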
What's happened lately?
June was supposed to be "the month of docs". I had planned to rewrite the help builtin and reorganize our documentation. We need a place to record all the changes we're making!
That didn't happen, but I did write five blog posts about the design of YSH. They clarified what exactly we should work on, out of the seven features in YSH:
(Stupid slogan I thought of for Oils: Imagine if bash, Python, and JSON kissed.)
After writing those posts, and seeing Melvin's work on YSH, I realized we need to write more code in YSH, as opposed to typed Python. It's less verbose, and it's a good test of the language.
That is, I hope YSH will have a small stable core, and a larger part that can grow for years. Examples:
- max(), sum(), and any() can be written in YSH. They are sugar on top of >, +, and or.
- len() and Bool() are "intrinsic".
This layering applies to builtin procs as well as funcs: we should be able to write both a test framework (describe) and a flag parser (argparse) in YSH itself. You can see examples in Sketches of YSH Features.
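The layering idea doesn't depend on YSH syntax, so here is a rough analogy in Python. The names intrinsic_len, intrinsic_bool, lib_max, lib_sum, and lib_any are hypothetical, not Oils' library. The point is only that the "library" functions use nothing but loops, operators such as >, +, and or, and the two intrinsics. In YSH, functions like these could therefore live in a library written in YSH rather than in the interpreter.

```python
from typing import Any, Iterable, List


# "Intrinsic" functions: provided by the interpreter itself.
def intrinsic_len(xs: List[Any]) -> int:
    return len(xs)


def intrinsic_bool(x: Any) -> bool:
    return bool(x)


# "Sugar": built only from loops, operators, and the intrinsics above.
def lib_max(xs: List[Any]) -> Any:
    if intrinsic_len(xs) == 0:
        raise ValueError("max() of empty list")
    best = xs[0]
    for x in xs[1:]:
        if x > best:  # relies only on the > operator
            best = x
    return best


def lib_sum(xs: Iterable[Any], start: Any = 0) -> Any:
    total = start
    for x in xs:
        total = total + x  # relies only on the + operator
    return total


def lib_any(xs: Iterable[Any]) -> bool:
    result = False
    for x in xs:
        result = result or intrinsic_bool(x)  # relies only on `or`
        if result:
            break  # short-circuit like the operator does
    return result
```

Only the intrinsics need special support from the runtime. Everything else can change or grow without touching the core.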
Another point of clarity is the runtime relationship between OSH and YSH: their data types are now more distinct.
This is mostly so we can continue to increase bash compatibility ("conceding to reality"), without messing up the semantics of YSH.
(Aside: We've long understood the relationship at parse time: YSH is largely a mutually recursive expression sublanguage woven into shell. Aidan found some good bugs here while implementing the YSH case statement, though, so it may change a bit.)
Here's some technical detail on the core data types. Building on Melvin's work, I statically typed and translated the YSH expression evaluator for this release. It had previously relied on PyObject*, i.e. the "metacircular hack".
It now uses a central value_t type, expressed with algebraic data types in Zephyr ASDL. For example, this is what POSIX shell looks like:
value = Undef # for ${x:-default} etc.
| Str(str s) # Everything is a string
This is what bash looks like:
...
| Str(str s)
| BashArray(List[str] strs)
# quirk: a bash array is more like Dict[int, str] !
| BashAssoc(Dict[str, str] d)
...
This is what YSH looks like:
...
| Null # e.g. for JSON
| Int(int i)
| Float(float f)
| List(List[value] items)
| Dict(Dict[str, value] d)
...
# omitted: Eggex, Func, Proc, etc.
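For readers less familiar with ASDL, here is a hand-written Python analogy of the value type above, with one dataclass per variant. This is a sketch under assumptions: Oils generates its classes from the ASDL schema, and that generated code looks different. The default_sub() helper is hypothetical. It shows why even POSIX shell needs Undef: ${x-default} and ${x:-default} must be able to tell an unset variable from a set one.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# POSIX shell
@dataclass
class Undef:  # unset variable; for ${x:-default} etc.
    pass


@dataclass
class Str:  # everything is a string
    s: str


# bash additions
@dataclass
class BashArray:  # quirk: really more like dict[int, str]
    strs: list[str]


@dataclass
class BashAssoc:
    d: dict[str, str]


# YSH additions (Eggex, Func, Proc, etc. omitted)
@dataclass
class Null:  # e.g. for JSON
    pass


@dataclass
class Int:
    i: int


@dataclass
class Float:
    f: float


@dataclass
class List:
    items: list[Value]


@dataclass
class Dict:
    d: dict[str, Value]


Value = Union[Undef, Str, BashArray, BashAssoc, Null, Int, Float, List, Dict]


def default_sub(val: Value, default: str, colon: bool = True) -> str:
    """${x-default} (colon=False) uses the default only if x is unset;
    ${x:-default} (colon=True) also uses it if x is the empty string.
    Other types are simply rejected in this sketch."""
    if isinstance(val, Undef):
        return default
    if isinstance(val, Str):
        if colon and val.s == "":
            return default
        return val.s
    raise TypeError(f"Invalid type value.{type(val).__name__}")
```

Dispatching with isinstance() on a closed set of variants is the Python counterpart of matching on an algebraic data type.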
The main change is that I had thought we would unify sequences and maps: BashArray with List, and BashAssoc with Dict.
But again, I don't want future OSH quirks to affect YSH. Practically speaking, what this means is that OSH and YSH have mostly separate types and operations. You use YSH operations with YSH types:
$ var mylist = ['README', 'foo.py'] # value.List
$ echo @mylist # YSH splice works
README foo.py
$ echo "${mylist[@]}" # bash splice doesn't apply
echo "${mylist[@]}"
^~
[ interactive ]:15: fatal: Invalid type value.List: ...
... Can't substitute into word
You also use OSH operations with OSH types. This shouldn't be a big deal because the most common type, Str, is shared and thus seamlessly interoperable.
$ declare -a array=(README foo.py) # value.BashArray
$ echo "${array[@]}" # bash splice works
README foo.py
$ var item = array[0] # YSH array indexing doesn't apply
var item = array[0]
^~~
[ interactive ]:21: fatal: Invalid type value.BashArray: ...
... subscript expected Str, List, or Dict
Also, features like param passing will "just work". You can copy from bash arrays to YSH lists, and vice versa.
The exact set of valid operations on each type can be tweaked based on usage, but we're no longer aiming to "complete the matrix". The interactions are more controlled.
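The type checks in the two sessions above can be modeled with a small Python sketch. Everything here is hypothetical: the function names, the exact error text, and the choice to allow only Str items when splicing a List. It mirrors the behavior shown in the sessions, not the interpreter's real code. Copying between BashArray and List is an explicit conversion in each direction.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Str:
    s: str


@dataclass
class BashArray:
    strs: list[str]


@dataclass
class List:
    items: list[Value]


Value = Union[Str, BashArray, List]


def _type_error(val: object, where: str) -> TypeError:
    return TypeError(f"Invalid type value.{type(val).__name__}: {where}")


def ysh_splice(val: Value) -> list[str]:
    """YSH @mylist works only on a List (of Str, in this sketch)."""
    if not isinstance(val, List):
        raise _type_error(val, "can't splice")
    out = []
    for item in val.items:
        if not isinstance(item, Str):
            raise _type_error(item, "expected Str")
        out.append(item.s)
    return out


def bash_splice(val: Value) -> list[str]:
    """Bash "${array[@]}" works only on a BashArray."""
    if not isinstance(val, BashArray):
        raise _type_error(val, "Can't substitute into word")
    return list(val.strs)


def bash_array_to_list(arr: BashArray) -> List:
    """Explicit copy from the OSH type to the YSH type."""
    return List([Str(s) for s in arr.strs])


def list_to_bash_array(lst: List) -> BashArray:
    """Explicit copy back; every item must be a Str."""
    return BashArray(ysh_splice(lst))
```

Keeping the checks at the operation boundary means a new bash quirk only touches the BashArray branch, which is the isolation described above.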
Note that you can write these two styles of syntax in the same file. It's not recommended for new programs, but it may be useful when upgrading from OSH to YSH.
So we're deep in the middle of implementing YSH, and it's taking a nice shape. What are the remaining risks?
Let's look back 3 years to Technical Issues and Risks (2020). We're past the issues I enumerated:
"The main risk is memory management". Funded by our first NLnet grant, I got our garbage-collected runtime working in January, with essential help from Jesse Hughes.
It's also fair to say that the project's small amount of C++ code was a big mess before the first grant. The whole translation process was an experiment, almost a research prototype. But we're now past that phase.
"I'm deferring all the issues related to the interactive shell". I was trying to reduce the scope of the project, since I knew it was too big.
Starting late last year, Melvin Walls pretty much single-handedly revived it, again funded by NLnet. This reminds me that I need to write a blog post with screencasts showing our interactive shell in pure C++. It can run virtualenv, bash completion, git prompts, and more.
The pgen2 parser generator. Melvin translated it to C++ a few months ago.
Remove the "metacircular hack". Melvin and I solved most of this problem. As of this release, it's more than halfway done. There's more work, but we'll finish it.
So it's clear that our two NLnet grants (April 2022 and February 2023) have been critical. The project really needs concentrated attention. I welcome casual contribution, and I want to increase it, but we also need sustained contribution.
(As always, you're welcome to join https://oilshell.zulipchat.com/ and ask questions!)
So the main risks are that we won't have enough help, or that our funding runs out. There's a lifetime limit of 4 grants from NLnet, which definitely seems like enough to get the project off the ground, but we shouldn't take it for granted.
A related issue is that I've been "heads down" for a couple months, deep in the design of YSH. And I expect to be deep in documentation for the next month. But I also want to find more people to work on the project.
I'm thinking of writing a blog post How are programming languages funded? I've noticed a common misconception that Python was Guido van Rossum's hobby project. This isn't true, since it's had a small amount of funding for most of its life, including from the US government early on.
So the "administrative" parts of a project definitely matter. A little funding goes a long way.
Another question that's on my mind:
Can YSH be a bounded design?
That is, can there be a stable core that supports "infinite" growth? This is essentially the idea behind the narrow waist blog posts.
It seems like it, but the only way to find out is to implement YSH. Luckily, this seems very feasible. I'm happy that the ysh/expr_eval.py file is only 1435 lines after static typing! Because it's statically typed, it's translated to C++, so we no longer depend on the Python interpreter, and the interpreter's weight "doesn't count".
This brings to mind another risk:
Is the language too big?
Does it make sense to stuff all this functionality from shell, Python, JSON, and TSV together? Is it too big to document?
To be honest, it certainly feels big, because it's a lot of work.
But the whole program is still small! I would say it's really small for the amount of work it does.
This is pretty surprising! We'll have a shell with much more power and functionality than bash, at less than half the weight. I'll publish updated line counts when the interpreter is fully translated to C++.
Programmers adopt platforms, not languages.
I want the project to be self-sustaining, and language projects rarely are. What we really care about is operating systems and platforms (Unix, the web, the cloud, etc.).
Shell is interesting because it's arguably the language that's closest to the Unix operating system.
I may write a separate post about this. I guess the bottom line is that we still need to do things with YSH. We are overflowing with ideas, but again short on people.
Again, feel free to join us on Zulip. Most people find it "dense", but asking the right questions is a great way to spread knowledge. The codebase is taking its "final" shape as well, so it should be easier to change.
These issues represent some of the work we did:
#1658 | --gc-sections not supported by ld on macos |
#1657 | READLINE_DIR is not used in build/ninja-rules-cpp.sh |
#1656 | HOST_NAME_MAX doesn't appear to be defined in macos |
#1643 | case NEWLINE crashes because newline accepted as pattern |
#1092 | Crash in ${a[0]} array evaluation |
#954 | ${x:-default} when x is an integer fails with NotImplementedError |
#840 | Bug in integer / string conversion |
#741 | Fully nested data structures |
#636 | Oil expression evaluator shouldn't be "metacircular" |
As mentioned, I want to overhaul the help builtin and documentation. It's great to have contributors working on YSH while that happens. A few years ago, progress on the code would grind to a halt whenever I wrote docs or a blog post!
And I need to do more on the "administrative" side of the project, which is easy to neglect. Please sponsor us if you appreciate this work. We use the money to onboard contributors before they're added to the grant:
Lower priority:
These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.16.0 from June.
Not much work on OSH:
But I did fix a slight regression in translation:
Lots of work on YSH:
Especially making it work in C++, mentioned above:
Parsing speed remained the same, despite some changes for YSH:
Also no change:
parse.configure-coreutils: 1.83M objects comprising 62.1 MB, max RSS 69.1 MB
parse.configure-coreutils: 1.83M objects comprising 62.1 MB, max RSS 68.9 MB
Many fewer allocations on a real workload (configure):
This appeared as a big speedup on the ex.compute-fib benchmark:
Wall times (configure):
We still have to catch up with bash.
I need to update these metrics to include YSH as well as OSH:
We translated more of YSH, resulting in more C++ code in the tarball:
And more executable code: