Why Sponsor Oils? | blog | oilshell.org
This is the latest version of Oils! We're polishing and optimizing OSH, and translating YSH to fast, native code.
(Since the last release, we started slightly renaming the project.)
Oils version 0.15.0 - Source tarballs and documentation.
To build and run it, follow the instructions in INSTALL.txt. The wiki has tips on How To Test OSH.
If you're new to the project, see Why Create a New Shell? and posts tagged #FAQ. I recently wrote the Oils 2023 FAQ.
I'm happy to say that I got a lot of help with this release. The project has grown large, and wouldn't be possible without help. Thanks again to NLnet for funding us.
In roughly chronological order:
Melvin Walls implemented job control.
It relies on setpgid() and tcsetpgrp(), which Advanced Programming in the Unix Environment (APUE) helped us with.
Chris Watkins implemented a pool allocator that's integrated with the garbage collector, and improved our dev tools.
CoffeeTableEspresso helped with the "big parser refactoring", which includes removing the "span ID" concept from the codebase.
Aidan Olsen did a large part of the parser refactoring, implemented history -a and -r, and improved our dev tools.
Samuel Hierholzer implemented an oshrc.d directory, and the --norc flag.
oshrc, which we view as a harmful pattern in software distribution.
~/.config/oils/oshrc.d because we still have to finish the Big Renaming.
Melvin Walls added a C++ backend to the pgen2 parser generator, which we borrowed from CPython. This means that we now have spec test runs for YSH (formerly Oil)! See the appendix.
The parser refactoring has 3 or 4 parts, and several motivations, which are all on #oil-dev on Zulip. But one quick way to understand it is to note that finishing the garbage collector "unlocked" many design decisions. We now know how our data structures can be laid out, and how they perform!
(I'd like to write Zephyr ASDL After 6 Years and The Lossless Syntax Tree After 6 Years.)
The March release of Oils 0.14.2 said that polishing OSH includes a process of "conceding to reality".
That is, we're making OSH even more compatible, and making strict behavior opt-in.
A recent report by Simon Michael perfectly demonstrates both the costs and the benefits of strict behavior. OSH complained about this line in his script:
if [[ $HELP = 1 || ${#ARGS} -eq 0 ]]; then usage; exit; fi
^~
inv:69: fatal: Array 'ARGS' can't be referred to as a scalar (without @ or *)
This is actually a bug that OSH flagged! In bash,
${ARGS} is bizarrely equivalent to ${ARGS[0]}. (Arrays were hacked onto bash late in its life, and I think adopting this semantic was easier to implement.)
${#ARGS} is equivalent to ${#ARGS[0]}. It's the length of the first string in the array, not the length of the array.
If you want to get the length of an array, you have to remember that it's ${#ARGS[@]}.
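To make the difference concrete, here's a small bash sketch. The array contents and the function names are made up for illustration. Under OSH with strict_array enabled, the first two functions should be rejected with an error like the one above.

```shell
#!/usr/bin/env bash
# Hypothetical argument list, for illustration only.
ARGS=(alpha beta gamma)

# In bash, ${ARGS} silently means ${ARGS[0]}.
first_element() { echo "${ARGS}"; }

# ${#ARGS} is the length of the first string ('alpha'), not of the array.
first_element_length() { echo "${#ARGS}"; }

# ${#ARGS[@]} is the number of elements in the array.
array_length() { echo "${#ARGS[@]}"; }

first_element          # prints alpha
first_element_length   # prints 5
array_length           # prints 3
```

So a check like `${#ARGS} -eq 0` only tells you whether the first element is an empty string.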
Even though I've burned the awkward "${array[@]}" into my fingers, and wrote the post Thirteen Incorrect Ways and Two Awkward Ways to Use Arrays, I just wrote the same bug.
That is, I used "${PY3_BUILD_DEPS}" instead of "${PY3_BUILD_DEPS[@]}" in one of our own bash scripts. It caused a build problem for Chris, which he fixed in PR 1576.
I swear this wasn't a "setup". (See other posts tagged #real-problems).
shopt --set strict:all
Even though this error was useful, I moved it under shopt --set strict_array, so OSH behaves like bash by default. Why?
strict_* options, so this is a more consistent design. (Show all strict options with shopt -p | grep strict)
Later, you can opt in with the strict:all option group, which includes strict_array. Or use this snippet to run with both bash and OSH:
shopt --set strict:all 2>/dev/null || true
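Here's a minimal sketch of how that line might sit at the top of a script meant to run under both shells. The deps array and the count_deps function are hypothetical names, not anything from the Oils repo.

```shell
#!/usr/bin/env bash
# Opt in to strict checks under OSH. bash doesn't know this flag, so its
# error message is discarded and the failure is ignored.
shopt --set strict:all 2>/dev/null || true

# Hypothetical list of build dependencies, for illustration only.
deps=(python3 gcc make)

count_deps() {
  # "${deps[@]}" expands to every element, which is the form that stays
  # valid whether or not strict_array is on.
  local n=0
  local d
  for d in "${deps[@]}"; do
    n=$((n + 1))
  done
  echo "$n"
}

count_deps   # prints 3
```

Writing "${deps}" in the loop instead would iterate over only the first element in bash, which is the bug described above.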
Samuel tested Arch Linux's makepkg.sh with OSH, and it appears to mix strings and arrays under the same variable name. OSH doesn't like this right now, but we can also move this behavior to shopt --set strict_array.
pkgbase=${pkgbase:-${pkgname[0]}}
^~~~~~~
'/usr/sbin/makepkg':1232: fatal: Can't index string 'pkgname' with integer
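For reference, bash accepts this mix: indexing a plain string with [0] yields the string itself, just as it yields the first element of an array. A small sketch of the same expansion, with made-up package names and function names:

```shell
#!/usr/bin/env bash
# pkgname as a plain string; bash treats ${pkgname[0]} as the string itself.
base_from_string() {
  local pkgname='mytool'
  local pkgbase=''
  pkgbase=${pkgbase:-${pkgname[0]}}
  echo "$pkgbase"
}

# pkgname as an array; ${pkgname[0]} is the first element.
base_from_array() {
  local pkgname=(mytool mytool-docs)
  local pkgbase=''
  pkgbase=${pkgbase:-${pkgname[0]}}
  echo "$pkgbase"
}

base_from_string   # prints mytool
base_from_array    # prints mytool
```

This is why a script can get away with using either type under one name in bash, while a stricter shell reports the string case as an error.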
We can still use help with changes like this, which are admittedly hairy and obscure. I posted an ad on lobste.rs for a Python/Shell Language engineer last month, and got great responses. Aidan started immediately, and already did a bunch of great work.
I noted in the ad that you don't necessarily need to know C++, although in retrospect, knowing C++ seems to be helpful.
Good news: our C++ spec test delta is down to 10! That is, out of ~1800 tests, the translated C++ shell passes 10 fewer tests than the Python shell.
And we've now accounted for all the differences, but not fixed them. One tricky bug relates to the way we translate Python context managers to C++ constructors and destructors.
You can throw an exception from __exit__, but you can't throw from a destructor. So we have cases where the C++ runtime aborts the process, instead of throwing and catching an exception. I think there's a pretty simple solution with "out params".
If that last paragraph made sense to you, you should help us with the code! You can even be paid to work on Oils.
I fixed build bugs reported by users on these platforms:
Please try the new oils-for-unix-0.15.0.tar.gz tarball and let me know what happens!
Here's an auto-generated list of 20 issues fixed in this release. I don't have time to write about everything, and this isn't a complete list. But the point is that OSH is getting a lot better :-)
#1557 | Update known differences doc |
#1555 | Parallelize end user C++ build |
#1552 | `time` builtin `user`/`sys` time always zero |
#1551 | Relax ${array} check by removing shopt -s compat_array and putting it in strict_array |
#1550 | Add ysh symlink to Python tarball |
#1549 | Fix bad $? of -1 on Ctrl-Z |
#1548 | Add oshrc.d and yshrc.d directories; discourage mutating bashrc pattern |
#1547 | Implement ERR hook (run code when errexit happens) |
#1546 | Implement bash-compatible DEBUG hook |
#1536 | Implement history -r |
#1525 | oils-for-unix 0.14.2 build fails on ArchLinux (`typedef` should have been `decltype`?) |
#1523 | getopts behaves incorrectly with multiple -abc -def args |
#1522 | oils-for-unix build failures on OpenBSD due to stdin macro conflict |
#1468 | `cpp/core.h:116:29: error: unknown type name 'sighandler_t'` while building oils-for-unix on macOS |
#1378 | Traps (hooks and signal handlers) should be cleared upon fork() |
#1375 | Make spec/stateful run against oil-native |
#916 | Ctrl-C shouldn't cancel background jobs (setpgid not called, e.g. on pipelines) |
#594 | Generate parse tables for pgen2-native and hook it up to oil-native |
#562 | Implement history -a |
#360 | Implement the rest of job control |
The docs really need an overhaul, which is coming, but I've kept them up to date for this release.
For example, I updated Known Differences Between OSH and Other Shells with notes about job control.
In particular, OSH runs the last part of a pipeline in the shell process where possible, like zsh does. This is shopt -s lastpipe in bash:
echo hi | read x # is read run in the shell, or in another process?
echo x=$x # does $x contain 'hi', or is it empty?
But this conflicts with job control, so such pipelines can't be suspended, which is also true in zsh. In contrast, bash simply ignores shopt -s lastpipe in interactive shells. We chose the zsh behavior because you should be able to test OSH and YSH interactively, with confidence.
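The two answers to the questions in that snippet can be seen in bash itself. This sketch runs each case in a fresh non-interactive bash, where job control is off and lastpipe takes effect. It assumes bash 4.2 or later, and the function names are made up.

```shell
#!/usr/bin/env bash
# Default bash: `read` runs in a child process, so x is lost.
read_default() {
  bash -c 'echo hi | read x; echo "x=$x"'
}

# With lastpipe: `read` runs in the shell process, so x survives.
read_lastpipe() {
  bash -c 'shopt -s lastpipe; echo hi | read x; echo "x=$x"'
}

read_default    # prints x=
read_lastpipe   # prints x=hi
```

Per the description above, OSH and zsh behave like the second case.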
I also updated these wiki pages to give contributors a sense of the project:
If those pages don't scare you off, you're a good person to work on Oils! And you can even be paid.
I've mentioned our CI under the tags #soil and #toil (the first name I used). It's a big "distributed shell script", and I'd like to write in detail about it. But here are some quick updates:
(1) It has a more friendly UI:
(2) It uses 7 or 8 Docker/OCI containers, which build slowly. I started packaging our dev dependencies as "wedges", which are intended to compose with and compose better than OCI layers. This work sped up the build, but it unfortunately caused bugs in the setup process, which Aidan hit.
(3) Soil continues to grow test and benchmark suites which help us design the shell, translate it to C++, and optimize the mycpp runtime.
benchmarks2/uftrace uses uftrace to count allocations and sizes.
benchmarks2/gc-cachegrind uses Cachegrind to separately measure the time taken by the shell itself, the allocator, free(), GC rooting, and marking and sweeping. All of these things are expensive!
interactive/process-table tests the PGID and controlling terminal of child processes across shells.
Soil does the equivalent of a release on every commit, including making tarballs. In the future, it should literally be how we make releases.
The garbage collector also needs support from the build system. We now have a three-level structure for our build variants:
(compiler binary, compiler config, optional app config) with the syntax ninja _bin/$CC-$CONFIG+$APP/osh.
Examples:
_bin/cxx-opt+bumproot/osh helped us measure the speedup of the pool allocator.
perf profiles, because it's done in every function! It's a cost that's "all over", in both time and space.
_bin/cxx-asan+gcalways/osh is a binary that stress tests the GC.
_bin/clang-coverage/osh uses Clang's code coverage to give us a nice report.
Having all these tools makes it easier to contribute! When reviewing a PR, I look at the tests and benchmarks first. Everything else is generally easy.
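As a rough illustration of the three-level variant naming, here's a sketch of a helper that assembles the ninja target from its parts. The variant_target function is hypothetical and not part of the Oils build system; only the _bin/$CC-$CONFIG+$APP/osh pattern comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose a build target from
# (compiler binary, compiler config, optional app config).
variant_target() {
  local cc=$1
  local config=$2
  local app=${3:-}   # the app config is optional

  if [ -n "$app" ]; then
    echo "_bin/$cc-$config+$app/osh"
  else
    echo "_bin/$cc-$config/osh"
  fi
}

variant_target cxx asan gcalways   # prints _bin/cxx-asan+gcalways/osh
variant_target clang coverage      # prints _bin/clang-coverage/osh
```

The result could then be passed to the build, e.g. ninja "$(variant_target cxx opt bumproot)".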
Despite all this progress, we're not done. There are more issues after we fix the 10 spec tests. I've been keeping track of hard bugs on Zulip:
(These could be on Github, but I like Zulip for summarizing, linking, and commenting.)
configure to run as quickly as bash and dash, and correctly too. There are some unexplained differences in the logs. This is hard because autotools-generated code is hard to read and debug.
A shell is a stateful process that's concurrent with the stateful kernel, and job control exposed that even further. It's inherently hard to test.
I think of the shell as an event loop which receives input from signals and waitpid(-1), which I wrote about in January 2022: The Shell Runtime As a State Machine.
So it would be nice to test it as an explicit state machine, including the error paths. I'm not sure how far we'll go down this path right now, but I'd like the project to continue raising the bar on software quality.
For example, we use the exhaustive reasoning of regular languages and algebraic data types via ASDL. Explicit state machines are in the same vein.
Even though some of our tests are flaky, they find bugs not just in OSH, but in other shells!
For example, I finally understand the symptom in issue 330 from 2019. Our spec tests would randomly stop like this, especially when run in parallel:
test/spec-runner.sh run-cases prompt
test/spec-runner.sh run-cases quote
[1]+ Stopped test/spec.sh all
And sometimes the parent shell would even disappear, closing the terminal! For years, this happened rarely, but our initial job control implementation made it happen 100% of the time.
Now I understand that:
setpgid() and tcsetpgrp() to tell the kernel which processes are in the foreground, and which are in the background. This determines which processes receive signals when you hit Ctrl-C and Ctrl-Z.
SIGTTIN / SIGTTOU. The default action for these signals is to stop the process.
So if job control isn't done exactly right, processes can stop seemingly at random, especially if many are run in parallel.
It was pretty easy to isolate a couple bugs in OSH. One bug was that we didn't always give the terminal back to the parent's PGID before exiting the shell, e.g. when your oshrc calls exit (admittedly rare, but tested!).
After fixing it, the confusing stoppage no longer happens with OSH. But it still happens with bash, or at least the old version we're testing with.
But we still have a different, mysterious problem: sometimes the interactive suite hangs in the CI forever. It goes on for 30 minutes or more.
I worked around it by making the test suite run serially, not in parallel.
But I'd like to hear from people interested in testing concurrent systems! I don't want to play "Whac-A-Mole" anymore. Help us figure out ways to exhaustively test a shell. Some ideas in the comments here:
vmtest: Run your tests in virtual machines (dxuuu.xyz via lobste.rs), 9 points, 12 comments on 2023-05-11
For example, we could run the shell with User Mode Linux, which is a real kernel in user space. But what assertions would we make?
In summary, we:
history
So what's next?
The Big Renaming requires many mechanical changes, which I mentioned on the 2023 Roadmap.
I'd like to do all the breaking changes in one release, rather than spreading them out. For example, renaming ~/.config/oil → ~/.config/oils, and internal features like OIL_GC_ON_EXIT=1 → OILS_GC_ON_EXIT=1.
I want to re-organize the docs and rewrite the help builtin, which will pave the way for smooth user contributions.
I've been keeping track of YSH design issues and performance ideas on Zulip. I moved important threads to these new streams:
%symbol → :symbol, which is more conventional
I also had an idea for a new "Squeeze and Freeze" primitive to reduce both GC pressure and memory usage. It has the benefits of an arena, but it's integrated with the GC, and thus memory safe.
My comment on Flattening ASTs (and Other Compiler Data Structures) (cs.cornell.edu via Reddit), 131 points, 25 comments - 01 May 2023
Again, the GC has "unlocked" many design decisions, so we can start thinking of fun stuff like this. But we should do basic optimizations first.
I also want to advertise our #shell-gui Zulip channel, which has had more activity lately. Subhav Ramachandran and I started this work back in 2021, but it's been dormant for 2 years, since both of us had other things to do.
Remember that the project's scope was too big, and I cut out the entire interactive shell. But now we have help, so it's reasonable to think about this again.
The basic idea is the same as these wonderful Arcan FE demos:
The Day of a new Command-Line Interface: Shell (arcan-fe.com via lobste.rs), 81 points, 18 comments on 2022-04-04
But a crucial difference is that it's compatible with ls --color and a million other tools. We invented the FANOS protocol to solve this problem: File descriptors And Netstrings Over Sockets.
That is, a GUI and Oils can communicate over a Unix domain socket, which includes file descriptors pointing to a terminal.
This idea really needs diagrams. Maybe you can help us on #shell-gui :-)
Thanks again to all contributors! Let me know if I neglected to mention something, including your contribution. And thank you to everyone who reported bugs -- I've been getting great feedback.
These metrics help me keep track of the project. Let's compare this release with version 0.14.2 from March.
We implemented more features in Python:
More spec tests passed in C++ because of features implemented in Python, like Aidan's history -a -r. And we're whittling down the remaining translation bugs:
There was some work on YSH behavior:
More significantly, we have our first run of YSH in C++:
The pool allocator made the parser faster:
and it also reduced memory usage (max RSS):
parse.configure-coreutils: 1.95 M objects comprising 69.6 MB, max RSS 91.8 MB
parse.configure-coreutils: 1.97 M objects comprising 73.4 MB, max RSS 81.1 MB
So we're using less memory, but asking for slightly more due to the "big parser refactoring". This is temporary, because the end result of the refactoring will allocate less. It should also shrink Token objects from 40 to 32 bytes, allowing them to fit inside the pool. These are some of the most common objects in the shell.
This large I/O bound benchmark is also slightly faster, though we still have work to do:
configure
configure
configure
I also added stable performance metrics for the GC:
I created this benchmark after we had trouble reconciling different measurements of the pool allocator.
Significant lines:
I just noticed a bug in the ASDL line counts! I changed comments to be # like Python and shell, and now we're not excluding comment lines.
We have more code in the oils-for-unix C++ tarball:
And a larger binary:
The increase is due to
Alloc<T> longer, and it's specialized for every type. It may be useful to specialize on sizeof(T), not just T.