Why Sponsor Oils? | blog | oilshell.org
On a lobste.rs thread about the rationale for the Fennel language, I posted this summary of why Oil exists:
I think these features alone would justify a new shell:
- Getting rid of "quoting hell"
- Getting rid of ad hoc parsing and splitting
- Fixing errexit
But Oil has a lot more than that, including unifying separate ad hoc expression languages ...
This post elaborates on these points. I've condensed the rationale into four critical features for the OSH language.
I give examples of each feature, link to docs (in progress), and comment on the future of the project.
Recall that OSH is designed to run existing shell scripts, and has done that since early 2018.
It also fixes warts in the shell language with opt-in features. These are the four most important ones.
I just finished an overhaul of shell's flaky set -e / errexit mechanism.
I'm excited by this, because I started it last year, but put it on the back
burner after being stumped!
I believe I've figured out every problem now, and would like your feedback. The simple invariant is that OSH never loses an exit code, which is not true of POSIX shell or bash. Here's a summary of the enhancements:
- strict_errexit - A shell option to detect cases where you would lose errors in shell, like if myfunc. This improves your shell scripts, even if you run them under another shell! In other words, OSH can be used as a dev tool.
- inherit_errexit - OSH implements this bash 4.4 option, which is a partial fix for the "command sub errexit" problem.
- command_sub_errexit - A shell option to check for failure at the end of every command sub, so you don't lose errors.
- process_sub_fail - Like pipefail, but for process substitutions. It allows errexit to "see" the failure caused by process subs, like the sort invocation in cat foo.txt <(sort /oops/error).
- @_process_sub_status - A variable that's analogous to ${PIPESTATUS[@]}. You may want to inspect the exit status of all processes.
- The run builtin turns errexit back on, so if run myfunc is safe. It also provides fine-grained control over exit codes.
Yes, there are many solutions, because shell has many problems! But you don't have to remember all these names. Add shopt --set oil:basic to the top of your program to turn on all of these options. The strict_errexit failures will remind you to use the run wrapper.
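The lost errors are easy to reproduce in stock bash. The sketch below is plain bash rather than OSH, and the function names (demo_if, demo_command_sub, demo_process_sub) are made up for illustration. Each demo runs a small script under set -e and shows that it keeps going after a failure:

```shell
#!/usr/bin/env bash
# Each demo runs in a child bash so that 'set -e' cannot kill this script.

# 1. errexit is ignored inside an 'if' condition, even within the function body.
demo_if() {
  bash -c '
    set -e
    myfunc() {
      false                 # fails, but does not abort
      echo "after false"    # still runs
    }
    if myfunc; then
      echo "myfunc looked successful"
    fi
  '
}

# 2. A failing command sub used as an argument is invisible to errexit.
demo_command_sub() {
  bash -c '
    set -e
    echo "result: $(false)"   # status of echo (0) wins
    echo "still running"
  '
}

# 3. A failing process sub is invisible too: only the status of cat counts.
demo_process_sub() {
  bash -c '
    set -e
    cat /dev/null <(sort /oops/error 2>/dev/null)
    echo "still running"
  '
}

demo_if
demo_command_sub
demo_process_sub
```

In all three cases bash carries on as if nothing went wrong. These are the situations that strict_errexit, command_sub_errexit, and process_sub_fail are meant to catch.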
(Aside: I was able to fix all these problems cleanly in the interpreter. I spent a lot of time on Oil's architecture 4 years ago precisely so I could fix such subtle problems. When the code has a good structure, the "right place" for a fix reveals itself to you. Oil is still improving!)
QSN is the foundation for Structured Data in Oil. It removes the need to invent ad hoc (and often broken) formats every time you need to deal with user-supplied data in shell. In other words, Oil scripts have an alternative to messy parsing and splitting.
I just implemented a QSN decoder, after implementing an encoder earlier this year.
Here are some short examples. The write builtin prints its args to stdout, and it accepts a --qsn flag:
# Print filenames ONE PER LINE. If a name contains a
# newline or other special char, it's QSN-encoded like
# 'multi-line \n name with NUL \0 byte'
write --qsn -- *.txt
The read builtin provides the inverse:
cat list.txt | while read --line --qsn {
  # _line is implicitly set by 'read'
  rm -- $_line
}
I also implemented read -0 as a synonym for bash's obscure read -r -d ''. This allows you to consume find -print0 output in shell, like xargs -0 does. This format is distinct from QSN, but it's now easy to convert back and forth between them.
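For comparison, here is the bash idiom that read -0 abbreviates. This is a sketch in plain bash (it needs process substitution), and the temp directory and file names are made up for the demo. It reads NUL-terminated paths from find -print0, so a filename containing a newline is still seen as one item, while counting lines sees it as two:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/$(printf 'two\nlines.txt')"

# Line-based counting is fooled by the embedded newline.
line_count=$(find "$tmp" -type f | wc -l | tr -d ' ')

# NUL-terminated reading sees each path exactly once.
nul_count=0
while IFS= read -r -d '' path; do
  nul_count=$((nul_count + 1))
done < <(find "$tmp" -type f -print0)

echo "lines: $line_count, paths: $nul_count"
rm -r -- "$tmp"
```

There are two files, but the line-based count reports three entries. The NUL-terminated loop reports two.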
This is the first cut of QSN support. I expect it to evolve based on your feedback!
(!qefs problem)
This was done in summer 2019. I described it in Simple Word Evaluation earlier this year, and you can see examples in Oil Language Idioms.
Briefly, Oil allows this:
ls @myflags $filename
instead of
ls "${myflags[@]}" "$filename"
Notice the @ splice operator, and lack of quotes.
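To see what the quotes in the bash version are protecting against, here is a small plain-bash sketch. The helper count_args and the sample values are made up for illustration. It counts how many arguments a command receives with and without the quotes:

```shell
#!/usr/bin/env bash
count_args() { echo "$#"; }

myflags=(-l '--hide=my notes')   # second flag contains a space
filename='my file.txt'

# Quoted: each array element and the filename stay intact.
quoted=$(count_args "${myflags[@]}" "$filename")

# Unquoted: bash splits every value on whitespace.
unquoted=$(count_args ${myflags[@]} $filename)

echo "quoted: $quoted, unquoted: $unquoted"
```

The quoted form passes three arguments, while the unquoted form splits on spaces and passes five. In bash you have to remember the quotes every time to get the first behavior.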
This blog began in 2016 with an explanation of static parsing. I didn't mention it in the comment quoted in the intro, but it's still a crucial part of the project.
I was reminded how important this is when noticing that the authors of both Perl 5 and the rc shell made complaints about shell's dynamic parsing, going back 20-30 years!
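Here is one small illustration of what dynamic parsing can look like in practice, as a plain-bash sketch. The variable names are made up, and I'm assuming this is the kind of behavior those complaints refer to. In arithmetic context, bash parses the contents of a variable as an expression at runtime, so data is treated as code:

```shell
#!/usr/bin/env bash
expr_text='1 + 2'          # just a string
result=$(( expr_text ))    # bash re-parses the string as arithmetic
echo "result: $result"
```

The string is never written as code in the script, yet it is evaluated as code, and that cannot be seen by reading the program text alone.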
This foundation is still paying dividends. I recently used the static parser to create detailed error messages for command subs:
$ shopt --set errexit command_sub_errexit
$ d=$(date %x)
date: invalid date ‘%x’
  d=$(date %x)
    ^~
[ interactive ]:13: fatal: Command sub exited with status 1 ...
and process subs:
$ shopt --set process_sub_fail
$ cat /dev/null <(sort oops)
sort: cannot read: oops: No such file or directory
  cat /dev/null <(sort oops)
                ^~
[ interactive ]:27: fatal: Exiting with status 2 ...
We point to the location of the failing construct. No other shell does this!
In addition, Travis Everett has worked on a shell dependency bundler which relies on static parsing.
It was indeed useful to explicitly write out the rationale for the language. I've done that many times with posts tagged #why-a-new-shell, but explaining it again helps, even after 4 years. The project is evolving and getting crisper.
With the overhaul of errexit and the QSN decoder, I believe we now have all the bases for the OSH language covered! These features will be out with the next release.
The claim is that these four features alone justify a new Unix shell. If we finish the C++ translation, and end the project here, it would be worthwhile.
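To repeat, they are:
- Fixing errexit
- Getting rid of ad hoc parsing and splitting (QSN)
- Getting rid of "quoting hell"
- Static parsing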
If you disagree, let me know! I would like to hear what other warts in the shell language need to be fixed or otherwise addressed.
(I'm leaving out the interactive shell here, as I believe the first priority is a better shell for programming and automation. A "cloud shell", if you will.)
Back in January, I was already concerned about the scope of the project. I wrote that the biggest cut to the project would be that Oil would be based on strings, rather than Python-like data types.
Let me update that statement based on these crisp definitions:
- The OSH language is based on strings, as in local x=mystr.
- The Oil language has Python-like data types, as in var x = 42 + a[i] + f(x, y). It has a garbage-collected heap of recursive data structures.
So what I'm saying now is that the priority going forward is to polish the OSH language, and put off the Oil language until the hazy future.
That means finishing the translation to C++, hooking up the garbage collector, and writing documentation. It may mean preparing the code to be embedded in another application, like the fish shell. (I've discussed this with the maintainer, and there's some interest. But it's a lot of work, which shouldn't be taken for granted, and there are unsolved problems.)
Achieving this OSH language milestone feels very doable, since everything already works in Python, and something like 915 out of 1685 spec tests pass in C++ (yielding a 30x - 50x speedup).
But I'm not giving up on the Oil language! I just need help. It exists in prototype form, and your feedback will motivate me to work on it.
Here are some blog posts I want to write, to get the word out:
Four Features of the Oil Language. This post narrowed down OSH to four major features, and Oil also has four:
We have working prototypes for every feature except QTSV. You can try them now!
Big Changes to the Oil Language. A list of recent changes I've made, which should give potential contributors a feel for the language.
What Distinguishes Python, JS, and Ruby from Perl and PHP. The former languages have a clean data model / memory model: a garbage collected heap with reference semantics.
The latter languages have warts in their model. Oil adds the clean model to shell.
Comments on Comics. I can use these recent comics as a way to explain the OSH language. (See other posts tagged #comic.)
I described 4 essential features of an improved shell language. Let me know what you think is missing.
If you haven't read it already, see Why Create a Unix Shell?. It's the most popular page on this site, though I still need to update it for 2021.
I then proposed a focus on making the OSH language "production ready". I'm still going to work on the Oil language, but I need help finishing it.
Speaking of which, several people have pointed out that the dev process for Oil is difficult. I've addressed this recently by removing spew from the build logs, and adding a lint check for more portable shebangs with /usr/bin/env. (This just triggered and prevented a regression!)
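A check like that can be quite small. The sketch below is not Oil's actual lint, just a hypothetical check_shebang function in plain bash that flags any shebang that doesn't go through /usr/bin/env:

```shell
#!/usr/bin/env bash
# Hypothetical lint: succeed if the file has no shebang or uses /usr/bin/env.
check_shebang() {
  local file=$1 first=''
  IFS= read -r first < "$file" || true   # tolerate a missing final newline
  case $first in
    '#!/usr/bin/env '*) return 0 ;;
    '#!'*)
      echo "$file: non-portable shebang: $first" >&2
      return 1
      ;;
    *) return 0 ;;                       # no shebang; nothing to check
  esac
}
```

Running it over every script in a repo and failing the build on a non-zero status would be enough to stop a hard-coded interpreter path from slipping back in.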
I still would like to make a screencast to show how easy Oil is to work on. After cloning, a 10 or 20 second build process should get you a working bin/osh:
~/git/oilshell/oil$ bin/osh -c 'echo hi'
hi
This is a pure Python program, which is very nice for prototyping!
So I'm trying to make Oil more friendly to work on. Reach out if you want to help, and if you run into problems.
catch builtin (now run).
Those 4 features aren't the only ones in OSH, but I claim they are sufficient!
I fixed other problems with shell, and described them with posts tagged #real-problems. I've applied that tag to this post, since command_sub_errexit fixes the problem that Julia Evans was perplexed by.
I believe Oil will be faster than bash along 3 dimensions: parsing speed, runtime compute speed, and runtime I/O speed. However, this may require optimization after finishing the garbage collector, and we're short on hands to do that.
I'd like Oil to have better dev tools: tracing, debugging, and crash dumps. I prototyped a crash dump a couple years ago, but I didn't receive much feedback on it. I think OSH has to be more mature before that's compelling.
Subinterpreters (issue 704) have multiple use cases. There have been many failed attempts to add this feature to CPython, whereas embedded languages like Tcl and Lua make good use of them.
I even think there are good use cases for embedding WebAssembly in Oil, e.g. as mentioned in the July post on regular languages. Another use case is to package and distribute portable dev dependencies, like the CommonMark renderer. This problem has contributed to the dev process friction mentioned above.