This is the latest version of Oils, a Unix shell. It's our upgrade path from bash to a better language and runtime:
Oils version 0.19.0 - Source tarballs and documentation.
We're moving toward the fast C++ implementation, so there are two tarballs:
- INSTALL.txt in oil-*.tar.gz.
- See README-native.txt in oils-for-unix-*.tar.gz.
If you're new to the project, see the Oils 2023 FAQ and posts tagged #FAQ.
This announcement should have happened weeks ago!
Version 0.19.0 was released on November 30th. And it contains almost 3 months of work — everything since version 0.18.0 in September.
What's happened lately? These blog posts may answer some of your questions:
In short, we've been deep in the nuts and bolts of YSH. It's been enhanced in fundamental ways, and this includes breaking changes, based on experience with the language.
This announcement is long, with many details and code samples. I hope it will make YSH less mysterious!
We also got a third grant from NLnet, and are looking for contributors. If the details in this post interest you, then you might be a good person to work on Oils.
These contributions might give you a feel for the work we're doing, and where you can jump in. The codebase is more stable — taking its "final" form — though I still want to make the dev setup more portable.
Aidan Olsen:
- List => join()
- mylist => join() (fat arrow) and mutating methods as mylist->pop() (thin arrow). More on this below.
- ^[1 + 2]
- args.ysh
- read -N, with fixes to read -n
- std::fmod
Melvin Walls:
- Dict<K, V> as a real hash table!
- O(n) lookup. This was surprisingly OK for a while, but it's indeed slow for some CPU-bound workloads. The more common I/O-bound workloads don't appear to be affected.
- _Dispatch() into separate functions, so this hot function doesn't have so many roots.
- StackRoots bookkeeping, another speed increase.
- func calls
Ellen Potter:
- shopt -s nocasematch, used by Nix and others
- type -a - originally reported by Simon Michael
- type -a!
- GLOBIGNORE (not yet implemented)
The steady trickle of feedback continues to be useful:
- tar create a tarball with errors, but only on some platforms.
More acknowledgments:
- bar-g for great testing and feedback.
You can also view the full changelog for Oils 0.19.0.
Last year, a few people wanted to help implement the "standard library" for YSH.
Unfortunately, the code wasn't yet ready for that! We had to get rid of the "metacircular hack", and figure out a good style to implement builtin functions.
Now that this is done, please take a look at this list of Python-like Str, Int, Float, List, Dict methods, as well as free functions:
We need help with the ones with the red X! As usual, the first step is to write spec tests.
Tests alone are a big contribution, because they force design decisions. Usually we look at what Python and JavaScript do, e.g. with [].index() and Array.indexOf().
We can also use feedback on all the changes below. In particular, the thin arrow -> vs. fat arrow => distinction is pretty unfamiliar, but I think it's justified. At least one person on Zulip likes it, but we can use more feedback.
We're mostly working on YSH, but OSH still gets attention. Repeating some of the above, we implemented these bash features:
- shopt -s nocasematch
- read -N, with fixes to read -n
- type -a (resulted in some nice refactoring)
Please test OSH on your shell scripts, and let us know what's missing or broken.
Now let's discuss the core of this release: big and breaking changes to YSH. If you want to refresh your memory about the language, these docs may help:
I've updated them for this release.
The biggest feature is an overhaul of procs and funcs. We have a new doc, mentioned in the Winter Status Update:
There's a big table of comparisons:
And there's some practical advice: start with neither procs nor funcs. Then refactor to procs. Add funcs later if you need them.
Why the big update to procs and funcs? Here's some background.
Until this year, YSH was called Oil, and it had a weak form of proc. The idea was to make a modest language that fixes the "warts" in shell. But
In the summer and fall, Aidan and Melvin implemented func, and tested it by writing new functions in the standard library.
With this release, procs and funcs have become more powerful, and more consistent with each other, along all these dimensions:
- myproc word (x, named=42) and call f(x, named=42)
- ...pos and ...named
- proc p(x='foo') and func f(x='foo')
- value.Command block
So the language is now very rich! Procs and funcs match our GC data structures and data languages.
The design is largely motivated by the 16 use cases in Sketches of YSH Features (from June).
&myvar is a value.Place
A nice result of procs having typed params is that I got rid of 2 ugly special-case features.
Shell scripts can use dynamic scope to "return" values by mutating the locals of their caller. Bash goes further with declare -n "nameref" variables. The more minimal "Oil" tried to clean this up with:
- proc p(:out) { ... }
- setref keyword
These are now gone in favor of value.Place, which is just another typed value. To create one, use an expression like &myline:
var myline # optional declaration
my-read (&myline) # call proc, passing it a Place
echo result=$myline # => result=foo
The &myline should look familiar to C programmers, and possibly Rust programmers. To set a place, you use the setValue() method on the place:
proc my-read (; out_place) {
call out_place->setValue('foo')
}
There could be a keyword like setplace, but I decided to keep the language simple for now.
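As a rough mental model (not a description of how Oils implements it), a Place can be thought of as a reference to a named slot in the caller's frame. The Python sketch below is an illustration only: the Place class, the frame dictionary, and my_read are hypothetical names chosen to mirror the YSH example above.

```python
class Place:
    """Hypothetical stand-in for value.Place: a (frame, name) pair."""

    def __init__(self, frame, name):
        self.frame = frame  # dict acting as the caller's local variables
        self.name = name    # the variable to assign to

    def setValue(self, val):
        # Assign into the caller's frame, like out_place->setValue('foo')
        self.frame[self.name] = val


def my_read(out_place):
    # Mirrors: proc my-read (; out_place) { call out_place->setValue('foo') }
    out_place.setValue('foo')


caller_frame = {'myline': None}          # var myline
my_read(Place(caller_frame, 'myline'))   # my-read (&myline)
print('result=' + caller_frame['myline'])
```

The point of the model is that the callee never needs to know the caller's variable name, and no dynamic scope lookup is involved: the reference is passed explicitly, like any other typed argument.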
You'll see more of value.Place in the section on read and json read. A motivating feature was to allow YSH users to write something like Bourne shell's read myvar.
In summary, value.Place generalizes these shell mechanisms:
- read and mapfile, which set "magic" variables.
- declare -n aka nameref variables.
proc call sites
The doc on procs and funcs shows that "simple commands" are now very rich. All of these are YSH commands:
cd /tmp
cd /tmp {
echo $PWD
}
cd /tmp (myblock)
other-command ([42, 43], named=true)
other-command ([42, 43], named=false) {
echo 'block arg'
}
This section describes related changes.
_ is now call
YSH has both commands and expressions, and _ was the expression evaluation "command":
var mylist = []
_ mylist->append('foo') # method call, which is an expression
my-command append # compare: shell-like command
I've changed it to a keyword call, which I think is more readable:
call mylist->append('foo')
(A discarded alternative was two colons, like :: mylist->append('foo'))
We now have square brackets (shopt --set parse_bracket) to pass unevaluated expressions to procs:
ls8 /tmp | where [size > 10] # if 'where' were a proc
The above is equivalent to passing a value.Expr quotation:
var cond = ^[size > 10]
ls8 /tmp | where (cond) # one typed arg
This builds on top of Aidan's work implementing value.Expr, mentioned above:
var size = 42
var cond = ^[size > 10]
var result = evalExpr(cond) # => true
Lazy arg lists aren't used much now, but I expect them to be common. In addition to filters on streams, they should allow assert [42 === x] to provide good error messages.
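For readers more familiar with Python, the idea of quoting an expression and evaluating it later can be approximated with compile() and eval(). This is only an analogy: eval_expr is a hypothetical helper, and unlike YSH's evalExpr() it takes its variables from an explicit dict rather than from the surrounding scope.

```python
# Rough analogue of: var cond = ^[size > 10]
# The expression is parsed now, but not evaluated yet.
cond = compile('size > 10', '<quoted expr>', 'eval')


def eval_expr(expr, variables):
    """Hypothetical helper: evaluate a quoted expression against given variables."""
    return eval(expr, {'__builtins__': {}}, variables)


result = eval_expr(cond, {'size': 42})  # evaluated with size = 42
small = eval_expr(cond, {'size': 3})    # same quotation, different binding
print(result, small)
```

A filter like where could evaluate the same quoted condition once per row, each time with a different set of bindings.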
This subtle parsing took a couple tries, but I'm happy with the result!
YSH commands that take a block literal can also take a value.Command object. These are now two syntaxes for the same thing:
cd /tmp {
echo hi
}
var b = ^(echo hi)
cd /tmp (b)
So we have:
- value.Command quotations ^(echo hi) - looks like shell's $(echo hi)
- value.Expr quotations ^[size > 10] - looks like YSH $[size > 10]
The ^ forms won't be common in real YSH code, but they're useful for testing and metaprogramming. Usually, you'll pass literal expressions and blocks.
=>
In the summer, we settled on the thin arrow -> for method calls:
var last = mylist->pop() # use the return value
call mylist->pop() # throw away the return value
We now also accept => for methods, and I want to use it to distinguish pure methods that "transform" and methods that mutate.
This gotcha has always bugged me in Python:
mylist.sort() # sort in place
mystr.strip() # BAD: it throws away the result!
# Strings are immutable.
mystr = mystr.strip() # probably what you meant
In other words, the same syntax is used for wildly different semantics. When I explain it to new programmers, I cringe a bit.
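The gotcha is easy to reproduce. The following self-contained Python snippet (variable names are illustrative) shows that the two calls look identical at the call site but behave in opposite ways:

```python
mylist = [3, 1, 2]
sort_result = mylist.sort()   # mutates mylist in place, returns None

mystr = '  hi  '
mystr.strip()                 # returns a new string, which is discarded here
stripped = mystr.strip()      # the result has to be captured explicitly

print(mylist, sort_result)
print(repr(mystr), repr(stripped))
```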
So I propose that in YSH, we have:
call mylist->sort() # sort in place
var mystr = mystr => strip() # transform
Right now -> and => are interchangeable, but I think we should enforce the distinction (and Samuel agreed). Feedback is welcome.
Another thing that fell out pretty easily is using => to chain free functions.
Here's an excerpt from the commit that implemented this:
The expression obj => f attempts to create a value.BoundFunc, which you then call with obj => f().
- obj.
- f.
- BuiltinFunc or (user-defined) Func, we create a BoundFunc.
So this behavior makes free functions chain like methods. An example from spec/ysh-methods.test.sh shows the benefit. If dictfunc() returns a dict with keys K1 and K2, then you could have written this code:
$ echo $[list(dictfunc()) => join('/') => upper()]
K1/K2
The new way is nicer and more consistent:
$ echo $[dictfunc() => list() => join('/') => upper()]
K1/K2
Because => can be used for both methods and free functions, it's like "uniform function call" syntax, which I've wanted for many years.
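To make the chaining behavior concrete, here is a small Python model of it. It is a sketch under assumptions, not the Oils implementation: fat_arrow, free_funcs, and this join are hypothetical, and the lookup order (method first, then free function) is assumed for illustration.

```python
from functools import partial


def fat_arrow(obj, name, free_funcs):
    """Hypothetical model of `obj => name`.

    Assumed order: use a method on obj if one exists, otherwise bind
    obj as the first argument of a free function with that name.
    """
    method = getattr(obj, name, None)
    if callable(method):
        return method
    return partial(free_funcs[name], obj)  # plays the role of a BoundFunc


def dictfunc():
    return {'K1': 1, 'K2': 2}


def join(items, sep=''):
    return sep.join(items)


free_funcs = {'list': list, 'join': join}

# Models: dictfunc() => list() => join('/') => upper()
keys = fat_arrow(dictfunc(), 'list', free_funcs)()   # free function list()
joined = fat_arrow(keys, 'join', free_funcs)('/')    # free function join()
result = fat_arrow(joined, 'upper', free_funcs)()    # real method str.upper
print(result)
```

In this model the caller doesn't need to know whether a given step is a method or a free function, which is the property the uniform syntax is after.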
We also parse => in function return types, but these values aren't used yet:
func f(x Int) => List[Int] {
return ([x, x + 1]) # parens required around expressions
}
(1) We should probably enforce that funcs are really pure
- proc vs. func, a design issue that I've written about. I think of these as Perlis-Thompson problems.
- $? - a pure evaluator should be faster.
(2) Clean up implementation of "closed" vs "open" procs
proc p () { # closed, no params to bind
echo
}
proc p { # open, args are automatically bound
echo
}
The difference can now be expressed with a rest param ...ARGV.
(3) Unify the runtime representation of value.LiteralBlock and value.Command.
(4) ARGV should be a regular variable, rather than using shell's separate "$@" stack.
func and value.IO
The interactive shell and the YSH language are converging!
Now that we have functions, we can express a nicer prompt API than bash's $PS1, which has very "exciting" quoting rules:
$ PS1='\w\$ ' # custom PS1 language
$ PS1='$(echo \w)\$ ' # same thing, note single quotes
# and delayed $() evaluation
In contrast, YSH now uses a func that takes a value.IO instance. You can build up a plain old string, using methods like io->promptVal():
func renderPrompt(io) {
var parts = []
call parts->append(io->promptVal('w')) # pass 'w' for \w
call parts->append(io->promptVal('$')) # pass '$' for \$
call parts->append(' ')
return (join(parts))
}
This is "normal code", and it should be better for complex prompts. But YSH still respects $PS1, so you can copy and paste from existing sources, or use that style if you prefer.
Help Topics:
Several YSH builtins have been changed to use the new style of typed args to procs. These are all breaking changes.
read takes value.Place, with default var _reply
The read builtin has been simplified by optionally accepting a value.Place. There are now 2 ways to invoke it:
echo hi | read --line # fill in _reply by default
echo reply=$_reply # => reply=hi
echo hi | read --line (&x) # fill in this Place, var x
echo x=$x # => x=hi
Likewise with the --all flag, which reads all of stdin:
echo hi | read --all
echo hi | read --all (&x)
(The --long-flag style lets you know that you're using YSH features.)
json read is consistent with read
The json builtin now follows the same convention:
echo {} | json read # fill in _reply
echo {} | json read (&x) # fill in this Place, var x
append builtin
The append builtin no longer takes an arg like :mylist. Instead, it simply takes a typed arg:
append README.md *.py (mylist) # append strings to mylist
This is equivalent to calling methods on the value.List:
call mylist->append('README.md')
call mylist->append(glob('*.py'))
# Make it a nested list -- not possible with the command-style
call mylist->append(['typed', 'arg', 42])
error builtin
The syntax has been tweaked to reflect the new separation between word args and typed args. Old style:
error ("Couldn't find $filename", status=99)
The new style has a word arg, and an optional named arg:
error "Couldn't find $filename"
error "Couldn't find $filename" (status=99)
We're still tweaking the API names for consistency. There's a new YSH Style Guide as well.
- snake_case() → capWords()
- startswith() → startsWith()
- strip() and family → trim() and family
I think this set of APIs:
trim()
trimLeft() trimRight()
trimPrefix() trimSuffix()
could be nicer than Python's:
strip()
lstrip() rstrip()
removeprefix() removesuffix()
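For reference, this is how the Python family behaves (removeprefix() and removesuffix() need Python 3.9 or later). The last line shows a well-known surprise: lstrip() with an argument strips a set of characters, not a prefix. That may be one reason a separate, explicitly named pair for prefixes and suffixes reads better. The sample string is made up.

```python
s = '  www.wonder.com  '

trimmed = s.strip()     # both ends
left = s.lstrip()       # left end only
right = s.rstrip()      # right end only

no_prefix = trimmed.removeprefix('www.')   # removes the exact prefix
no_suffix = trimmed.removesuffix('.com')   # removes the exact suffix

# The argument is a set of characters, so the 'w' of 'wonder' goes too
char_set = trimmed.lstrip('w.')

print(no_prefix, no_suffix, char_set)
```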
var destructuring
You can now initialize multiple variables at once:
var flag, i = parseArgs(spec, ARGV)
I had disabled that feature because I thought it would be confusing, since it differs from JavaScript:
var x, y = 1, 2 # YSH
var x = 1, y = 2; # JavaScript
But I think we can simply avoid that usage, writing this instead:
var x = 1
var y = 2
null initialization
Sometimes you want to initialize a variable after declaring it with var. Rather than
var x = null
echo hi | read --line (&x)
You can now leave off the right-hand side:
var x # implicit null
echo hi | read --line (&x)
const must be at the top level
The YSH const keyword inherited its behavior from POSIX shell's readonly. This is a dynamic check, which works poorly in loops:
$ for x in 1 2; do readonly y=x; done
-bash: y: readonly variable
I decided that dynamic const is "weak sauce", and if anything, we should have a static const.
For now, we're de-emphasizing const, so it's illegal inside proc and func. You can only use var.
const can still be at the top level, since the dynamic check is still useful there: it can prevent source from clobbering variables. (We'll probably introduce namespaces / modules in the future, so that source doesn't have this pitfall.)
Thanks to Aidan for feedback on this.
Previously we only had:
setvar x += 3
Now we have all of:
setvar x /= 2
setvar a[i] *= 3
setvar d.key -= 4
The augmented assignment operators are listed in the YSH Table of Contents under Assign Ops. (And now I notice a broken link to fix.)
This is now valid syntax:
var x: Int = f() # colon looks better
But again we don't do anything with the Int annotation. We may omit the colon in signatures, because it conflicts with Julia-like semi-colons:
proc p (word; x Int, y Int; z Int) { # no colons, a bit like Go
echo hi
}
Compared with having both:
proc p (word; x: Int, y: Int; z: Int) { # : and ; noisy?
echo hi
}
This change came from using Egg expressions myself. It adds Python-like keywords, which I think makes capturing more readable.
Old syntax:
var pat = / <d+> / # positional capture
var pat = / <d+ : month> / # named capture
New Syntax:
var pat = / <capture d+> / # positional capture
var pat = / <capture d+ as month> / # named capture
I also reserved syntax for type conversion functions, which are fully implemented in version 0.20.0 (the next release):
var pat = / <capture d+ as month: Int> /
This makes Eggex a bit like C's scanf()!
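For readers who know Python's re module, a named capture with an Int conversion corresponds roughly to a named group followed by an explicit int() call. This is only a comparison and says nothing about how Eggex is implemented; the input string is made up.

```python
import re

# Roughly: / <capture d+ as month> /
pat = re.compile(r'(?P<month>\d+)')

m = pat.search('month: 11')
raw = m.group('month')   # captured text is always a string
month = int(raw)         # the conversion is a separate, manual step
print(repr(raw), month)
```

With the conversion attached to the capture itself, the pattern carries both the shape of the text and the type of the result, which is the scanf()-like quality.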
0 .. n, not 0:n
Originally I thought slices a[0:n] were like ranges 0:n, but they're different.
- Str or List
- ..
- 42. is no longer a valid float; an explicit 42.0 is required. This prevents ambiguity with ranges like 1..5.
- Inf and -Inf
- if(x > 0) (no space) in addition to if (x > 0), though it's not the recommended style.
I pointed out several docs in Oils Winter Status Update > Please Review These Docs.
- oilshell.org.
While writing these notes, I noticed that we need iteration to get some features right.
This is a major reason YSH hasn't been fully documented: we need to try it first!
Here's a little retrospective:
- setref and :out became value.Place in this release.
- $'\n' wart.)
- SHELL blocks, I think we can simply attach value.Proc to the Hay data structure.
What other design issues are there?
Related Zulip threads:
Here are some details on the contributions in the first section.
As mentioned, Melvin implemented a real hash table, inspired by CPython's "Hettinger dict". Compared with the earlier Python dict, it's more compact in memory and preserves insertion order.
A primary motivation for YSH was to be able to round-trip JSON messages without shuffling the keys:
{"z": 99, "y": 42, "x": [3, 2, 1]}
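Python's built-in dict has preserved insertion order since version 3.7, so the round-trip property can be illustrated with the standard json module. This shows the desired behavior only, not the Oils C++ Dict<K, V> itself.

```python
import json

text = '{"z": 99, "y": 42, "x": [3, 2, 1]}'

obj = json.loads(text)   # keys are stored in the order they were read
out = json.dumps(obj)    # and written back in that same order

print(list(obj))
print(out)
```

With a hash table that iterates in hash order instead, the keys could come back as x, y, z or in some other arrangement, which makes diffs of JSON files noisy.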
Some references we used:
As a result of these optimizations, we're now beating bash on a couple cases of benchmarks/compute! I think this is pretty impressive, because our source language is typed and garbage-collected Python, while bash is written in C.
So I have more confidence we can be as fast as bash. It's not clear how much effort it will take, but it should be fun nonetheless :-)
- _Dispatch, Melvin added a report to flag functions with too many GC roots.
- StackRoots made the code faster, but the executable code much larger. This is a good tradeoff now.
- List<T> growth policy: how big reallocations are.
- Dict<K, V> growth policy, but the results weren't conclusive.
- % with power of 2 index_len_, but it didn't quite show up in wall time. In contrast, the hashing changes resulted in obvious improvements.
- mylib.BufWriter() has some tuning work left, to avoid tiny allocations. We should add some micro-benchmarks.
A few years ago, I mentioned a "Tea" experiment for bootstrapping. I implemented a parser for Tea, reusing some of the "Oil" parser.
But this made the code more complex, and the parser now seems like the wrong place to start.
So I've deleted it, and started a #yaks experiment in a separate repo. Yaks is more about reusing the mycpp runtime in a "bottom-up" fashion, with an IR, rather than starting from a parser.
In any case, we no longer have this distraction in the code.
This was a huge release, with changes from September, October, and November!
I showed many code samples, and tried to justify each change. YSH is rapidly improving, but it's not done yet.
What's next? Oils 0.20.0 is well underway, with
Let me know what you think in the comments!
#1759 | Str* raw_input(Str*): Assertion `0' failed |
#1758 | Implement command -V (POSIX compatibility) |
#1732 | Crash When Comparing Functions (and Other Values) |
#1731 | Oils 0.18.0 tarball gives errors when extracting with bsdtar |
#1727 | Error building 0.18.0 on MacOS: std::fmod not found |
#1702 | [breaking] Change _ prefix to 'call' keyword |
#1289 | append builtin can take typed args |
#1112 | Design for Python-like functions in Oil |
#1024 | Implement binding of typed params to procs |
#957 | Implement setvar x -= 1 |
#770 | Support read -N, etc. |
#498 | Provide a prompt hook in bin/ysh |
#259 | type builtin doesn't handle -p/P/a |
These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.18.0.
OSH passes more tests due to the features mentioned above.
It also fails more tests, because at least one of them is unimplemented. But remember that adding failing spec tests is half the battle!
You can write Python, and everything "just works" in C++:
New YSH behavior is reflected in the spec tests:
Some of the new behavior doesn't work in C++, largely due to JSON. This has already been fixed in Oils 0.20.0!
The parser is more efficient, I think due to the growth policy:
Small reduction in memory usage:
- parse.configure-coreutils 1.83 M objects comprising 65.0 MB, max RSS 69.3 MB
- parse.configure-coreutils 1.81 M objects comprising 63.9 MB, max RSS 68.8 MB
Huge speedup on Fibonacci due to Melvin's work on Dict<K, V> and GC rooting:
- fib takes 65.4 million irefs, mut+alloc+free+gc
- fib takes 33.1 million irefs, mut+alloc+free+gc
I/O bound workloads remain the same speed. But we still have to figure out the delta with bash here:
configure
configure
configure
I improved the accounting of lines between OSH and YSH, which means that OSH went down in size:
There's a bit more code in the oils-for-unix C++ tarball, much of which is generated:
The compiled binary got much bigger due to inlining GC rooting. This is the tradeoff for the speed increases above:
As mentioned, I have an idea for a "hybrid rooting scheme" to make the code both smaller and faster.