Why Sponsor Oils? | blog | oilshell.org
I recently released Oil 0.8.3, and it's the biggest release in recent memory! What's new?
- @ sigil.
- pass.
- _match() to access eggex matches.
- read and write.
- errexit overhaul, mentioned in the last post.
This is the first of two posts that describe the language changes. Separately, I plan to write "the ultimate guide" to error handling in shell.
If you're not familiar with Oil, see the new Language Influences and Oil Language Idioms docs, as well as posts tagged #oil-language.
If you're interested in Oil, now is a great time to get involved. Recall that the last post said that OSH would have four significant fixes, but the rest of the project was too much work. The work described here is what I need help with!
Toward that end, I recently updated these pages:
Asking questions and leaving feedback about the language on Zulip is also appreciated! Several people have influenced the language design this way.
The expression language lets you talk about typed data with operators and literals. Let's review those changes first.
Last year, Oil had some "cleanups" of the Python expression language, but I decided that the unfamiliarity isn't worth it. I reverted them, so:
- div is back to //
- mod is back to %
- xor is back to ^
- ^ is back to **
(The appendix has some rationale for this.)
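Since these operators now match Python, a quick Python session shows what each one does. This is plain Python rather than Oil, on the assumption that Oil's expression language agrees with Python for these four operators on integers.

```python
# Plain Python, not Oil: the four operators that Oil went back to.
# Assumption: Oil's expression language matches Python for integers here.

floor_div = 7 // 2   # integer (floor) division, previously spelled "div"
remainder = 7 % 2    # modulus, previously spelled "mod"
bit_xor = 6 ^ 3      # bitwise xor (0b110 ^ 0b011), previously spelled "xor"
power = 2 ** 10      # exponentiation, previously spelled "^"

print(floor_div, remainder, bit_xor, power)
```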
++ to concatenate, ~~ and !~~ to match globs
The ++ operator is for string and list concatenation. That is, a + b always does math, and a ++ b always does concatenation.
This is to support Awk-like auto-type conversion. Similarly, comparison operators like < and <= will only work on numbers, and we'll use a different syntax for strings. (Yes, I realize the danger with such type conversion!)
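To make the split between math and concatenation concrete, here is a small Python sketch of the idea. The helpers add and concat are hypothetical stand-ins for + and ++, and the conversion rule (try int, then float) is an assumption about what "Awk-like" means, not Oil's actual implementation.

```python
# Hypothetical helpers for illustration; this is not Oil's implementation.

def to_number(x):
    """Convert a string like '42' or '1.5' to a number; pass numbers through."""
    if isinstance(x, (int, float)):
        return x
    try:
        return int(x)
    except ValueError:
        return float(x)  # raises ValueError if x isn't numeric at all


def add(a, b):
    """Stand-in for `a + b`: always math, converting strings to numbers."""
    return to_number(a) + to_number(b)


def concat(a, b):
    """Stand-in for `a ++ b`: always concatenation, never math."""
    if isinstance(a, list) and isinstance(b, list):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    raise TypeError('concat expects two strings or two lists')


print(add('1', '2'))         # math on numeric strings
print(concat('1', '2'))      # string concatenation
print(concat([1], [2, 3]))   # list concatenation
```

With one overloaded operator, '1' + '2' would be ambiguous as soon as strings can convert to numbers. Two operators remove the ambiguity.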
The ~~ and !~~ operators are for glob matching. They deprecate [[ x == *.py ]] in bash.
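For comparison, here is glob matching expressed with Python's standard fnmatch module. It only illustrates what a glob match means. That ~~ behaves like this for a simple pattern such as *.py is an assumption, and the exact pattern dialect may differ.

```python
from fnmatch import fnmatchcase

filename = 'setup.py'

# Roughly what a glob match with ~~ against *.py is meant to express
is_python = fnmatchcase(filename, '*.py')

# Roughly what the negated form with !~~ is meant to express
not_python = not fnmatchcase(filename, '*.py')

print(is_python, not_python)
```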
{}, not %{}
This is another return to Python compatibility.
We used sigils like %{foo: 42} in dict literals because Oil uses { } for C-like statement blocks, and it lacks semicolons.
Making the tokens distinct is one way to avoid a subtle parsing issue. This Hacker News comment about the Dart language describes some of the difficulties with using {} in both expressions and statements.
However, Oil's problem is not as hard as Dart's, and I solved it by simply including newlines in the grammar. A key-value pair can be on a line:
var mydict = {
  server: "www.example.com"  # optional comma
  port: 80
}
But you can't split it across lines
# Syntax error
var mydict = {
  server:
    "www.example.com"
}
without either () or \:
var mydict = {
  # This is valid, or you can use \
  server: (
    "www.example.com"
  )
}
It was bugging me that lists are just [1, 2, 3], while dicts were %{key: 'value'}. This is now fixed!
(Good Zulip Feedback on Line Breaking. I'm still looking for more feedback.)
I also removed the %[] syntax, which was an overly ambitious idea for typed array literals. We already have %(one two) for shell-like arrays, and ['one', 'two'] for Python/JS-like lists.
(Aside: Perl has qw(one two) and Ruby has %w[one two], which are like our %(one two).)
&(echo $PWD)
Oil's Ruby-like blocks are "first class". Normally they're passed to procs as the last argument:
cd /tmp {
  echo $PWD
}
But we also need them in expression mode. I decided on the syntax &(echo $PWD).
This may seem inconsistent at first, but it's consistent with command subs:
var b1 = $(echo $PWD) # eagerly evaluated
var b2 = &(echo $PWD) # lazily evaluated
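The same eager/lazy distinction can be sketched in Python, using a plain value for the command sub and a lambda for the block. This is an analogy only. It says nothing about how Oil represents blocks internally.

```python
import os
import tempfile

start = os.getcwd()

b1 = os.getcwd()           # eager: evaluated right now, like the $(...) form
b2 = lambda: os.getcwd()   # lazy: evaluated only when called, like the &(...) form

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    eager_result = b1      # still the directory we started in
    lazy_result = b2()     # reflects the directory at call time
    os.chdir(start)        # go back before the temp dir is removed

print(eager_result == start, lazy_result == start)
```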
\u{012345}
Character literals stand alone in the expression language, like
var x = \u{3bc} # mu character
That is, you don't need quotes. They're for both "code point literals" ("runes" in Go) and eggex char classes.
This syntax is now consistent within C-escaped strings like $'' and c'', and QSN, which leads us into the next section.
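If you want to check which character a hex code point refers to, Python's unicodedata module can name it. This is Python, not Oil, and is only meant as a lookup aid for the hex values used in literals like these. Note that U+03BC is mu, while U+03BF is omicron.

```python
import unicodedata

# Look up the official Unicode names for two nearby Greek code points.
for code_point in (0x3bc, 0x3bf):
    ch = chr(code_point)
    print(hex(code_point), ch, unicodedata.name(ch))
```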
Shell has a rich string literal syntax. Oil inherits all of its power, but (as of this release) removes unnecessary flexibility.
Here are some C-style strings:
echo $'C-style'
echo $'\n \i' # single char
echo $'\0123 \x01 \x1' # octal and hex
echo $' \u1234 \U00012345' # unicode
Notes:
- \n is a valid char escape, but \i is an invalid one. Bash accepts it and prints \i literally.
- Bash also accepts \x1 instead of \x01.
I made the following changes to simplify this syntax:
- Hex escapes must be written as \xHH.
- Unicode escapes are written as \u{12345}, which I added support for.
As usual, we do a dance to avoid breaking existing code, while preventing legacy from creeping into the Oil language:
- shopt --unset parse_backslash enables all these syntax errors. This is the default in bin/oil (option group oil:all).
- In expressions, they are syntax errors even in bin/osh. Legacy shell scripts don't have expressions, so this is OK!
Now that we have \u{12345}, we have an interesting property: any QSN string is now an Oil string! Though you have to add a $ sigil:
echo $'QSN and Oil \\ \n' # command mode
var mystr = $'\x01 \u{3bf}' # expression mode
var mystr = c'\x01 \u{3bf}' # also valid, opposite of "raw"
Here are some double-quoted strings:
echo "double quoted"
echo "\$ \i" # invalid escape \i
echo "\\ \ ." # \ missing escape
echo "\$ $ ." # $ missing escape
echo "old: `hostname`, new: $(hostname)" # 2 styles
Oil makes the following changes:
- Unsetting parse_backslash makes \i and \ a syntax error. Add the \ to fix it.
- Unsetting parse_dollar makes $ a syntax error. Ditto.
- Unsetting parse_backticks makes the old command sub style a syntax error. Use the new style.
These options are unset in the option group oil:all.
Aside: our lexing style is awesome for making these changes!
I made similar changes to unquoted words.
parse_at_all Reserves Words Beginning With @
In the oil:basic option group, we allow this syntax, but we only break the bare minimum:
echo @myarray
But the oil:all option group reserves any word beginning with @, like:
@{} @[] @// @'' @""
This will be useful for future language extensions. That is, creating more syntax errors lets the language evolve.
I also expect shopt --unset parse_dollar to have this benefit. It allows us to parse inline eggexes like $/ digit+ /.
parse_dollar Again, For Strictness
To recap:
No:
echo $
echo "$"
Yes:
echo \$
echo "\$"
TODO: We also need to support strict_backslash in unquoted words.
This post got long, so I split it into two parts. The next part will review changes in Oil keywords, stdlib functions, shell builtins, and documentation.
Let me know what you think of these changes!
One reason to be more Python compatible is that I have a quixotic plan to self-host Oil and expose the metalanguage to users. That is, our DSLs:
should be combined into one language, which I'm calling "Tea".
Against my better judgement, I brought this up on Reddit and on lobste.rs. Briefly, Tea can be described as statically-typed Python with sum types, which someone actually asked for!
And it should have metaprogramming features to express the equivalent of Oil's use of textual code generation.
I wrote a working grammar to design Tea's syntax (*), but that's the only implementation so far. It would be a large project, but it's also a concrete one, because we have 30K-60K+ lines of working code as a use case.
If you want to work on a statically typed language, let me know! I don't know how to write a type checker, and can use help.
Even if Tea doesn't get done, Oil will be useful either way. We can continue using these DSLs for a long time.
(*) The entire language is expressed in the grammar as a big expression, using a single lexer mode. It's nowhere near as complicated as shell!