Big Changes to the Oil Language

2020-10-31

I recently released Oil 0.8.3, and it's the biggest release in recent memory! What's new?

This is the first of two posts that describe the language changes. Separately, I plan to write "the ultimate guide" to error handling in shell.

If you're not familiar with Oil, see the new Language Influences and Oil Language Idioms docs, as well as posts tagged #oil-language.

Table of Contents
Help Wanted
Operators (Expression Mode)
Return to Python Compatibility
++ to concatenate, ~~ and !~~ to match globs
Literals (Expression Mode)
Dicts are {}, not %{}
Blocks are &(echo $PWD)
Chars are \u{012345}
Tightened Up String Literals
C-Style
A Superset of QSN
Double Quoted
Word Syntax (Command Mode)
parse_at_all Reserves Words Beginning With @
parse_dollar Again, For Strictness
Next
Appendix: The Tea Language

Help Wanted

If you're interested in Oil, now is a great time to get involved. Recall that the last post said that OSH would have four significant fixes, but the rest of the project was too much work. The work described here is what I need help with!

Toward that end, I recently updated these pages:

Asking questions and leaving feedback about the language on Zulip is also appreciated! Several people have influenced the language design this way.

Operators (Expression Mode)

The expression language lets you talk about typed data with operators and literals. Let's review those changes first.

Return to Python Compatibility

Last year, Oil had some "cleanups" of the Python expression language, but I decided that the unfamiliarity isn't worth it. I reverted them, so:

(The appendix has some rationale for this.)

++ to concatenate, ~~ and !~~ to match globs

The ++ operator is for string and list concatenation. That is, a + b always does math, and a ++ b always does concatenation.

This is to support Awk-like auto-type conversion. Similarly, comparison operators like < and <= will only work on numbers, and we'll use a different syntax for strings. (Yes, I realize the danger with such type conversion!)
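
To see why separate operators help when values auto-convert, here is a small sketch in plain bash (not Oil), where every value is already a string. The variable names are made up for illustration. The operation, not the type, decides whether you get arithmetic or concatenation, and that is the property `+` and `++` are meant to keep:

```bash
# Plain bash, not Oil: values are strings, so the operator picks the meaning.
a=3
b=4
sum=$(( a + b ))     # arithmetic on the numeric value of the strings
joined="$a$b"        # concatenation of the same two strings
echo "$sum"          # prints: 7
echo "$joined"       # prints: 34
```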

The ~~ and !~~ operators are for glob matching. They deprecate [[ x == *.py ]] in bash.
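
For comparison, here is the bash idiom being replaced, as a minimal sketch. The function name `is_python_file` is hypothetical. The easy-to-miss detail is that the right-hand side of `==` inside `[[ ]]` is treated as a glob only when it is unquoted:

```bash
# Legacy bash glob matching; is_python_file is a hypothetical helper name.
is_python_file() {
  local name=$1
  # Unquoted right-hand side: *.py is a glob pattern, not a literal string.
  if [[ $name == *.py ]]; then
    echo yes
  else
    echo no
  fi
}

is_python_file build.py    # prints: yes
is_python_file build.sh    # prints: no

# Quoting the pattern turns the test into a literal string comparison.
[[ build.py == "*.py" ]] || echo "quoted pattern is compared literally"
```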

Literals (Expression Mode)

Dicts are {}, not %{}

This is another return to Python compatibility.

We used sigils like %{foo: 42} in dict literals because Oil uses { } for C-like statement blocks, and it lacks semicolons.

Making the tokens distinct is one way to avoid a subtle parsing issue. This Hacker News comment about the Dart language describes some of the difficulties with using {} in both expressions and statements.

However, Oil's problem is not as hard as Dart's, and I solved it by simply including newlines in the grammar. A key-value pair can be on a line:

var mydict = {
  server: "www.example.com"  # optional comma
  port: 80
}

But you can't split it across lines

# Syntax error
var mydict = {
  server:
    "www.example.com"
}

without either () or \:

var mydict = {
  # This is valid, or you can use \
  server: (
    "www.example.com"
  )
}

It was bugging me that lists are just [1, 2, 3], while dicts were %{key: 'value'}. This is now fixed!

(Good Zulip Feedback on Line Breaking. I'm still looking for more feedback.)


I also removed the %[] syntax, which was an overly ambitious idea for typed array literals. We already have %(one two) for shell-like arrays, and ['one', 'two'] for Python/JS-like lists.

(Aside: Perl and Ruby have qw(one two) or qw[one two] which is like our %(one two).)

Blocks are &(echo $PWD)

Oil's Ruby-like blocks are "first class". Normally they're passed to procs as the last argument:

cd /tmp {
  echo $PWD
}

But we also need them in expression mode. I decided on the syntax &(echo $PWD).

This may seem inconsistent at first, but it's consistent with command subs:

var b1 = $(echo $PWD)  # eagerly evaluated
var b2 = &(echo $PWD)  # lazily evaluated
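
If the eager/lazy distinction is unfamiliar, a rough analogy in plain bash may help. This is only an analogy, not how Oil implements blocks: a command sub runs immediately, while a function body (here the hypothetical `lazy`) runs only when it's called.

```bash
# Bash analogy only: a function stands in for a lazily evaluated block.
cd /tmp
eager=$(echo "$PWD")       # runs now and stores the current directory
lazy() { echo "$PWD"; }    # runs only when called

cd /
echo "$eager"    # still the directory captured before the second cd
lazy             # prints: /   (the directory at call time)
```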

Chars are \u{012345}

Character literals stand alone in the expression language, like

var x = \u{3bc}  # mu character (U+03BC)

That is, you don't need quotes. They're for both "code point literals" ("runes" in Go) and eggex char classes.

This syntax is now consistent within C-escaped strings like $'' and c'', and QSN, which leads us into the next section.

Tightened Up String Literals

Shell has a rich string literal syntax. Oil inherits all of its power, but (as of this release) removes unnecessary flexibility.

C-Style

Here are some C-style strings:

echo $'C-style'
echo $'\n \i'               # single char 
echo $'\0123 \x01 \x1'      # octal and hex
echo $' \u1234 \U00012345'  # unicode

Notes:

  1. \n is a valid char escape, but \i is an invalid one. Bash accepts it and prints \i literally.
  2. Octal escapes and hex escapes can express exactly the same bytes.
  3. Hex escapes can be abbreviated, e.g. \x1 instead of \x01.
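
These notes can be checked in plain bash. The sketch below uses made-up variable names and describes bash's behavior, not Oil's:

```bash
# Plain bash, illustrating the three notes above.
invalid=$'\i'     # invalid escape: bash keeps the backslash
octal=$'\101'     # octal escape for "A"
hex=$'\x41'       # hex escape for the same byte
short=$'\x1'      # one-digit hex escape
full=$'\x01'      # two-digit form of the same byte

printf '%s\n' "$invalid"                                    # prints: \i
[[ $octal == "$hex" ]] && echo "octal and hex agree"
[[ $short == "$full" ]] && echo "short and full hex agree"
```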

I made the following changes to simplify this syntax:

  1. Disallow invalid char escapes.
  2. Disallow all octal escapes.
  3. Disallow single char hex escapes. Must be \xHH.
  4. Disallow the two unicode escapes in favor of the QSN/Rust style \u{12345}, which I added support for.

As usual, we do a dance to avoid breaking existing code, while preventing legacy from creeping into the Oil language:

A Superset of QSN

Now that we have \u{12345}, we have an interesting property: any QSN string is now an Oil string! Though you have to add a $ sigil:

echo $'QSN and Oil \\ \n'    # command mode

var mystr = $'\x01 \u{3bf}'  # expression mode
var mystr = c'\x01 \u{3bf}'  # also valid, opposite of "raw"

Double Quoted

Here are some double quoted strings:

echo "double quoted"
echo "\$ \i"         # invalid escape \i
echo "\\ \ ."        # \ missing escape
echo "\$ $ ."        # $ missing escape
echo "old: `hostname`, new: $(hostname)"  # 2 styles
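
For reference, this is what bash does with the first few of those lines. The variable names are made up for illustration. A backslash that doesn't start a valid escape is kept, and a lone $ is taken literally:

```bash
# Plain bash: how the legacy quoting rules treat the strings above.
invalid="\$ \i"       # \$ is an escape; \i is not, so the backslash stays
backslash="\\ \ ."    # \\ is an escape; backslash-space is not
dollar="\$ $ ."       # a $ followed by a space is literal

printf '%s\n' "$invalid"      # prints: $ \i
printf '%s\n' "$backslash"    # prints: \ \ .
printf '%s\n' "$dollar"       # prints: $ $ .
```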

Oil makes the following changes:

These options are unset in the option group oil:all.

Aside: our lexing style is awesome for making these changes!

Word Syntax (Command Mode)

I made similar changes to unquoted words.

parse_at_all Reserves Words Beginning With @

In the oil:basic option group, we allow this syntax, but we only break the bare minimum:

echo @myarray
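
To see why this breaks so little, note that plain bash gives @myarray no special meaning. The sketch below is bash, not Oil. It assumes that Oil's @myarray splices the array's elements, which is how the Oil docs describe it:

```bash
# Plain bash: a word starting with @ is just a literal string.
myarray=(one two)
echo @myarray           # prints: @myarray
echo "${myarray[@]}"    # prints: one two   (the bash spelling of a splice)
```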

But the oil:all option group reserves any word beginning with @, like:

@{} @[] @// @'' @""

This will be useful for future language extensions. That is, creating more syntax errors lets the language evolve.

I also expect shopt --unset parse_dollar to have this benefit. It allows us to parse inline eggexes like $/ digit+ /.

parse_dollar Again, For Strictness

To recap:

No:

echo $
echo "$"

Yes:

echo \$
echo "\$"
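
In plain bash, all four spellings print the same thing, so requiring the escaped forms costs nothing at runtime. A quick check, with the hypothetical function name `print_dollars`:

```bash
# Plain bash: the "No" and "Yes" spellings produce identical output.
print_dollars() {
  echo $
  echo "$"
  echo \$
  echo "\$"
}
print_dollars    # prints four lines, each a single $
```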

TODO: We also need to support strict_backslash in unquoted words.

Next

This post got long, so I split it into two parts. The next part will review changes in Oil keywords, stdlib functions, shell builtins, and documentation.

Let me know what you think of these changes!

Appendix: The Tea Language

One reason to be more Python compatible is that I have a quixotic plan to self-host Oil and expose the metalanguage to users. That is, our DSLs:

should be combined into one language, which I'm calling "Tea".

Against my better judgement, I brought this up on Reddit and on lobste.rs. Briefly, Tea can be described as statically-typed Python with sum types, which someone actually asked for!

And it should have metaprogramming features to express the equivalent of Oil's use of textual code generation.

I wrote a working grammar to design Tea's syntax (*), but that's the only implementation so far. It would be a large project, but it's also a concrete one, because we have 30K-60K+ lines of working code as a use case.

If you want to work on a statically typed language, let me know! I don't know how to write a type checker, and could use help.

Even if Tea doesn't get done, Oil will be useful either way. We can continue using these DSLs for a long time.


(*) The entire language is expressed in the grammar as a big expression, using a single lexer mode. It's nowhere near as complicated as shell!