Changes to Shell Runtime Semantics

2020-11-10

We're on post 4 of 5 in this series about the Oil 0.8.3 and 0.8.4 releases.

- Big Changes to the Oil Language
- More Changes to Oil's Syntax
- Proposed Changes to Oil's Syntax
- Changes to Shell Runtime Semantics. This post. I overhauled shell options in Oil, as well as the behavior of procs.
- The Shell Programmer's Guide to errexit. About error handling in shell and Oil.

Table of Contents

A Note on Syntax and Semantics

Shell Options

Overhaul of Option Naming

Aliases Are Off (No Dynamic Parsing)

strict_errexit is Even Stricter

proc and Variable Scope

Procs No Longer Return an Expression

Procs and Shell Functions Are In The Same Namespace

shopt --unset dynamic_scope Inside Procs

setref is the ONLY Way to Use Dynamic Scope in Oil

What's Next?

Appendices

New / Updated Docs

Issues Closed in 0.8.4

Commit Log

Zulip Threads About errexit

A Note on Syntax and Semantics

The distinction between syntax and semantics in the titles above isn't a strict one. It's just a rough way of organizing the changes.

That is, the changes to builtins like pp cell are obviously syntactic, but they have semantics too. On the other hand, this post deals with "hidden" semantic changes, but there is some syntax for them.

For example, the shebang line here is a sign that errexit failures happen more often:

#!/usr/bin/env oil

echo "name = $(hostname)"  # This can fail, unlike in shell
echo 'may not get here'

And the proc keyword is a sign that the rules for variable scope are different:

g=GLOBAL

proc p {
  g=foo  # does NOT modify a global
}

f() {
  g=foo  # DOES modify a global
}

Most of the changes in this post are along these lines.
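To make the first contrast concrete, here is a minimal sketch of how a conventional shell treats a failing command substitution under `set -e`. It's plain bash, not Oil, and the function name `errexit_demo` is made up for illustration. I use `false` in place of `hostname` so that the failure is deterministic.

```bash
# The body runs in a subshell, so `set -e` doesn't leak into the caller.
errexit_demo() (
  set -e
  echo "name = $(false)"   # the command sub fails, but echo itself succeeds
  echo 'still running'     # so bash reaches this line
)

errexit_demo
```

Both lines are printed, because the exit status of the command substitution is discarded once `echo` succeeds.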

Aside: One of our language design principles is that syntax and semantics should correspond. That is, similar things should look similar, and different things should look different. I wrote about this 4 years ago: The Five Meanings of #. And What Does ${####} Mean?

Now let's look at shell runtime changes in the last two releases.

Shell Options

The following changes affected shell options, which are manipulated with the shopt builtin. They play a crucial role in the project as the mechanism for gradually upgrading OSH to Oil.

Overhaul of Option Naming

I renamed some options, and many now have the prefixes strict_, simple_, and parse_.

- parse_ options give you more errors at parse time.
- strict_ options give you more errors at runtime.
- simple_ options reduce unnecessary flexibility, and are generally turned on in the oil:all group, but not oil:basic.

I moved options between option groups, and tightened up their definition. Here's the idea:

- strict:all is for when you want to run scripts under both OSH and another shell.
- oil:basic enables new constructs without breaking too much.
- oil:all enables the full Oil language. It's equivalent to using bin/oil.

Prefixes and groups are orthogonal dimensions, because some parse_ options are in the oil:basic group, and some are in oil:all.

See Oil Help Topics for a list of all options.
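As a sketch of the strict:all use case, a script meant to run under both OSH and another shell could opt in at the top like this. The error suppression is my assumption about how you'd keep other shells quiet, since they don't know this option group.

```bash
# Under OSH this turns on the strict:all group. Other shells reject the
# unknown option name, so we silence the error and keep going.
shopt -s strict:all 2>/dev/null || true

echo 'options configured'
```

The `|| true` matters if the script also uses `set -e`, since the failed `shopt` would otherwise abort it under shells other than OSH.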

Aliases Are Off (No Dynamic Parsing)

Aliases inhibit static parsing, and shell-like functions can replace alias in almost all cases.

This advice is now on the Shell Idioms page.
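Here's a minimal sketch of the replacement, written as a plain shell function that also runs under bash. The name `ll` is just a common example, not something from Oil.

```bash
# Instead of:  alias ll='ls -l'
ll() {
  ls -l "$@"   # "$@" forwards all arguments, which an alias can't control
}

ll /
```

Unlike an alias, the function body is parsed once when it's defined, so a static parser can see the whole program.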

`strict_errexit` is Even Stricter

Oil now flags a classic shell pitfall: the failure is lost in local x=$(false), but not in plain x=$(false).
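The underlying shell behavior can be sketched in plain bash. The function names `lost_status` and `kept_status` are hypothetical, chosen for this illustration.

```bash
lost_status() {
  local x=$(false)          # $? is the status of `local`, which is 0
  echo "local x=\$(false) -> $?"
}

kept_status() {
  local x status=0
  x=$(false) || status=$?   # a plain assignment keeps the command sub's status
  echo "x=\$(false) -> $status"
}

lost_status   # prints: local x=$(false) -> 0
kept_status   # prints: x=$(false) -> 1
```

Declaring the variable first and assigning on a separate line is the usual way to keep the failure visible.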

This is also now documented on the Shell Idioms page, and a future post will go into detail.

This found several problems in our own shell scripts, which I fixed with commit 4f67c4883a07b70b7737a84cc8d2d4d0ecb3c040.

`proc` and Variable Scope

Shell-like functions are called "procs" in Oil, and they're one of the most important parts of the language. They compose in unique ways.

Procs No Longer Return an Expression

This was left over from when proc and func were "parallel constructions". (The func keyword is deferred indefinitely, or at least until the Tea Language.)

So proc is now more like a shell function, with its return statement accepting a string that looks like an integer:

proc p {
  var x = 42

  return $x  # Good

  # These would be errors
  return x
  return x + 1
}
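For comparison, this is the shell function behavior that proc now mirrors, sketched in bash. The name `answer` is made up.

```bash
answer() {
  local x=42
  return "$x"   # a string that looks like an integer
}

status=0
answer || status=$?
echo "status=$status"   # prints: status=42
```

The argument to `return` becomes the exit status, so it's only useful for small integers, not for returning data.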

Procs and Shell Functions Are In The Same Namespace

The shell language is like a Lisp 2: variables and functions each have their own namespace. (Related StackOverflow thread)

I originally put procs in the variable namespace, but now they're in the function namespace. This is more consistent, and we can still reflect on procs with the pp proc builtin, mentioned in the last post.

(This change also cleaned up a lot of code, and made it statically typed.)
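The "Lisp 2" property is easy to see in any Bourne-style shell. In this bash sketch, the name `greeting` is arbitrary, and it refers to both a variable and a function without conflict.

```bash
greeting='I am a variable'

greeting() {
  echo 'I am a function'
}

echo "$greeting"   # prints: I am a variable
greeting           # prints: I am a function
```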

`shopt --unset dynamic_scope` Inside Procs

This is a big semantic change that makes shell more like Python and JavaScript. Inside proc, we disable dynamic scope, another concept from Lisp.

That means that we can't read and write variables up the stack, i.e. those of the caller of a proc, the caller's caller, etc. (This rule is called "dynamic" because variable resolution depends on who the caller of the proc is, which can only be determined at runtime. Compare it with "lexical" scope.)
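Here's a minimal bash sketch of the dynamic scope that procs turn off. The function names `caller_fn` and `callee_fn` are hypothetical.

```bash
callee_fn() {
  echo "callee sees: $x"   # resolves to the caller's local x
  x='changed by callee'    # and mutates it
}

caller_fn() {
  local x='local to caller'
  callee_fn
  echo "caller sees: $x"
}

caller_fn
# prints:
#   callee sees: local to caller
#   caller sees: changed by callee
```

Nothing in the text of `callee_fn` tells you which `x` it touches. That depends on the call stack at runtime.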

One way to think of it is that shopt --unset dynamic_scope "neuters" the power of all these constructs when they're inside proc:

proc p {
  x=X              # assigns to a new LOCAL x
                   # not caller's x or a global x

  export PATH=foo  # both of these assign to new locals
  readonly y=Y     

  : ${z=default}   # new local z
  cat {fd}<in.txt  # new local fd
}

It's surprising that there are so many ways to assign in shell, and that they're not confined to shell functions.
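For contrast, here's a sketch of what several such constructs do inside an old-style shell function in bash: with no `local` declaration, each one creates or mutates a global. The function and variable names are made up for illustration.

```bash
unset plain EXPORTED defaulted from_read

assign_everywhere() {
  plain=1              # plain assignment
  export EXPORTED=2    # assignment builtin
  : "${defaulted=3}"   # default-value expansion
  read -r from_read <<EOF
4
EOF
}

assign_everywhere
echo "$plain $EXPORTED $defaulted $from_read"   # prints: 1 2 3 4
```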

Another way to think about it: these constructs are similar to setvar inside shell functions, but they're like the setlocal keyword inside proc.

The rationale is that you should be able to audit your code for nonlocal mutation. This is now possible since such mutation can only be done with setvar and setref, which we talk about next.

You can still use old-style shell functions if you want, but it's discouraged.

f() {
  x=y  # mutates caller's x or global x
}

`setref` is the ONLY Way to Use Dynamic Scope in Oil

Oil repurposes dynamic scope for the "out params" mechanism. C or C# programmers may understand this analogy:

myproc input :out1 :out2  # one input, two out parameters

f(input, &out1, &out2);  // one input, two out params

You "return" a value by setting an "out param" with the setref keyword. This is more explicit than shell's dynamic scope.

Before showing the example, I should note that this kind of code is uncommon in shell and Oil programs. If you have 10 procs that use setref, it may be better to shell out to a program in Python that returns JSON.

Example:

proc snooze(prefix, :out) {   # out param
  setref out = "$prefix-ZZZ"
}

proc main {
  var x = 'foo'
  snooze bar :x      # pass a string reference
  echo $x            # now 'bar-ZZZ'
}

main

Notes:

- The proc argument :x must start with a colon to be bound to a parameter.
- The : prefix in the proc signature sets the nameref flag on the parameter binding. That is, we reuse the declare -n mechanism in bash to implement out params. This keeps Oil's interpreter simple.
- The setref construct requires a nameref cell. For example, setref prefix = 42 would be a runtime error.

These rules make for more controlled and readable mutation, while preserving the power of the language.

(As an implementation detail, the parameter name becomes __out, so that the nameref cycle detection doesn't falsely trigger.)
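To show the bash mechanism being reused, here's a rough equivalent of the example written with a nameref. It needs bash 4.3 or later. This is a sketch, not how Oil implements it, and the names `out_ref` and `snooze_main` are my own.

```bash
snooze() {
  local prefix=$1
  local -n out_ref=$2     # nameref: out_ref is an alias for the named variable
  out_ref="$prefix-ZZZ"   # assigns to the caller's variable
}

snooze_main() {
  local x='foo'
  snooze bar x            # pass the variable NAME, not its value
  echo "$x"               # prints: bar-ZZZ
}

snooze_main
```

If the caller's variable were also called `out_ref`, the nameref would refer to itself. That is presumably the kind of cycle the renaming above avoids.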

What's Next?

The OSH and Oil languages are starting to feel pretty polished. There are only a few TODOs left on the Oil Language Idioms page.

Let me know if you see anything wrong, or if you have questions about these changes.

Here's what I want to work on next:

- Garbage Collection, which will show up in the memory usage benchmarks.
- Translating I/O, which will allow oil-native to pass more spec tests.
- Now that the syntax of the Oil language is settled, we should make the interpreter "not metacircular". This removes its dependence on CPython. Note that we already removed this dependency from the OSH language, which is why it translates to C++.

I'm still looking for help. If you want to work on any of the proposed syntax changes, that would be interesting. I'll help you get started!

Appendices

New / Updated Docs

I published more docs that are ready to read on the /release/$VERSION/ page. A list of docs in progress is on the /release/$VERSION/doc/ page.

Issues Closed in 0.8.4

#860	parse_backslash in ShCommand mode
#853	PROMPT_COMMAND can't see last status ...
#835	Make expression language compatible with Python
#482	Turning off aliases in Oil

Commit Log

Thanks to these contributors for their commits! You can also view the full changelog.

fbee753	Yanis Zafirópulos	[doc] Copy editing, spelling fixes (#855)
d67f9ce	Batuhan Taskaya	[tea] Create AST nodes for 'for' loops (#857)

Zulip Threads About `errexit`

I might take a break from blogging now, so here are threads relevant to the final post in this series. I may also split it up into two posts: one about Oil, and one for programmers using other shells.

errexit redux. A summary of complex design issues.
status builtin proposal. This ended up as the run builtin.
