Oil Fixes Shell's Error Handling (errexit)

Oil is unlike other shells:

This document explains how Oil makes these guarantees. We first review shell error handling, and discuss its fundamental problems. Then we show idiomatic Oil code, and look under the hood at the underlying mechanisms.

Table of Contents
Review of Shell Error Handling Mechanisms
POSIX Shell
Bash
Fundamental Problems
When Is $? Set?
What Does $? Mean?
The Meaning of if
Design Mistake: The Disabled errexit Quirk
Oil Error Handling: The Big Picture
Oil Fails On Every Error
try Handles Command and Expression Errors
boolstatus Enforces 0 or 1 Status
FAQ on Language Design
Reference: Global Options
command_sub_errexit Adds More Errors
process_sub_fail Is Analogous to pipefail
strict_errexit Flags Two Problems
sigpipe_status_ok Ignores an Issue With pipefail
verbose_errexit
FAQ on Options
Summary
Related Docs
Appendices
List Of Pitfalls
Disabled errexit Quirk / if myfunc Pitfall
The Meta Pitfall
Quirky Behavior of $?
Acknowledgments

Review of Shell Error Handling Mechanisms

POSIX shell has fundamental problems with error handling. With set -e aka errexit, you're damned if you do and damned if you don't.

GNU bash fixes some of the problems, but adds its own, e.g. with respect to process subs, command subs, and assignment builtins.

Oil fixes all the problems by adding new builtin commands, special variables, and global options. But you see a simple interface with try and _status.

Let's review a few concepts before discussing Oil.

POSIX Shell

These mechanisms are fundamentally incomplete.

Bash

Bash improves error handling for pipelines like ls /bad | wc.

But there are still places where bash will lose an exit code.

 

Fundamental Problems

Let's look at four fundamental issues with shell error handling. They underlie the nine shell pitfalls enumerated in the appendix.

When Is $? Set?

Each external process and shell builtin has one exit status. But the definition of $? is obscure: it's tied to the pipeline rule in the POSIX shell grammar, which does not correspond to a single process or builtin.

We saw that pipefail fixes one case:

ls /nonexistent | wc   # 2 processes, 2 exit codes, but just one $?

But there are others:

local x=$(false)                 # 2 exit codes, but just one $?
diff <(sort left) <(sort right)  # 3 exit codes, but just one $?

This issue means that shell scripts fundamentally lose errors. The language is unreliable.
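Here is a small sketch that makes both lost statuses visible in bash. Each case runs in a child bash process, and the variable names local_status and psub_status are only for illustration.

```shell
# 'local' returns 0, which replaces the status of the failed command sub.
local_status=0
bash -c 'f() { local x=$(false); return $?; }; f' || local_status=$?

# 'cat' reads an empty stream and succeeds; the status of 'false' is dropped.
psub_status=0
bash -c 'cat <(false)' || psub_status=$?

echo "local x=\$(false) -> status $local_status"
echo "cat <(false)      -> status $psub_status"
```

Both cases report success even though false ran and failed in each of them.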

What Does $? Mean?

Each process or builtin decides the meaning of its exit status independently. Here are two common choices:

  1. The Failure Paradigm
  2. The Boolean Paradigm

Oil's new error handling constructs deal with this fundamental inconsistency.

The Meaning of if

Shell's if statement tests whether a command exits zero or non-zero:

if grep class *.py; then
  echo 'found class'
else
  echo 'not found'  # is this true?
fi

So while you'd expect if to work in the boolean paradigm, it's closer to the failure paradigm. This means that using if with certain commands can cause the Error or False Pitfall:

if grep 'class\(' *.py; then  # grep syntax error, status 2
  echo 'found class('
else
  echo 'not found is a lie'
fi
# => grep: Unmatched ( or \(
# => not found is a lie

That is, the else clause conflates grep's error status 2 and false status 1.
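To see the three outcomes side by side, this sketch captures grep's status for a match, a non-match, and a bad pattern. The sample text and variable names are assumptions for illustration; the exact error status is 2 with GNU grep, and POSIX only promises a value greater than 1.

```shell
text='class Foo:'

found=0
printf '%s\n' "$text" | grep -q 'class' || found=$?

missing=0
printf '%s\n' "$text" | grep -q 'struct' || missing=$?

broken=0
printf '%s\n' "$text" | grep -q 'class\(' 2>/dev/null || broken=$?

echo "match:       $found"     # true
echo "no match:    $missing"   # false
echo "bad pattern: $broken"    # error, but 'if' treats it like false
```

An if statement sees only zero versus non-zero, so the last two cases take the same branch.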

Strangely enough, I encountered this pitfall while trying to disallow shell's error handling pitfalls in Oil! I describe this in another appendix as the "meta pitfall".

Design Mistake: The Disabled errexit Quirk

There's more bad news about the design of shell's if statement. It's subject to the Disabled errexit Quirk, which means when you use a shell function in a conditional context, errors are unexpectedly ignored.

That is, while if ls /tmp is useful, if my-ls-function /tmp should be avoided. It yields surprising results.

I call this the if myfunc Pitfall, and show an example in the appendix.

We can't fix this decades-old bug in shell. Instead we disallow dangerous code with strict_errexit, and add new error handling mechanisms.

 

Oil Error Handling: The Big Picture

We've reviewed how POSIX shell and bash work, and showed fundamental problems with the shell language.

But when you're using Oil, you don't have to worry about any of this!

Oil Fails On Every Error

This means you don't have to explicitly check for errors. Examples:

shopt --set oil:upgrade     # Enable good error handling in bin/osh
                            # It's the default in bin/oil.
shopt --set strict_errexit  # Disallow bad shell error handling.
                            # Also the default in bin/oil.

local date=$(date X)        # 'date' failure is fatal
# => date: invalid date 'X' 

echo $(date X)              # ditto

echo $(date X) $(ls > F)    # 'ls' isn't executed; 'date' fails first

ls /bad | wc                # 'ls' failure is fatal

diff <(sort A) <(sort B)    # 'sort' failure is fatal

On the other hand, you won't experience this problem caused by pipefail:

yes | head                 # doesn't fail due to SIGPIPE

The details are explained below.

try Handles Command and Expression Errors

You may want to handle failure instead of aborting the shell. In this case, use the try builtin and inspect the _status variable it sets.

try {                 # try takes a block of commands
  ls /etc
  ls /BAD             # it stops at the first failure
  ls /lib
}                     # After try, $? is always 0
if (_status !== 0) {  # Now check _status
  echo 'failed'
}

Note that:

You can omit { } when invoking a single command. Here's how to invoke a function without the if myfunc Pitfall:

try myfunc            # Unlike 'myfunc', doesn't abort on error
if (_status !== 0) {
  echo 'failed'
}

You also have fine-grained control over every process in a pipeline:

try {
  ls /bad | wc
}
write -- @_pipeline_status  # every exit status

And each process substitution:

try {
  diff <(sort left.txt) <(sort right.txt)
}
write -- @_process_sub_status  # every exit status

 

See Oil vs. Shell Idioms > Error Handling for more examples.

 

Certain expressions produce fatal errors, like:

var x = 42 / 0  # divide by zero will abort the shell

The try builtin also handles them:

try {
   var x = 42 / 0
}
if (_status !== 0) {
  echo 'divide by zero'
}

More examples:

Such expression evaluation errors result in status 3, which is an arbitrary non-zero status that's not used by other shells. Status 2 is generally for syntax errors and status 1 is for most runtime failures.

boolstatus Enforces 0 or 1 Status

The boolstatus builtin addresses the Error or False Pitfall:

if boolstatus grep 'class' *.py {  # may abort the program
  echo 'found'      # status 0 means 'found'
} else {
  echo 'not found'  # status 1 means 'not found'
}

Rather than confusing error with false, boolstatus will abort the program if grep doesn't return 0 or 1.

You can think of this as a shortcut for

try grep 'class' *.py
case $_status {
  (0) echo 'found'
      ;;
  (1) echo 'not found'
      ;;
  (*) echo 'fatal'
      exit $_status
      ;;
}
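For scripts that must also run under bash or dash, the same idea can be roughly approximated in plain shell. This is only a sketch: boolstatus_sh is a hypothetical name, not something Oil or bash provides, and it doesn't claim to match Oil's implementation.

```shell
# boolstatus_sh is a hypothetical helper; it is not part of Oil or bash.
boolstatus_sh() {
  bs_status=0
  "$@" || bs_status=$?
  case $bs_status in
    0|1)
      return "$bs_status"        # true or false: let the caller's 'if' decide
      ;;
    *)
      echo "boolstatus_sh: '$1' exited with status $bs_status" >&2
      exit "$bs_status"          # anything else is an error: abort
      ;;
  esac
}

# grep finds nothing in /dev/null, so this takes the 'false' branch.
if boolstatus_sh grep -q 'class' /dev/null; then
  result='found'
else
  result='not found'
fi
echo "$result"
```

Because the command runs on the left side of ||, this sketch is only suitable for external commands. Passing a shell function would run into the Disabled errexit Quirk.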

FAQ on Language Design

Why is there try but no catch?

First, it offers more flexibility:

Second, it makes the language smaller:

Another way to remember this is that there are three parts to handling an error, each of which has independent choices:

  1. Does try take a simple command or a block? For example, try ls versus try { ls; var x = 42 / n }
  2. Which status do you want to inspect?
  3. Inspect it with if or case? As mentioned, boolstatus is a special case of try / case.

Why is _status different from $?

This avoids special cases in the interpreter for try, which is again a builtin that takes a block.

The exit status of try is always 0. If it returned a non-zero status, the errexit rule would trigger, and you wouldn't be able to handle the error!

Generally, errors occur inside blocks, not outside.

Again, idiomatic Oil scripts never look at $?, which is only used to trigger shell's errexit rule. Instead they invoke try and inspect _status when they want to handle errors.

Why boolstatus? Can't you just change what if means in Oil?

I've learned the hard way that when there's a shell semantics change, there must be a syntax change. In general, you should be able to read code on its own, without context.

Readers shouldn't have to constantly look up whether oil:upgrade is on. There are some cases where this is necessary, but it should be minimized.

Also, both if foo and if boolstatus foo are useful in idiomatic Oil code.

 

Most users can skip to the summary. You don't need to know all the details to use Oil.

 

Reference: Global Options

Under the hood, we implement the errexit option from POSIX, bash options like pipefail and inherit_errexit, and add more options of our own. They're all hidden behind option groups like strict:all and oil:upgrade.

The following sections explain Oil's new options.

command_sub_errexit Adds More Errors

In all Bourne shells, the status of command subs is lost, so errors are ignored (details in the appendix). For example:

echo $(date X) $(date Y)  # 2 failures, both ignored
echo                      # program continues

The command_sub_errexit option makes both date invocations an error. The status $? of the parent echo command will be 1, so if errexit is on, the shell will abort.

(Other shells should implement command_sub_errexit!)
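In shells without this option, one workaround is to assign the command sub to a variable first, since a bare assignment keeps its status. The sketch below compares the two forms under set -e; the variable names lost and kept are only for illustration.

```shell
# Command sub used as an argument: 'echo' succeeds, so errexit never fires.
lost=0
bash -c 'set -e; echo "$(false)"; echo "still running"' >/dev/null || lost=$?

# Bare assignment first: $? is the status of the command sub, so errexit fires.
kept=0
bash -c 'set -e; x=$(false); echo "$x"; echo "still running"' >/dev/null || kept=$?

echo "echo \$(false)      -> script status $lost"
echo "x=\$(false); echo   -> script status $kept"
```

The first script runs to completion and the second aborts at the assignment.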

process_sub_fail Is Analogous to pipefail

Similarly, in this example, sort will fail if the file doesn't exist.

diff <(sort left.txt) <(sort right.txt)  # any failures are ignored

But there's no way to see this error in bash. Oil adds process_sub_fail, which folds the failure into $? so errexit can do its job.

You can also inspect the special _process_sub_status array variable to implement custom error logic.

strict_errexit Flags Two Problems

Like other strict_* options, Oil's strict_errexit improves your shell programs, even if you run them under another shell like bash! It's like a linter at runtime, so it can catch things that ShellCheck can't.

strict_errexit disallows code that exhibits these problems:

  1. The if myfunc Pitfall
  2. The local x=$(false) Pitfall

See the appendix for examples of each.

Rules to Prevent the if myfunc Pitfall

In any conditional context, strict_errexit disallows:

  1. All commands except ((, [[, and some simple commands (e.g. echo foo).
  2. Function/proc invocations (which are a special case of simple commands).
  3. Command sub and process sub (shopt --unset allow_csub_psub)

This means that you should check the exit status of functions and pipelines differently. See Does a Function Succeed?, Does a Pipeline Succeed?, and other Oil vs. Shell Idioms.

Rule to Prevent the local x=$(false) Pitfall

No:

local x=$(false)

Yes:

var x = $(false)   # Oil style

local x            # Shell style
x=$(false)
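The effect of the shell-style rewrite can be checked in bash itself. In this sketch, each function body runs under set -e in a child process, and the word "continued" is printed only if the function kept going after the failed command sub.

```shell
# Declaration and assignment on one line: 'local' hides the failure.
bad=$(bash -c 'set -e; f() { local x=$(false); echo "continued"; }; f') || true

# Declaration first, then a bare assignment: errexit sees the failure.
good=$(bash -c 'set -e; f() { local x; x=$(false); echo "continued"; }; f') || true

echo "one line:  '$bad'"
echo "two lines: '$good'"
```

Only the one-line form prints "continued", which is exactly the code that strict_errexit rejects.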

sigpipe_status_ok Ignores an Issue With pipefail

When you turn on pipefail, you may inadvertently run into this behavior:

yes | head
# => y
# ...

echo ${PIPESTATUS[@]}
# => 141 0

That is, head closes the pipe after 10 lines, causing the yes command to fail with SIGPIPE status 141.

This error shouldn't be fatal, so OSH has a sigpipe_status_ok option, which is on by default in Oil.
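If you're stuck with bash, a similar policy can be sketched by hand. The helper below is hypothetical (sigpipe_ok is not a bash or Oil builtin) and only approximates the idea: it treats status 141 like 0 and otherwise reports the rightmost failure, as pipefail would.

```shell
# sigpipe_ok is a hypothetical helper, not a bash or Oil builtin.
# It takes a list of exit statuses, ignores 0 and 141 (128 + SIGPIPE),
# and returns the rightmost remaining status, or 0 if there is none.
sigpipe_ok() {
  sp_result=0
  for sp_status in "$@"; do
    case $sp_status in
      0|141) ;;
      *) sp_result=$sp_status ;;
    esac
  done
  return "$sp_result"
}

# In bash, call it right after the pipeline:
#   yes | head; sigpipe_ok "${PIPESTATUS[@]}"
sigpipe_ok 141 0 && echo 'yes | head: ok'
```

The value 141 assumes SIGPIPE is signal 13, which is the case on Linux.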

verbose_errexit

When verbose_errexit is on, the shell prints errors to stderr when the errexit rule is triggered.

FAQ on Options

Why is there no _command_sub_status? And why is command_sub_errexit named differently than process_sub_fail and pipefail?

Command subs are executed serially, while process subs and pipeline parts run in parallel.

So a command sub can "abort" its parent command, setting $? immediately. The parallel constructs must wait until all parts are done and save statuses in an array. Afterward, they determine $? based on the value of pipefail and process_sub_fail.
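Bash's PIPESTATUS shows the same two-step process for pipelines. In this sketch, the three-part pipeline is made up for illustration: every part runs to completion, the statuses land in the array, and only then does pipefail pick a value for $?.

```shell
# All three parts run; their statuses are saved in order.
statuses=$(bash -c 'false | (exit 3) | true; echo "${PIPESTATUS[*]}"')

# With pipefail, $? is the rightmost non-zero status from that array.
final=0
bash -c 'set -o pipefail; false | (exit 3) | true' || final=$?

echo "PIPESTATUS:        $statuses"
echo "\$? with pipefail:  $final"
```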

Why are strict_errexit and command_sub_errexit different options?

Because shopt --set strict:all can be used to improve scripts that are run under other shells like bash. It's like a runtime linter that disallows dangerous constructs.

On the other hand, if you write code with command_sub_errexit on, it's impossible to get the same failures under bash. So command_sub_errexit is not a strict_* option, and it's meant for code that runs only under Oil.

What's the difference between bash's inherit_errexit and Oil's command_sub_errexit? Don't they both relate to command subs?

 

Summary

Oil uses three mechanisms to fix error handling once and for all.

It has two new builtins that relate to errors:

  1. try lets you explicitly handle errors when errexit is on.
  2. boolstatus enforces a true/false meaning. (This builtin is less common.)

It has three special variables:

  1. The _status integer, which is set by try.
  2. The _pipeline_status array (another name for bash's PIPESTATUS)
  3. The _process_sub_status array for process substitutions.

Finally, it supports all of these global options:

When using bin/osh, set all options at once with shopt --set oil:upgrade strict:all. Or use bin/oil, where they're set by default.

Related Docs

Good articles on errexit:

Spec Test Suites:

These docs aren't about error handling, but they're also painstaking backward-compatible overhauls of shell!

For reference, this work on error handling was described in Four Features That Justify a New Unix Shell (October 2020). Since then, we changed try and _status to be more powerful and general.

 

Appendices

List Of Pitfalls

We mentioned some of these pitfalls:

  1. The if myfunc Pitfall, caused by the Disabled errexit Quirk (strict_errexit)
  2. The local x=$(false) Pitfall (strict_errexit)
  3. The Error or False Pitfall (boolstatus, try / case)
  4. The Process Sub Pitfall (process_sub_fail and _process_sub_status)
  5. The yes | head Pitfall (sigpipe_status_ok)

There are two pitfalls related to command subs:

  1. The echo $(false) Pitfall (command_sub_errexit)
  2. Bash's inherit_errexit pitfall.

Here are two more pitfalls that don't require changes to Oil:

  1. The Trailing && Pitfall
  2. The surprising return value of (( i++ )), let, expr, etc.

Example of inherit_errexit Pitfall

In bash, errexit is disabled in command sub child processes:

set -e
shopt -s inherit_errexit  # needed to avoid 'touch two'
echo $(touch one; false; touch two)

Without the option, it will touch both files, even though false fails after the first touch.
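The same behavior can be observed without creating files. This sketch swaps touch for echo and assumes bash 4.4 or later, where inherit_errexit is available.

```shell
# Without inherit_errexit, the command sub ignores the failure of 'false'.
without=$(bash -c 'set -e; echo "$(echo one; false; echo two)"')

# With it, the command sub stops at 'false', so 'two' is never printed.
with=$(bash -c 'set -e; shopt -s inherit_errexit; echo "$(echo one; false; echo two)"')

printf 'without inherit_errexit:\n%s\n' "$without"
printf 'with inherit_errexit:\n%s\n' "$with"
```

Note that even with the option, the outer echo still succeeds, so the script as a whole doesn't abort. That is the separate echo $(false) Pitfall.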

Bash has a grammatical quirk with set -o failglob

This isn't a pitfall, but a quirk that also relates to errors and shell's grammar. Recall that the definition of $? is tied to the grammar.

Consider this program:

set -o failglob
echo *.ZZ        # no files match
echo status=$?   # show failure
# => status=1

This is the same program with a newline replaced by a semicolon:

set -o failglob

# Surprisingly, bash doesn't execute what's after ; 
echo *.ZZ; echo status=$?
# => (no output)

But it behaves differently. This is because newlines and semicolons are handled in different productions of the grammar, and produce distinct syntax trees.

(A related quirk is that this same difference can affect the number of processes that shells start!)

Disabled errexit Quirk / if myfunc Pitfall

This quirk is a bad interaction between the if statement, shell functions, and errexit. It's a mistake in the design of the shell language. Example:

set -o errexit     # don't ignore errors

myfunc() {
  ls /bad          # fails with a non-zero status
  echo 'should not get here'
}

myfunc  # Good: script aborts before echo
# => ls: '/bad': no such file or directory

if myfunc; then  # Surprise!  It behaves differently in a condition.
  echo OK
fi
# => ls: '/bad': no such file or directory
# => should not get here

We see "should not get here" because the shell silently disables errexit while executing the condition of if. This relates to the fundamental problems above:

  1. Does the function use the failure paradigm or the boolean paradigm?
  2. if tests a single exit status, but every command in a function has an exit status. Which one should we consider?

This quirk occurs in all conditional contexts:

  1. The condition of the if, while, and until constructs
  2. A command/pipeline prefixed by ! (negation)
  3. Every clause in || and && except the last.
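The contexts listed above can be checked one at a time in bash. In this sketch, run_case is a hypothetical helper that starts a child bash with errexit on, defines the same kind of failing function, and then calls it in the given context.

```shell
# run_case is a hypothetical helper. $1 is a line of shell that calls myfunc.
run_case() {
  bash -c "
    set -o errexit
    myfunc() { false; echo 'should not get here'; }
    $1
  " || true
}

plain=$(run_case 'myfunc')
in_if=$(run_case 'if myfunc; then :; fi')
in_while=$(run_case 'while myfunc; do break; done')
negated=$(run_case '! myfunc')
in_or=$(run_case 'myfunc || true')

echo "plain:    '$plain'"
echo "if:       '$in_if'"
echo "while:    '$in_while'"
echo "negated:  '$negated'"
echo "|| true:  '$in_or'"
```

Only the plain call aborts at false. In every conditional context, the function runs past the failure.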

The Meta Pitfall

I encountered the Error or False Pitfall while trying to disallow other error handling pitfalls! The meta pitfall arises from a combination of the issues discussed:

  1. The if statement tests for zero or non-zero status.
  2. The condition of an if may start child processes. For example, in if myfunc | grep foo, the myfunc invocation must be run in a subshell.
  3. You may want an external process to use the boolean paradigm, and that includes the shell itself. When any of the strict_ options encounters bad code, it aborts the shell with error status 1, not boolean false 1.

The result of this fundamental issue is that strict_errexit is quite strict. On the other hand, the resulting style is straightforward and explicit. Earlier attempts allowed code that is too subtle.

Quirky Behavior of $?

This is a different way of summarizing the information above.

Simple commands have an obvious behavior:

echo hi           # $? is 0
false             # $? is 1

But the parent process loses errors from failed command subs:

echo $(false)     # $? is 0
                  # Oil makes it fail with command_sub_errexit

Surprisingly, bare assignments take on the value of any command subs:

x=$(false)        # $? is 1 -- we did NOT lose the exit code

But assignment builtins have the problem again:

local x=$(false)  # $? is 0 -- exit code is clobbered
                  # disallowed by Oil's strict_errexit

So shell is confusing and inconsistent, but Oil fixes all these problems. You never lose the exit code of false.

 

Acknowledgments


Generated on Wed May 3 15:38:09 EDT 2023