blog | oilshell.org

Problems With the test Builtin: What Does -a Mean?

2017-08-31

I recently implemented the test builtin, also known as [. Since I had already implemented its statically-parsed cousin [[, I thought that this task would be straightforward.

But, as always, shell is full of surprises. In this post, I describe fundamental problems with the design of the [ builtin. You can consider this another episode of Shell: The Bad Parts.

To be concrete, what does this expression mean?

$ [ -a -a -a -a ]

Table of Contents
Background
The Prefix Operator -a
Three Expressions and Three Meanings of -a
What Does [ -a -a -a -a ] Mean?
POSIX Uses Brute Force
A Style Guideline
Conclusion
Appendix A: More Differences between [ and [[
Appendix B: Why Does OSH Need [ ?

Background

Recall the difference between [ and [[ from October:

A shell builtin has the same interface as an external command: it receives an argv array and returns an exit code. So [ must parse the expression after variables are substituted and quotes are processed. In other words, it does dynamic parsing.
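To make the argv point concrete, here is a minimal sketch. The helper show_argv is hypothetical (it is not part of any shell); it simply prints what a builtin such as [ would receive once the shell has finished substitution and quote removal.

```shell
# Hypothetical helper: print each argv element on its own line,
# the way a builtin receives them after the shell has processed quotes.
show_argv() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

op='-n'
from_literal=$(show_argv -n foo)          # operator typed literally
from_variable=$(show_argv "$op" 'foo')    # operator arrives via a quoted variable

# Both calls yield the same argv, so the callee cannot tell them apart.
printf '%s\n' "$from_literal"
printf '%s\n' "$from_variable"
```

Both printf calls show the same two elements, which is why a builtin cannot distinguish an operator from data that happens to look like one.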

In contrast, [[ is part of the language, so it can "see" quoting on tokens. This means that it can solve the ambiguity problems with [ that I show below.

Here is a statically-parsed [[ expression:

$ path=/etc/passwd
> [[ -n $path && (! -L $path || $path -nt /etc/other) ]]
> echo $?
0

It may be more readable in this C-like syntax:

nonempty(path) && (!isSymlink(path) || newerThan(path, "/etc/other"))

The [ version is almost the same:

$ path=/etc/passwd
> [ -n "$path" -a '(' ! -L "$path" -o "$path" -nt /etc/other ')' ]
> echo $?
0

except:

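(1) Each token in the expression must be a separate element of the argv array. This means that operators like ( and ) must be quoted so the shell doesn't interpret them itself, and substitutions like "$path" must be quoted so that word splitting can't change the number of arguments.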

(2) -o and -a are used for logical or and and. In contrast, [[ reuses the shell operators || and &&.
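One consequence of point (1) can be checked directly. In this sketch (the variable names are only illustrative), an unquoted empty substitution disappears from argv, so [ falls back to its one-argument rule and treats -n as a plain non-empty string:

```shell
empty=''

# Unquoted: $empty expands to nothing, so argv is just: [ -n ]
# The one-argument form is true when that argument is a non-empty string.
if [ -n $empty ]; then unquoted=true; else unquoted=false; fi

# Quoted: argv is: [ -n '' ], the two-argument form, which is false here.
if [ -n "$empty" ]; then quoted=true; else quoted=false; fi

echo "unquoted=$unquoted quoted=$quoted"
# prints: unquoted=true quoted=false
```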

Alert readers may already see why the [ language has ambiguous expressions.

The Prefix Operator -a

In bash and ksh, the -a operator is an alias for -e, which returns 0 (true) if and only if its path argument exists:

$ [ -a / ]; echo $?
0
$ [ -a /oops ]; echo $?
1

Three Expressions and Three Meanings of -a

Now I'll show some pathological examples. Although such examples are contrived to be the worst case, I've found them in the wild.

Also, these ambiguities lead to a bad class of bug: data-dependent bugs that occur only 0.01% of the time. Bugs like this tend to escape testing.

So, what does -a mean in the three expressions [ -a ], [ -a -a ], and [ -a -a -a ]?

To decipher them, let's make these definitions:

mystr='-a'
otherstr='-a'
mypath='-a'

Because [ is a builtin, the three expressions above are identical to [ "$mystr" ], [ -a "$mypath" ], and [ "$mystr" -a "$otherstr" ].
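As a runnable sketch of those three forms, assuming the expressions in question are the one-, two-, and three-argument cases that the POSIX section names ([ -a ], [ -a -a ], and [ -a -a -a ]):

```shell
mystr='-a'
otherstr='-a'
mypath='-a'

# 1 argument: -a is a literal string, and the test is "is it non-empty?"
if [ "$mystr" ]; then one_arg=true; else one_arg=false; fi

# 2 arguments: in bash and ksh, -a is the unary "file exists" operator,
# applied to a file literally named -a. The result depends on the current
# directory (and on the shell), so it is computed but not printed.
if [ -a "$mypath" ] 2>/dev/null; then two_arg=true; else two_arg=false; fi

# 3 arguments: the middle -a is binary "and" of two non-empty strings.
if [ "$mystr" -a "$otherstr" ]; then three_arg=true; else three_arg=false; fi

echo "one_arg=$one_arg three_arg=$three_arg"
# prints: one_arg=true three_arg=true
```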

So -a means 3 different things, depending on the context:

  1. A literal string (which may be a path)
  2. Unary operator: alias for -e, to test if a file exists
  3. Binary Operator: logical and

Not only does this make code hard to read, it also makes it difficult to write a correct parser for [.

Note that [ isn't the only command with this type of problem. The find and expr tools are also expression languages with no lexer, and thus have related ambiguity issues. I may write about them in the future.

Another way to think about it: If Python had no distinction between strings and keywords, you wouldn't be able to tell these two expressions apart:

>>> 'and' and 'and'  # A valid expression in Python
'and'
>>> and and and      # SyntaxError

What Does [ -a -a -a -a ] Mean?

In bash, it's a syntax error:

$ [ -a -a -a -a ]
... no output ...

But you can reasonably parse it in multiple ways:

In fact, dash, mksh, and zsh all agree that the result of [ -a -a -a -a ] is 1 when the file -a doesn't exist, not a syntax error! Bash is the odd man out.
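Two plausible readings can be spelled out unambiguously by splitting the expression into separate [ calls. These two readings are an illustration chosen for this sketch, not an exhaustive list, and -e is used in place of unary -a to avoid the very ambiguity under discussion. The function names reading1 and reading2 are hypothetical.

```shell
# Work in an empty scratch directory so that no file named -a exists yet.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

mystr='-a'
mypath='-a'

# Reading 1: (file -a exists) AND (string -a is non-empty)
reading1() { [ -e "$mypath" ] && [ -n "$mystr" ]; }

# Reading 2: (string -a is non-empty) AND (file -a exists)
reading2() { [ -n "$mystr" ] && [ -e "$mypath" ]; }

if reading1; then r1_missing=0; else r1_missing=1; fi
if reading2; then r2_missing=0; else r2_missing=1; fi

: > ./-a    # create a file literally named -a

if reading1; then r1_present=0; else r1_present=1; fi
if reading2; then r2_present=0; else r2_present=1; fi

echo "missing: $r1_missing $r2_missing, present: $r1_present $r2_present"
# prints: missing: 1 1, present: 0 0
```

Under either reading the status is 1 while the file -a is absent, which is consistent with what dash, mksh, and zsh are reported to return.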

I did more testing with the spec test framework:

The shells disagree for rows 3 to 6, which correspond to 4 to 7 occurrences of -a. Moreover, they disagree in different ways for each expression!
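A comparison like this can be reproduced by hand. The probe function below is a hypothetical helper, not the spec test framework itself: it calls [ with n copies of -a and reports the exit status of whichever shell runs the script. Running the same file under bash, dash, mksh, and zsh should make any disagreement visible.

```shell
# Hypothetical probe: call [ with n copies of -a and report the exit status
# produced by the shell that is running this script.
probe() {
  n=$1
  set --                  # clear this function's positional parameters
  i=0
  while [ "$i" -lt "$n" ]; do
    set -- "$@" -a        # append one more -a
    i=$((i + 1))
  done
  status=0
  [ "$@" ] 2>/dev/null || status=$?
  echo "$n x -a => status $status"
}

for count in 1 2 3 4 5 6 7; do
  probe "$count"
done
```

Only the one- and three-argument lines should be stable across shells (status 0 in both cases); the two-argument result also depends on whether a file named -a exists in the current directory.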

(NOTE: OSH doesn't currently implement -a as a unary operator, so it only has ambiguity between -a as a literal and -a as a binary operator.)

POSIX Uses Brute Force

I discovered that if I wanted OSH to behave like any of the four shells, I couldn't use the same parser for [ and [[.

In fact, resolving the ambiguity means that [ is no longer an expression language. Instead, it's a brute-force enumeration of cases.

The (unmaintained) official Bash FAQ describes it as follows. (You can also look at test.c in the Bash source.)

Bash's builtin test implements the Posix.2 spec, which can be summarized as follows (the wording is due to David Korn):

Here is the set of rules for processing test arguments.

The operators -a and -o are considered binary operators for the purpose of the 3 Arg case.
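The "enumeration of cases" idea can be sketched as a function that dispatches on the argument count. This is a simplified illustration based on the POSIX description of test, not the code from bash's test.c: sketch_test is a hypothetical name, only a handful of primaries are modeled, the parenthesized forms are omitted, and status 2 is an arbitrary stand-in for "error or unspecified".

```shell
# Hypothetical sketch of dispatch-by-argument-count. Errors are not
# propagated through "!", to keep the sketch short.
sketch_test() {
  case $# in
    0) return 1 ;;                        # 0 args: false
    1) [ -n "$1" ] ;;                     # 1 arg: true if non-empty
    2) case $1 in
         '!')      ! sketch_test "$2" ;;  # negate the 1-arg test
         -n|-z|-e) test "$1" "$2" ;;      # a few unary primaries
         *)        return 2 ;;
       esac ;;
    3) case $2 in
         -a)   sketch_test "$1" && sketch_test "$3" ;;
         -o)   sketch_test "$1" || sketch_test "$3" ;;
         =|!=) test "$1" "$2" "$3" ;;     # a few binary primaries
         *)    case $1 in
                 '!') ! sketch_test "$2" "$3" ;;
                 *)   return 2 ;;
               esac ;;
       esac ;;
    4) case $1 in
         '!') ! sketch_test "$2" "$3" "$4" ;;
         *)   return 2 ;;
       esac ;;
    *) return 2 ;;                        # 5+ args: unspecified
  esac
}
```

With this sketch, sketch_test -a and sketch_test -a -a -a both succeed, while sketch_test -a -a -a -a falls into the "otherwise" branch of the four-argument case. No grammar is involved anywhere: the argument count alone decides which rule applies.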

In theoretical terms, a language is described by a grammar, and a grammar accepts or rejects strings of arbitrary length. But POSIX apparently specifies no such thing. Only the "unspecified" cases are allowed to use a grammar!

So the three cases [ -a ], [ -a -a ], and [ -a -a -a ] are specified by POSIX, which is why the four shells (mostly) agree on their meaning. Beyond that, they diverge wildly, as shown by the spec tests above.

A Style Guideline

At first, I was put off by these hacks. But I noticed that -a and -o are marked obsolete in POSIX, and they're the only constructs that will produce a [ expression longer than four tokens.

And shell already has !, && and || operators, so you can rewrite complex [ expressions like this:

$ path=/etc/passwd
> test -n "$path" && { ! test -L "$path" ||
>                      test "$path" -nt /etc/other; }
> echo $?
0

This leads to a simple style rule:

Do not use anything but the two- or three-argument forms of [.

Good:

Bad:

And remember to quote every substitution.
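As an illustration of the rule (the variable names and values here are made up for the example), a compound condition can be built from short [ calls joined by the shell's own &&, with every substitution quoted:

```shell
path=/etc/passwd
a=foo
b=foo

# Within the rule: each [ call has two or three arguments, and the shell's
# own && does the combining.
if [ -n "$path" ] && [ "$a" = "$b" ]; then
  result=match
else
  result=nomatch
fi
echo "$result"
# prints: match

# Outside the rule, shown only for contrast (five arguments joined by -a):
#   [ -n "$path" -a "$a" = "$b" ]
```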

Conclusion

I described ambiguity in the test / [ builtin, as well as the POSIX rules that shells use to resolve it. These rules work for common cases, but there are problematic corner cases.

Last year, I critiqued the other parts of the shell language in a similar way:

In the next post, I'll describe:

Please leave a comment if anything doesn't make sense.


Appendix A: More Differences between [ and [[

I mentioned these three differences:

More differences:

Appendix B: Why Does OSH Need [ ?

In the spirit of minimalism, I originally thought people could use the coreutils version of [ with OSH.

But users reported that Gentoo and Nix both invoke [ without $PATH set, which means that /usr/bin/[ won't be found.

So I decided to implement [, which led me down this rat hole!