Problems With the test Builtin: What Does -a Mean?

2017-08-31

I recently implemented the test builtin, also known as [. Since I had already implemented its statically-parsed cousin [[, I thought that this task would be straightforward.

But, as always, shell is full of surprises. In this post, I describe fundamental problems with the design of the [ builtin. You can consider this another episode of Shell: The Bad Parts.

To be concrete, what does this expression mean?

$ [ -a -a -a -a ]

Table of Contents

Background

The Prefix Operator -a

Three Expressions and Three Meanings of -a

What Does [ -a -a -a -a ] Mean?

POSIX Uses Brute Force

A Style Guideline

Conclusion

Appendix A: More Differences between [ and [[

Appendix B: Why Does OSH Need [ ?

Background

Recall the difference between [ and [[ from October:

[ Is a Builtin, But [[ Is Part of the Language

A shell builtin has the same interface as an external command: it receives an argv array and returns an exit code. So [ must parse the expression after variables are substituted and quotes are processed. In other words, it does dynamic parsing.

In contrast, [[ is part of the language, so it can "see" quoting on tokens. This means that it can solve the ambiguity problems with [ that I show below.
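To make the argv point concrete, here is a minimal sketch. The show_argv function is a hypothetical helper, not part of any shell; it prints each argument the way a builtin such as [ would receive it, after quotes and substitutions have been processed.

```sh
# Hypothetical helper: print each argv entry on its own line,
# which is all the information a builtin like [ receives.
show_argv() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

op='-a'

# An unquoted word, a quoted literal, and a variable expansion
# all arrive as the same two-character string.
show_argv -a '-a' "$op"
```

All three arguments print as <-a>, so a builtin has no way to tell which one the author meant as an operator and which as data.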

Here is a statically-parsed [[ expression:

$ path=/etc/passwd
> [[ -n $path && (! -L $path || $path -nt /etc/other) ]]
> echo $?
0

It may be more readable in this C-like syntax:

nonempty(path) && (!isSymlink(path) || newerThan(path, "/etc/other"))

The [ version is almost the same:

$ path=/etc/passwd
> [ -n "$path" -a '(' ! -L "$path" -o "$path" -nt /etc/other ')' ]
> echo $?
0

except:

(1) Each token in the expression must be a separate element of the argv array. This means:

Shell operators like ( must be quoted
( and ! must be separated by a space, so that they become separate argv entries.
"$path" must be quoted so it's not split into multiple tokens. Otherwise, a filename with spaces would cause a syntax error in [.

(2) -o and -a are used for logical or and and. In contrast, [[ reuses the shell operators || and &&.
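The quoting requirement can be checked directly. The sketch below assumes the default IFS; count_args is a hypothetical helper that reports how many argv entries a command receives. It also shows a related trap: an unquoted empty variable disappears entirely, leaving [ with the single argument -n, which it treats as a non-empty string.

```sh
# Hypothetical helper: report how many argv entries a command receives.
count_args() { echo "$#"; }

path='my file'
count_args -n $path     # unquoted: split into -n, my, file
count_args -n "$path"   # quoted: -n and one path argument

empty=''
# Unquoted, the empty value vanishes and [ sees only the string -n,
# which a one-argument test treats as a non-empty string (true).
if [ -n $empty ]; then echo 'unquoted: reported non-empty'; fi
# Quoted, [ receives -n plus an empty argument and reports false.
if [ -n "$empty" ]; then :; else echo 'quoted: reported empty'; fi
```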

Alert readers may already see why the [ language has ambiguous expressions.

The Prefix Operator -a

In bash and ksh, the -a operator is an alias for -e, which returns 0 (true) if and only if its path argument exists:

$ [ -a / ]; echo $?
0

$ [ -a /oops ]; echo $?
1

Three Expressions and Three Meanings of -a

Now I'll show some pathological examples. Although such examples are contrived to be the worst case, I've found them in the wild.

Also, these ambiguities lead to a bad class of bug: data-dependent bugs that occur only 0.01% of the time. Bugs like this tend to escape testing.

So, what does -a mean in these 3 expressions?

[ -a ]
[ -a -a ]
[ -a -a -a ]

To decipher them, let's make these definitions:

mystr='-a'
otherstr='-a'
mypath='-a'

Because [ is a builtin, these 3 expressions are identical to the 3 above:

[ "$mystr" ] — test if the string -a is non-empty
[ -a "$mypath" ] — test if the file -a exists
[ "$mystr" -a "$otherstr" ] — test if both -a and -a are non-empty

So -a means 3 different things, depending on the context:

A literal string (which may be a path)
Unary operator: alias for -e, to test if a file exists
Binary Operator: logical and

Not only does this make code hard to read, it also makes it difficult to write a correct parser for [.

Note that [ isn't the only command with this type of problem. The find and expr tools are also expression languages with no lexer, and thus have related ambiguity issues. I may write about them in the future.

Another way to think about it: If Python had no distinction between strings and keywords, you wouldn't be able to tell these two expressions apart:

>>> 'and' and 'and'  # A valid expression in Python
'and'
>>> and and and      # SyntaxError

What Does [ -a -a -a -a ] Mean?

In bash, it's a syntax error:

$ [ -a -a -a -a ]
... no output ...

But you can reasonably parse it in multiple ways:

[ -a "$mypath" -a "$mystr" ]— (EXISTS mypath) AND mystr
[ "$mystr" -a -a "$mypath" ]— mystr AND (EXISTS mypath)

In fact, dash, mksh, and zsh all agree that the result of [ -a -a -a -a ] is 1 when the file -a doesn't exist, not a syntax error! Bash is the odd man out.
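Each of the two readings can be spelled out unambiguously with separate [ invocations joined by the shell's &&. This is a sketch under assumptions: two_readings is a hypothetical helper, and it relies on mktemp -d (widely available, though not in POSIX) to get an empty scratch directory where no file named -a exists.

```sh
mystr='-a'
mypath='-a'

# Hypothetical helper: evaluate both readings inside an empty scratch
# directory, so that no file named -a exists there. The body runs in a
# subshell, so the caller's working directory is unchanged.
two_readings() (
  scratch=$(mktemp -d) || exit 1
  cd "$scratch" || exit 1

  [ -e "$mypath" ] && [ "$mystr" ]     # (EXISTS mypath) AND mystr
  echo "reading 1: $?"

  [ "$mystr" ] && [ -e "$mypath" ]     # mystr AND (EXISTS mypath)
  echo "reading 2: $?"

  cd / && rmdir "$scratch"
)

two_readings
```

With the file absent, both readings come out false (status 1), which is consistent with the result that dash, mksh, and zsh report.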

I did more testing with the spec test framework:

Tests for Different -a Expressions

The shells disagree for rows 3 to 6, which correspond to 4 to 7 occurrences of -a. Moreover, they disagree in different ways for each expression!

(NOTE: OSH doesn't currently implement -a as a unary operator, so it only has ambiguity between -a as a literal and -a as a binary operator.)

POSIX Uses Brute Force

I discovered that if I wanted OSH to behave like any of the four shells, I couldn't use the same parser for [ and [[.

In fact, resolving the ambiguity means that [ is no longer an expression language. Instead, it's a brute-force enumeration of cases.

The (unmaintained) official Bash FAQ describes it as follows. (You can also look at test.c in the Bash source.)

Bash's builtin test implements the Posix.2 spec, which can be summarized as follows (the wording is due to David Korn):

Here is the set of rules for processing test arguments.

0 Args: False
1 Arg: True iff argument is not null.
2 Args:

If first arg is !, True iff second argument is null.
If first argument is unary, then true if unary test is true
Otherwise error.

3 Args:

If second argument is a binary operator, do binary test of $1 $3
If first argument is !, negate two argument test of $2 $3
If first argument is '(' and third argument is ')', do the one-argument test of the second argument.
Otherwise error.

4 Args:

If first argument is !, negate three argument test of $2 $3 $4.
Otherwise unspecified

5 or more Args: unspecified. (Historical shells would use their current algorithm).

The operators -a and -o are considered binary operators for the purpose of the 3 Arg case.
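The rules above amount to a dispatch on the argument count. The sketch below only names the rule that would apply; it does not evaluate anything, and it is not how any particular shell implements test. Both is_binary_op and which_rule are hypothetical helpers, and the operator list is illustrative rather than complete.

```sh
# Hypothetical helper: succeed if the word is a binary operator.
# This operator list is illustrative, not exhaustive.
is_binary_op() {
  case $1 in
    '='|'!='|-eq|-ne|-gt|-ge|-lt|-le|-nt|-ot|-ef|-a|-o) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print which rule applies to the given arguments.
which_rule() {
  case $# in
    0) echo 'false' ;;
    1) echo 'true iff $1 is not null' ;;
    2)
      if [ "$1" = '!' ]; then
        echo 'true iff $2 is null'
      else
        echo 'unary test if $1 is unary, otherwise error'
      fi ;;
    3)
      if is_binary_op "$2"; then
        echo 'binary test of $1 $3'
      elif [ "$1" = '!' ]; then
        echo 'negate two-argument test of $2 $3'
      elif [ "$1" = '(' ] && [ "$3" = ')' ]; then
        echo 'one-argument test of $2'
      else
        echo 'error'
      fi ;;
    4)
      if [ "$1" = '!' ]; then
        echo 'negate three-argument test of $2 $3 $4'
      else
        echo 'unspecified'
      fi ;;
    *) echo 'unspecified' ;;
  esac
}

which_rule -a
which_rule -a -a
which_rule -a -a -a
which_rule -a -a -a -a
```

The four calls land in the one-argument, two-argument, three-argument (binary), and unspecified cases respectively. Note that the helper itself only ever calls [ with three arguments and a binary operator in the middle, so its own tests stay inside the specified cases.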

In theoretical terms, a language is described by a grammar, and a grammar accepts or rejects strings of arbitrary length. But POSIX apparently specifies no such thing. Only the "unspecified" cases are allowed to use a grammar!

So the three cases [ -a ], [ -a -a ], and [ -a -a -a ] are specified by POSIX, which is why four different shells (mostly) agree on their meaning. After that, they wildly diverge, as shown by the spec tests above.

A Style Guideline

At first, I was put off by these hacks. But I noticed that -a and -o are marked obsolete in POSIX, and they're the only constructs that will produce a [ expression longer than four tokens.

And shell already has !, && and || operators, so you can rewrite complex [ expressions like this:

$ path=/etc/passwd
> test -n "$path" && { ! test -L "$path" ||
>                      test "$path" -nt /etc/other; }
> echo $?
0

This leads to a simple style rule:

Do not use anything but the two- or three-argument forms of [.

Good:

[ -z STR ] — 2 args
[ PATH1 -nt PATH2 ] — 3 args
! [ -d PATH ] — 2 args with negation on the outside
[ -d PATH1 ] && [ -d PATH2 ]
- test -d PATH1 && test -d PATH2 — same thing, but I think it looks nicer

Bad:

[ ! -d PATH ] — use shell's negation instead of negation within [
[ -d PATH1 -a -d PATH2 ] — use shell's && instead of -a
[ STR ] — this is technically OK, but redundant with [ -n STR ].
[ -a PATH ] — use [ -e PATH ] instead

And remember to quote every substitution.
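As a sketch of the rule in practice, the hypothetical helpers below call [ with only two arguments at a time and let the shell supply && and !. Values that look like operators are then handled as plain data.

```sh
# Hypothetical helpers that follow the style rule: each [ call has
# exactly two arguments, and the shell supplies && and !.
both_nonempty() {
  [ -n "$1" ] && [ -n "$2" ]
}

not_a_dir() {
  ! [ -d "$1" ]
}

# Operator-like values are treated as ordinary strings.
both_nonempty '-a' '(' && echo 'both non-empty'
both_nonempty '!' ''   || echo 'second is empty'
not_a_dir /            || echo '/ is a directory'
```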

Conclusion

I described ambiguity in the test / [ builtin, as well as the POSIX rules that shells use to resolve it. These rules work for common cases, but there are problematic corner cases.

Last year, I critiqued the other parts of the shell language in a similar way:

In the next post, I'll describe:

Boolean expressions in Oil.
How I plan to automatically convert [ and [[ to Oil, in the style of Translating Shell to Oil.

Please leave a comment if anything doesn't make sense.

Appendix A: More Differences between [ and [[

I mentioned these three differences:

[[ uses a grammar; [ uses a POSIX parsing rule with six cases for fixed lengths.
Static vs. Dynamic parsing
- No word splitting is applied in [[, so no need to quote $varsubs
- [ can't tell the difference between quoted and unquoted strings, but [[ can. [[ $foo == *.py ]] is different than [[ $foo == '*.py' ]].
&& and || vs. -a and -o

More differences:

$foo == *.py oddly does glob matching in [[, but not in [
The [[ language has =~ for regular expressions, but there is no [ equivalent. The external expr tool has similar regex functionality for POSIX-compliant scripts.
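For scripts that must avoid [[, here is a hedged sketch of the usual POSIX substitutes: case for glob matching and expr for regex matching. The function names are hypothetical. The leading X in the expr call is a common defensive idiom that keeps expr from mistaking the value for one of its own operators.

```sh
# Hypothetical helper: glob match with case, which works in any POSIX shell.
matches_py_glob() {
  case $1 in
    *.py) return 0 ;;
    *)    return 1 ;;
  esac
}

# Hypothetical helper: regex match with the external expr tool.
# expr patterns are basic regular expressions anchored at the start.
matches_py_regex() {
  expr "X$1" : 'X.*\.py$' >/dev/null
}

matches_py_glob  'main.py'  && echo 'glob: main.py matches'
matches_py_regex 'main.py'  && echo 'regex: main.py matches'
matches_py_regex 'main.pyc' || echo 'regex: main.pyc does not match'
```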

Appendix B: Why Does OSH Need [ ?

In the spirit of minimalism, I originally thought people could use the coreutils version of [ with OSH.

But users reported that Gentoo and Nix both invoke [ without $PATH set, which means that /usr/bin/[ won't be found.
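The effect is easy to approximate. In the sketch below, bracket_without_path is a hypothetical helper and /nonexistent is an assumed placeholder for a directory that does not exist; pointing $PATH there stands in for the unset $PATH that those users reported. In a shell where [ is a builtin, the test still succeeds because no lookup of /usr/bin/[ is needed.

```sh
# Hypothetical helper: run a simple test with no usable $PATH.
# The body is a subshell, so the caller's PATH is untouched.
bracket_without_path() (
  PATH=/nonexistent   # assumed placeholder; no external [ can be found here
  [ -n 'x' ]
)

if bracket_without_path; then
  echo '[ works without PATH, so it is a builtin in this shell'
else
  echo '[ was not found, so this shell relies on an external [ binary'
fi
```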

So I decided to implement [, which led me down this rat hole!