This document describes Oil's word evaluation semantics (shopt -s simple_word_eval) for experienced shell users. It may also be useful to those who want to implement this behavior in another shell.
The main idea is that Oil behaves like a traditional programming language: parsing and evaluation aren't interleaved, and code and data aren't confused.
In Oil, "word expressions" like
$x
"hello $name"
$(hostname)
'abc'$x${y:-${z//pat/replace}}"$(echo hi)$((a[i] * 3))"
are parsed and evaluated in a straightforward way, like this expression when x == 2:
1 + x / 2 + x * 3 → 8 # Python, JS, Ruby, etc. work this way
In contrast, in shell, words are "expanded" in multiple stages, like this:
1 + "x / 2 + \"x * 3\"" → 8 # Hypothetical, confusing language
That is, it would be odd if Python looked inside a program's strings for expressions to evaluate, but that's exactly what shell does! There are multiple places where there's a silent eval, and you need quoting to inhibit it. Neglecting this can cause security problems due to confusing code and data (links below).
In other words, the defaults are wrong. Programmers are surprised by shell's behavior, and it leads to incorrect programs.
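As a small illustration of multi-stage expansion, the sketch below uses arithmetic substitution in plain POSIX shell. The variable names are made up for this example. The text of $x is pasted into the expression before the expression is parsed, so the data ends up being re-read as code:

```shell
# x holds data: the text "1 + 2".
x='1 + 2'

# Stage 1 pastes the text of $x into the expression, giving "1 + 2 * 3".
# Stage 2 parses and evaluates that new text, so operator precedence
# is applied to the pasted text rather than to a single value.
pasted=$(( $x * 3 ))

# A variable that holds a plain number behaves like a single operand.
y=3
single=$(( $y * 3 ))

echo "pasted=$pasted single=$single"
```

This prints pasted=7 single=9: the pasted text is evaluated as 1 + (2 * 3), not as (1 + 2) * 3.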
So in Oil, you can opt out of the multiple "word expansion" stages described in the POSIX shell spec. Instead, there's only one stage: evaluation.
The new semantics should be easily adoptable by existing shell scripts.
bin/osh is POSIX-compatible and runs real bash scripts. You can gradually opt into stricter and saner behavior with shopt options (or by running bin/oil). The most important one is simple_word_eval, and the others are listed below.
A command like echo @foo is not too common in existing scripts, and it can be made bash-compatible by quoting it: echo '@foo'.
In the following examples, the argv command prints the argv array it receives in a readable format:
$ argv one "two three"
['one', 'two three']
I also use Oil's var keyword for assignments. (TODO: This could be rewritten with shell assignment for the benefit of shell implementers)
In Oil, the following constructs always evaluate to one argument:
Variable substitution: $x, ${y}
Command substitution: $(echo hi) or backticks
Arithmetic substitution: $(( 1 + 2 ))
That is, quotes aren't necessary to avoid:
Word splitting, which uses $IFS.
Empty elision. For example, in shell, x=''; ls $x passes ls no arguments.
Here's an example showing that each construct evaluates to one arg in Oil:
oil$ var pic = 'my pic.jpg' # filename with spaces
oil$ var empty = ''
oil$ var pat = '*.py' # pattern stored in a string
oil$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
In contrast, shell applies splitting, globbing, and empty elision after the substitutions. Each of these operations returns an indeterminate number of strings:
sh$ pic='my pic.jpg' # filename with spaces
sh$ empty=
sh$ pat='*.py' # pattern stored in a string
sh$ argv ${pic} $empty $pat $(cat foo.txt) $((1 + 2))
['my', 'pic.jpg', 'a.py', 'b.py', 'contents', 'of', 'foo.txt', '3']
To get the desired behavior, you have to use double quotes:
sh$ argv "${pic}" "$empty" "$pat" "$(cat foo.txt)" "$((1 + 2))"
['my pic.jpg', '', '*.py', 'contents of foo.txt', '3']
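The three implicit operations can also be observed one at a time. The following is a minimal sketch in plain POSIX shell, assuming the default $IFS and a mktemp that supports -d. count_args is a hypothetical helper written for this example (it is not the argv tool used above), and the scratch directory exists only so that the glob has files to match:

```shell
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

pic='my pic.jpg'   # filename with spaces
empty=''

# Scratch directory with two files so the pattern matches something.
dir=$(mktemp -d)
touch "$dir/a.py" "$dir/b.py"
pat="$dir/*.py"    # pattern stored in a string

split=$(count_args $pic)       # word splitting: 'my' and 'pic.jpg'
elided=$(count_args $empty)    # empty elision: the word disappears
globbed=$(count_args $pat)     # globbing applied to a variable's value
quoted=$(count_args "$pic" "$empty" "$pat")   # one argument per substitution

echo "split=$split elided=$elided globbed=$globbed quoted=$quoted"
rm -rf "$dir"
```

Under those assumptions this prints split=2 elided=0 globbed=2 quoted=3. Only the quoted form gives one argument per substitution, which is what simple_word_eval does without the quotes.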
The constructs in the last section evaluate to a single argument. In contrast, these three constructs evaluate to 0 to N arguments:
Splicing of arrays: "$@" and "${myarray[@]}"
Static globs like echo *.py. Globs are static when they occur in the program text.
Brace expansion: {alice,bob}@example.com
In Oil, shopt -s parse_at enables these shortcuts for splicing:
@myarray for "${myarray[@]}"
@ARGV for "$@"
Example:
oil$ var myarray = %('a b' c) # array with 2 elements
oil$ set -- 'd e' f # 2 arguments
oil$ argv @myarray @ARGV *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
This is just like:
bash$ myarray=('a b' c)
bash$ set -- 'd e' f
bash$ argv "${myarray[@]}" "$@" *.py {ian,jack}@sh.com
['a b', 'c', 'd e', 'f', 'g.py', 'h.py', 'ian@sh.com', 'jack@sh.com']
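To see why the quoted "$@" form is the one that corresponds to splicing, the sketch below compares it with the unquoted and "$*" forms in plain POSIX shell, assuming the default $IFS. count_args is a hypothetical helper for this example. In bash, "${myarray[@]}" follows the same rule as "$@":

```shell
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

set -- 'a b' c 'd e' f         # 4 positional arguments

spliced=$(count_args "$@")     # element boundaries are preserved
unquoted=$(count_args $@)      # each element is split again on $IFS
joined=$(count_args "$*")      # everything is joined into one string

echo "spliced=$spliced unquoted=$unquoted joined=$joined"
```

This prints spliced=4 unquoted=6 joined=1. Only the quoted "$@" form passes each element through as exactly one argument.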
Unchanged: quotes disable globbing and brace expansion:
$ echo *.py
foo.py bar.py
$ echo "*.py" # globbing disabled with quotes
*.py
$ echo {spam,eggs}.sh
spam.sh eggs.sh
$ echo "{spam,eggs}.sh" # brace expansion disabled with quotes
{spam,eggs}.sh
These rules apply when a sequence of words is being evaluated, exactly as in shell:
Commands: echo $x foo
For loops: for i in $x foo; do ...
Array literals: a=($x foo) and var a = %($x foo) (oil-array)
Shell has other word evaluation contexts like:
sh$ x="${not_array[@]}"
sh$ echo hi > "${not_array[@]}"
which aren't affected by simple_word_eval.
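As a simpler illustration of a context where a single value is expected, the right-hand side of a plain assignment is not split or globbed in POSIX shell, even without quotes. The variable names in this sketch are made up for the example:

```shell
src='my pic.jpg'
copy=$src        # assignment context: no word splitting

star='*'
literal=$star    # assignment context: no globbing

echo "copy=$copy literal=$literal"
```

Both values are copied unchanged, so this prints copy=my pic.jpg literal=*.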
Oil can express everything that shell can:
Splitting with @split(mystr, IFS?)
Globbing with @glob(mypat)
Elision with @maybe(s)
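For comparison, explicit elision has a rough analogue in plain POSIX shell. The sketch below assumes that @maybe(s) yields no argument for an empty string and one argument otherwise. It is not how Oil implements it, and count_args is a hypothetical helper for this example:

```shell
# Hypothetical helper: print how many arguments were received.
count_args() { echo "$#"; }

s=''
when_empty=$(count_args ${s:+"$s"})   # nothing is passed for an empty value

s='my pic.jpg'
when_set=$(count_args ${s:+"$s"})     # exactly one argument, spaces intact

echo "when_empty=$when_empty when_set=$when_set"
```

This prints when_empty=0 when_set=1. The elision is requested explicitly at the call site instead of happening to every unquoted substitution.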
shopt Options
Files whose names begin with - aren't returned by globs. This avoids confusing flags and files.
Strict options cause fatal errors.
This is an intentional incompatibility described in the Known Differences doc.
Oil word evaluation is enabled with shopt -s simple_word_eval, and proceeds in a single step.
Variable, command, and arithmetic substitutions predictably evaluate to a single argument, regardless of whether they're empty or have spaces. There's no implicit splitting, globbing, or elision of empty words.
You can opt into those behaviors with explicit expressions like @split(mystr), which evaluates to an array.
Oil also supports shell features that evaluate to 0 to N arguments: splicing, globbing, and brace expansion.
There are other options that "clean up" word evaluation. All options are designed to be gradually adopted by other shells, shell scripts, and eventually POSIX.
The -n flag gives insight into how Oil parses shell:
$ osh -n -c 'echo ${x:-default}$(( 1 + 2 ))'
(C {<echo>}
{
(braced_var_sub
token: <Id.VSub_Name x>
suffix_op: (suffix_op.Unary op_id:Id.VTest_ColonHyphen arg_word:{<default>})
)
(word_part.ArithSub
anode:
(arith_expr.Binary
op_id: Id.Arith_Plus
left: (arith_expr.ArithWord w:{<Id.Lit_Digits 1>})
right: (arith_expr.ArithWord w:{<Id.Lit_Digits 2>})
)
)
}
)
You can pass --ast-format text for more details.
Evaluation of the syntax tree is a single step.