This doc has rough notes on the architecture of the parser. How to Parse Shell Like a Programming Language (on the blog) covers some of the same material and is more polished.
Oil uses regex-based lexers, which are turned into efficient C code with re2c. We intentionally avoid hand-written code that manipulates strings char-by-char, since that strategy is error prone; it's inevitable that rare cases will be mishandled.
The list of lexers can be found by looking at native/fastlex.c.
- echo -e
- PS1 backslash escapes.
- !$ (history expansion)
- The glob in ${x/foo*/replace}, via conversion to ERE. We need position information, and the fnmatch() API doesn't provide it, but regexec() does.
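To illustrate why converting a glob to a regex helps here, the following is a minimal sketch rather than Oil's actual code. It assumes a tiny glob subset (only * and ?), uses Python's re module as a stand-in for ERE and regexec(), and the names glob_to_regex and replace_first are hypothetical. The point is that a regex match reports where the match starts and ends, which is what a replacement needs.

```python
import re

def glob_to_regex(pat):
    """Translate a tiny glob subset (* and ?) into regex syntax.

    Every other character is treated as a literal. Real globs also have
    character classes and escapes, which this sketch ignores.
    """
    out = []
    for ch in pat:
        if ch == '*':
            out.append('.*')
        elif ch == '?':
            out.append('.')
        else:
            out.append(re.escape(ch))
    return ''.join(out)

def replace_first(s, glob_pat, replacement):
    """Replace the first match of glob_pat in s, like ${x/pat/replacement}."""
    m = re.search(glob_to_regex(glob_pat), s, re.DOTALL)
    if m is None:
        return s
    start, end = m.span()  # the position info a yes/no matcher can't give us
    return s[:start] + replacement + s[end:]

result = replace_first('xfoobar', 'foo*', 'Z')
```

Here result is 'xZ': the match covers everything from foo to the end of the string, and the span tells us which slice to replace.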
These constructs aren't recognized by Oil's front end. Instead, they're punted to libc:
- Globs like *.py (in most cases)
- Extended globs like @(*.py|*.sh)
- strftime format strings, e.g. printf '%(%Y-%m-%d)T' $timestamp
osh/word_parse.py calls lexer.MaybeUnreadOne() to handle right parens in this case:
(case x in x) ;; esac )
This is sort of like the ungetc() I've seen in other shell lexers.
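The idea of un-reading one token can be sketched as follows. This is a hypothetical PushbackLexer over a pre-split token list, not the real lexer in Oil; only the general "step back by one" behavior is taken from the text above.

```python
class PushbackLexer:
    """Toy lexer over a fixed token list that can step back by one token."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def read(self):
        """Return the next token, or None at end of input."""
        if self._pos >= len(self._tokens):
            return None
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok

    def maybe_unread_one(self):
        """Push the last token back, if there is one. Returns True on success."""
        if self._pos == 0:
            return False
        self._pos -= 1
        return True

lex = PushbackLexer(['x', ')', ';;'])
lex.read()                  # 'x'
paren = lex.read()          # ')' was consumed by one parser...
lex.maybe_unread_one()      # ...so hand it back
again = lex.read()          # and another parser sees ')' again
```

A caller that consumed a right paren in the wrong lexical context can return it this way, so the enclosing parser gets to see the same token.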
This section is about extra passes ("irregularities") at parse time. In the "Runtime Issues" section below, we discuss cases that involve parsing after variable expansion, etc.
Re-parsing previously parsed text makes it harder to produce good error messages with source location info. It also has implications for translation, because we break the "arena invariant".
(1) Array L-values like a[x+1]=foo. bash allows splitting arithmetic expressions across word boundaries: a[x + 1]=foo. But I don't see this used, and it would significantly complicate the OSH parser. (in _MakeAssignPair in osh/cmd_parse.py)
(2) Backticks. There is an extra level of backslash quoting that may happen compared with $(). (in _ReadCommandSubPart in osh/word_parse.py)
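The extra quoting level in backticks can be sketched as a preprocessing step. This assumes the POSIX rule that, inside backticks, a backslash keeps its literal meaning unless it is followed by $, a backtick, or another backslash. The function name unquote_backtick_body is hypothetical, and Oil's own handling may differ in detail.

```python
import re

# A backslash followed by $, `, or \ is one level of quoting to strip.
_BACKTICK_ESCAPE = re.compile(r'\\([$`\\])')

def unquote_backtick_body(body):
    """Remove one level of backslash quoting from the text inside `...`.

    The result is what would then be parsed as a command, as if it had
    been written inside $( ... ).
    """
    return _BACKTICK_ESCAPE.sub(r'\1', body)

plain = unquote_backtick_body(r'echo \$HOME')   # escaped dollar is unescaped
other = unquote_backtick_body(r'echo \a')       # other backslashes are kept
```

Because the body is rewritten before it is parsed, the parser sees text that differs from the source file, which is one reason location info is harder to keep accurate.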
VirtualLineReader
These are handled up front, but not in a single pass.
- FOO=bar declare a[x]=1. We make another pass with _SplitSimpleCommandPrefix().
  - s=1 doesn't cause reparsing, but a[x+1]=y does.
- echo {a,b}
- echo ~bob, home=~bob
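As a rough illustration of an extra pass over already-parsed words, here is a sketch that splits leading NAME=value bindings off a simple command. It works on plain strings and uses a hypothetical function name; the real _SplitSimpleCommandPrefix() operates on Oil's syntax tree, which this does not model.

```python
import re

# NAME=value, where NAME is a plain identifier (no a[x]= forms).
ASSIGN_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)=(.*)', re.DOTALL)

def split_simple_command_prefix(words):
    """Split leading NAME=value words from the rest of the command."""
    bindings = []
    n = 0
    for w in words:
        m = ASSIGN_RE.fullmatch(w)
        if m is None:
            break  # first non-binding word ends the prefix
        bindings.append((m.group(1), m.group(2)))
        n += 1
    return bindings, words[n:]

bindings, rest = split_simple_command_prefix(['FOO=bar', 'declare', 'a[x]=1'])
```

In this sketch, FOO=bar becomes an environment binding, while a[x]=1 stays an ordinary argument to declare because it comes after the command name.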
- myfunc() { echo hi; } vs. myfunc=() # an array
- shopt -s parse_equals: For x = 1 + 2*3
- Here docs with <<-. The leading tab is lost, because we don't need it for translation.
- alias foo='ls | wc -l'. Aliases are like "lexical macros".
- $PS1 and family first undergo \ substitution, and then the resulting strings are parsed as words, with $ escaped to \$.
- eval
- trap builtin
- source: the filename is formed dynamically, but the code is generally static.
All of the cases above, plus:
(1) Recursive Arithmetic Evaluation:
$ a='1+2'
$ b='a+3'
$ echo $(( b ))
6
This also happens for the operands to [[ x -eq x ]].
Note that a='$(echo 3)' results in a syntax error. I believe this was due to the ShellShock mitigation.
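The recursive behavior in (1) can be modeled with a small evaluator. This is a sketch under stated assumptions, not bash's or OSH's implementation: it supports only integer literals, variable names, + and *, and the name eval_arith and the depth limit of 50 are made up for illustration. The key step is that a variable's string value is itself parsed and evaluated as an expression.

```python
import re

TOKEN_RE = re.compile(r'\s*(\d+|[A-Za-z_]\w*|[+*])')

def tokenize(s):
    s = s.rstrip()
    pos, toks = 0, []
    while pos < len(s):
        m = TOKEN_RE.match(s, pos)
        if m is None:
            raise ValueError('syntax error near %r' % s[pos:])
        toks.append(m.group(1))
        pos = m.end()
    return toks

def eval_arith(expr, env, depth=0):
    """Evaluate expr; variable values are strings that get re-parsed."""
    if depth > 50:
        raise ValueError('expression recursion level exceeded')
    toks = tokenize(expr)
    if not toks:
        return 0  # treat an empty value as 0
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(toks) or toks[pos] in ('+', '*'):
            raise ValueError('expected operand')
        tok = toks[pos]
        pos += 1
        if tok.isdigit():
            return int(tok)
        # A name: parse its *value* as code. This is the runtime parsing.
        return eval_arith(env.get(tok, '0'), env, depth + 1)

    def term():
        nonlocal pos
        val = atom()
        while pos < len(toks) and toks[pos] == '*':
            pos += 1
            val *= atom()
        return val

    val = term()
    while pos < len(toks) and toks[pos] == '+':
        pos += 1
        val += term()
    if pos != len(toks):
        raise ValueError('unexpected token %r' % toks[pos])
    return val

value = eval_arith('b', {'a': '1+2', 'b': 'a+3'})
```

With the same a and b as in the transcript, value is 6. Since the tokenizer has no rule for $ or parentheses, a value like '$(echo 3)' is rejected with a syntax error in this sketch too.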
(2) The unset builtin takes an LValue. (not yet implemented in OSH)
$ a=(1 2 3 4)
$ expr='a[1+1]'
$ unset "$expr"
$ argv "${a[@]}"
['1', '2', '4']
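The transcript above implies that unset parses its argument at runtime. A minimal model of that, assuming a sparse array stored as a dict and an index language limited to sums of integer literals, might look like this. The names parse_lvalue, eval_index and unset are hypothetical.

```python
import re

# NAME or NAME[INDEX_EXPR], parsed from a string at runtime.
LVALUE_RE = re.compile(r'([A-Za-z_]\w*)(?:\[(.*)\])?')

def parse_lvalue(s):
    m = LVALUE_RE.fullmatch(s)
    if m is None:
        raise ValueError('not a valid L-value: %r' % s)
    return m.group(1), m.group(2)  # index is None for a bare name

def eval_index(expr):
    # Toy arithmetic: sums of integer literals only, e.g. '1+1'.
    return sum(int(part) for part in expr.split('+'))

def unset(env, lvalue_str):
    name, index_expr = parse_lvalue(lvalue_str)
    if index_expr is None:
        env.pop(name, None)
    else:
        env.get(name, {}).pop(eval_index(index_expr), None)

# a=(1 2 3 4), modeled as a sparse array keyed by index
env = {'a': {0: '1', 1: '2', 2: '3', 3: '4'}}
unset(env, 'a[1+1]')
remaining = [env['a'][i] for i in sorted(env['a'])]
```

After the call, remaining is ['1', '2', '4'], matching the transcript: the index expression inside the string was parsed and evaluated before the element was removed.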
(3) printf -v takes an "LValue".
(4) Var refs with ${!x} take a "cell". (Not yet implemented in OSH. Relied on by bash-completion, as discovered by Greg Price.)
$ a=(1 2 3 4)
$ expr='a[$(echo 2 | tee BAD)]'
$ echo ${!expr}
3
$ cat BAD
2
(5) test -v takes a "cell".
(6) ShellShock (removed from bash): export -f, all variables were checked for a certain pattern.
- test / [, e.g. [ -a -a -a ]