Warning: Work in progress! Leave feedback on Zulip or Github if you'd like this doc to be updated.
This doc is written for contributors or users who want to understand the Oil codebase. These internal details are subject to change.
Oil uses regex-based lexers, which are turned into efficient C code with re2c. We intentionally avoid hand-written code that manipulates strings char-by-char, since that strategy is error-prone: rare cases will inevitably be mishandled.
The list of lexers can be found by looking at native/fastlex.c:
echo -e
PS1 backslash escapes.
!$.
${x/foo*/replace} via conversion to ERE. We need position information, and the fnmatch() API doesn't provide it, but regexec() does.
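To make the point about position information concrete, here is a minimal sketch of the glob-to-regex idea. It is not the code in native/fastlex.c: the names glob_to_regex and replace_first are hypothetical, Python's re module stands in for regexec(), and only the * and ? glob operators are handled.

```python
import re


def glob_to_regex(glob):
    """Translate a small glob subset (*, ?, literal chars) to a regex string."""
    parts = []
    for ch in glob:
        if ch == '*':
            parts.append('.*')
        elif ch == '?':
            parts.append('.')
        else:
            # Escape so that chars like '.' only match themselves.
            parts.append(re.escape(ch))
    return ''.join(parts)


def replace_first(s, glob, replacement):
    """Rough analogue of ${s/glob/replacement}: replace the first match."""
    m = re.search(glob_to_regex(glob), s, re.DOTALL)
    if m is None:
        return s
    # The match object gives start/end offsets. A plain yes/no matcher
    # such as fnmatch() could not tell us which span to replace.
    start, end = m.span()
    return s[:start] + replacement + s[end:]
```

With this sketch, replace_first('xfoobar', 'foo*', 'replace') returns 'xreplace', because the match span covers 'foobar'.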
This section is about extra passes ("irregularities") at parse time. In the "Runtime Issues" section below, we discuss cases that involve parsing after variable expansion, etc.
This makes it harder to produce good error messages with source location info. It also has implications for translation, because we break the "arena invariant".
(1) Array L-values like a[x+1]=foo. bash allows splitting arithmetic expressions across word boundaries: a[x + 1]=foo. But I don't see this used, and it would significantly complicate the OSH parser. (in _MakeAssignPair in osh/cmd_parse.py)
(2) Backticks. There is an extra level of backslash quoting that may happen compared with $(). (in _ReadCommandSubPart in osh/word_parse.py)
This isn't necessarily re-parsing, but it's re-reading.
These are handled up front, but not in a single pass.
FOO=bar declare a[x]=1. We make another pass with _SplitSimpleCommandPrefix().
s=1 doesn't cause reparsing, but a[x+1]=y does.
echo {a,b}
echo ~bob, home=~bob
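As an illustration of the extra pass over a command like FOO=bar declare a[x]=1, here is a rough sketch of splitting leading env bindings from the rest of a simple command. It is not the real _SplitSimpleCommandPrefix(): it works on plain strings rather than parsed words, and the name split_prefix and the regex are assumptions made for this example.

```python
import re

# Hypothetical pattern for a plain NAME=value word. Array L-values like
# a[x]=1 are deliberately not matched here.
_ASSIGN_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)=(.*)', re.DOTALL)


def split_prefix(words):
    """Return (bindings, rest): leading NAME=value words, then the command."""
    bindings = []
    i = 0
    while i < len(words):
        m = _ASSIGN_RE.fullmatch(words[i])
        if m is None:
            break  # first word that isn't a binding starts the command
        bindings.append((m.group(1), m.group(2)))
        i += 1
    return bindings, words[i:]
```

For ['FOO=bar', 'declare', 'a[x]=1'] this yields the binding ('FOO', 'bar') and leaves ['declare', 'a[x]=1'] as the command. The words were already read once, so this is a second pass over them rather than a second lexing of the text.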
func() { echo hi; } vs. func=() # an array
osh/word_parse.py calls lexer.MaybeUnreadOne() to handle right parens in this case:
(case x in x) ;; esac )
This is sort of like the ungetc() I've seen in other shell lexers.
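The ungetc()-style pushback can be sketched as follows. This is a toy model, not the OSH lexer: the Lexer class, its token list input, and the method names read and maybe_unread_one are all assumptions for illustration.

```python
class Lexer:
    """Toy lexer over a pre-split token list, with one token of pushback."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0
        self._can_unread = False

    def read(self):
        """Return the next token, or None at end of input."""
        if self._pos >= len(self._tokens):
            self._can_unread = False
            return None
        tok = self._tokens[self._pos]
        self._pos += 1
        self._can_unread = True
        return tok

    def maybe_unread_one(self):
        """Push back the most recent token; return False if that's not possible."""
        if not self._can_unread:
            return False
        self._pos -= 1
        self._can_unread = False  # only one level of pushback, like ungetc()
        return True
```

A caller that has read a ')' and then decides it belongs to an enclosing construct can call maybe_unread_one() and let the outer parser read it again.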
osh/parse_lib.py and its callers.
(1) Alias expansion like alias foo='ls | wc -l'. Aliases are like "lexical macros".
(2) Prompt strings. $PS1 and family first undergo \ substitution, and then the resulting strings are parsed as words, with $ escaped to \$.
(3) Builtins.
eval
trap builtin
source: the filename is formed dynamically, but the code is generally static.
All of the cases above, plus:
(1) Recursive Arithmetic Evaluation:
$ a='1+2'
$ b='a+3'
$ echo $(( b ))
6
This also happens for the operands to [[ x -eq x ]].
NOTE that a='$(echo 3)' results in a syntax error. I believe this was due to the ShellShock mitigation.
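The recursive behavior in (1) can be modeled with a small evaluator in which a variable's string value is itself parsed as an expression. This is only a sketch under simplifying assumptions: it supports just + and - over integers and names, the function eval_arith and the depth limit of 100 are made up for the example, and it is not how bash or OSH implement arithmetic.

```python
import re

_TOKEN_RE = re.compile(r'\s*(\d+|[A-Za-z_][A-Za-z0-9_]*|[+\-])')


def eval_arith(expr, env, depth=0):
    """Evaluate sums and differences of integers and variable names.

    A variable's string value is parsed as an expression in turn, which
    is the recursion shown in the transcript above.
    """
    if depth > 100:  # arbitrary guard against cycles like a='a'
        raise RecursionError('expression recursion level exceeded')

    expr = expr.strip()
    tokens = []
    pos = 0
    while pos < len(expr):
        m = _TOKEN_RE.match(expr, pos)
        if m is None:
            raise SyntaxError('unexpected text: %r' % expr[pos:])
        tokens.append(m.group(1))
        pos = m.end()

    def operand(tok):
        if tok.isdigit():
            return int(tok)
        if tok in ('+', '-'):
            raise SyntaxError('operand expected')
        # Runtime parsing: the value of the variable is evaluated as code.
        # Unset names are treated as 0 in this sketch.
        return eval_arith(env.get(tok, '0'), env, depth + 1)

    if not tokens:
        return 0
    if len(tokens) % 2 == 0:
        raise SyntaxError('operand expected')
    total = operand(tokens[0])
    for i in range(1, len(tokens), 2):
        op, rhs = tokens[i], operand(tokens[i + 1])
        if op == '+':
            total += rhs
        elif op == '-':
            total -= rhs
        else:
            raise SyntaxError('operator expected')
    return total
```

With env = {'a': '1+2', 'b': 'a+3'}, eval_arith(' b ', env) returns 6, matching the shell session. The toy grammar has no command substitution at all, so a value like '$(echo 3)' is rejected here simply because $ is not a token.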
(2) The unset builtin takes an LValue. (not yet implemented in OSH)
$ a=(1 2 3 4)
$ expr='a[1+1]'
$ unset "$expr"
$ argv "${a[@]}"
['1', '2', '4']
(3) printf -v takes an "LValue".
(4) Var refs with ${!x} take a "cell". (Not yet implemented in OSH. Relied on by bash-completion, as discovered by Greg Price.)
$ a=(1 2 3 4)
$ expr='a[$(echo 2 | tee BAD)]'
$ echo ${!expr}
3
$ cat BAD
2
(5) test -v takes a "cell".
(6) ShellShock (removed from bash): export -f, all variables were checked for a certain pattern.
compgen -W (bash only)
complete -F ls_complete_func ls
command_not_found hook; osh doesn't yet
See the doc on Unicode.
local x=$y vs. s='x=$y'; local $s.
build/dev.sh
TODO: Move this
The OSH parser is better than other shell parsers:
$PS2 just works (due to _Peek() and _Next()). Other shells use special annotations in the parser to handle newlines. (TODO: link them)
ParseContext() collects "trails".
Bad: it's slower! This needs to be fixed.
Where the parser is reused:
eval builtin. (I'm sure bash does this too.)
$PS1, which may contain substitutions, and hence arbitrary code. Also $PS{2,4}.
!$ to pick off the last word. (bash does NOT do this.)
$IFS splitting in osh/split.py
argv by user-defined delimiters, e.g. :=
The point of a state machine is to make sure all cases are handled!
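One way to see why a state machine helps is to write the transitions as an explicit table and check that every (state, character class) pair is present. The sketch below is not osh/split.py: the states, the backslash-escape rule, and the function name split are assumptions, and it ignores the distinction real $IFS splitting makes between whitespace and non-whitespace delimiters.

```python
# States
START, FIELD, ESCAPED = 'START', 'FIELD', 'ESCAPED'
# Character classes
DELIM, BACKSLASH, OTHER = 'DELIM', 'BACKSLASH', 'OTHER'
# Actions
SKIP, APPEND, END_FIELD = 'SKIP', 'APPEND', 'END_FIELD'

# (state, char class) -> (action, next state)
TRANSITIONS = {
    (START, DELIM): (SKIP, START),
    (START, BACKSLASH): (SKIP, ESCAPED),
    (START, OTHER): (APPEND, FIELD),
    (FIELD, DELIM): (END_FIELD, START),
    (FIELD, BACKSLASH): (SKIP, ESCAPED),
    (FIELD, OTHER): (APPEND, FIELD),
    (ESCAPED, DELIM): (APPEND, FIELD),
    (ESCAPED, BACKSLASH): (APPEND, FIELD),
    (ESCAPED, OTHER): (APPEND, FIELD),
}

# Exhaustiveness check: fails at import time if a case was forgotten.
for _state in (START, FIELD, ESCAPED):
    for _cls in (DELIM, BACKSLASH, OTHER):
        assert (_state, _cls) in TRANSITIONS, (_state, _cls)


def split(s, delims=' \t\n'):
    """Split s into fields on runs of delimiter chars; backslash escapes one char."""
    fields = []
    current = []
    state = START
    for ch in s:
        if ch in delims:
            cls = DELIM
        elif ch == '\\':
            cls = BACKSLASH
        else:
            cls = OTHER
        action, state = TRANSITIONS[(state, cls)]
        if action == APPEND:
            current.append(ch)
        elif action == END_FIELD:
            fields.append(''.join(current))
            current = []
    if state == ESCAPED:
        current.append('\\')  # keep a trailing backslash literally
    if state != START:
        fields.append(''.join(current))
    return fields
```

For example, split('a:=b', delims=':=') gives ['a', 'b'], and split(r'a\ b c') gives ['a b', 'c']. Adding a new state or character class without filling in its row makes the exhaustiveness check fail immediately, instead of leaving a rare input silently mishandled.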