This doc has rough notes on the architecture of the parser.
How to Parse Shell Like a Programming Language (2019 blog post) covers some of the same material. (As of 2024, it's still pretty accurate, although there have been minor changes.)
The test suite test/lossless.sh invokes osh --tool lossless-cat $file.
The lossless-cat tool does this:
Now, do the tokens "add up" to the original file? That's what we call the lossless invariant.
It will be the foundation for tools that statically understand shell:
--tool ysh-ify - change style of do done → { }, etc.
--tool fmt - fix indentation, maybe some line wrapping
The sections on re-parsing explain some obstacles which we had to overcome.
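The lossless invariant can be illustrated with a toy lexer. This is only a sketch: the token patterns and the names tokenize and lossless_cat are hypothetical, not the real OSH lexer or the implementation of osh --tool lossless-cat. The point is that a catch-all token guarantees every character lands in some token, so concatenating the token text reproduces the input.

```python
import re

# Hypothetical toy token patterns; the real OSH lexer is far larger and modal.
TOKEN_RE = re.compile(r"""
    (?P<space>[ \t]+)
  | (?P<newline>\n)
  | (?P<comment>\#[^\n]*)
  | (?P<word>[A-Za-z0-9_./-]+)
  | (?P<other>.)          # catch-all so no character is ever dropped
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Return (kind, text) pairs that cover every character of src."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(src)]


def lossless_cat(src):
    """Reassemble the source purely from token text."""
    return "".join(text for _, text in tokenize(src))


src = "for x in a b; do\n  echo $x  # comment\ndone\n"
# The invariant: the tokens "add up" to the original file.
assert lossless_cat(src) == src
```

A tool that rewrites only some tokens and prints the rest unchanged, such as a formatter, relies on this property.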
Oils uses regex-based lexers, which are turned into efficient C code with re2c. We intentionally avoid hand-written code that manipulates strings char-by-char, since that strategy is error prone; it's inevitable that rare cases will be mishandled.
The list of lexers can be found by looking at native/fastlex.c.
echo -e
PS1 backslash escapes.
!$.
${x/foo*/replace} via conversion to ERE. We need position information, and the fnmatch() API doesn't provide it, but regexec() does.
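To see why position information matters for ${x/foo*/replace}, here is a rough sketch. It uses Python's re module as a stand-in for libc regexec(), and the helper names glob_to_regex and replace_first are made up for illustration. Only * and ? are treated as glob metacharacters; real glob syntax has more cases.

```python
import re


def glob_to_regex(pat):
    """Translate a simple glob (only * and ? are special here) to a regex."""
    out = []
    for ch in pat:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def replace_first(s, pat, replacement):
    """Mimic ${s/pat/replacement}: replace the first, longest match."""
    m = re.search(glob_to_regex(pat), s, re.DOTALL)
    if m is None:
        return s
    start, end = m.span()  # match positions: a yes/no matcher can't supply these
    return s[:start] + replacement + s[end:]


print(replace_first("a-foobar-z", "foo*", "X"))  # a-X
```

A boolean matcher can say that the string contains a match, but the replacement needs the start and end offsets, which is what the regex match object provides here.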
These constructs aren't recognized by the Oils front end. Instead, they're punted to libc:
*.py (in most cases)
@(*.py|*.sh)
strftime format strings, e.g. printf '%(%Y-%m-%d)T' $timestamp
osh/word_parse.py calls lexer.MaybeUnreadOne() to handle right parens in this case:
(case x in x) ;; esac )
This is sort of like the ungetc() I've seen in other shell lexers.
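The general idea of un-reading one token can be sketched as follows. The PushbackLexer class and its method names are hypothetical, and it works on a pre-split token list; it does not reflect the real signature or behavior of lexer.MaybeUnreadOne().

```python
class PushbackLexer:
    """Hypothetical lexer over a pre-split token list with one-token pushback."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def read(self):
        """Return the next token, or None at end of input."""
        if self._pos >= len(self._tokens):
            return None
        tok = self._tokens[self._pos]
        self._pos += 1
        return tok

    def maybe_unread_one(self):
        """Step back one token if possible; report whether it happened."""
        if self._pos == 0:
            return False
        self._pos -= 1
        return True


lx = PushbackLexer(["x", ")", ";;"])
assert lx.read() == "x"
assert lx.read() == ")"        # a sub-parser consumed the paren ...
assert lx.maybe_unread_one()   # ... and hands it back
assert lx.read() == ")"        # the caller sees it again
```

As with ungetc(), the pushback lets one parser discover that a token belongs to its caller without needing unbounded lookahead.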
This section is about extra passes / "irregularities" at parse time. In the "Runtime Issues" section below, we discuss cases that involve parsing after variable expansion, etc.
We try to avoid re-parsing, but it happens in 4 places.
It complicates error messages with source location info. It also has implications for --tool ysh-ify and --tool fmt, because it affects the "lossless invariant".
This command is perhaps a quicker explanation than the text below:
$ grep do_lossless */*.py
...
osh/cmd.py: ...
osh/word_parse.py: ...
Where we re-parse:
Here documents: We first read lines, and then parse them.
  VirtualLineReader in osh/cmd_parse.py
Array L-values like a[x+1]=foo. bash allows splitting arithmetic expressions across word boundaries: a[x + 1]=foo. But I don't see this used, and it would significantly complicate the OSH parser.
  _MakeAssignPair in osh/cmd_parse.py has do_lossless condition
Backticks, the legacy form of $(command sub). There's an extra level of backslash quoting that may happen compared with $(command sub).
  _ReadCommandSubPart in osh/word_parse.py has do_lossless condition
  ysh-ify or fmt tools
alias expansion
  SnipCodeString in osh/cmd_parse.py
  alias ls=foo. So it doesn't affect the lossless invariant that --tool ysh-ify and --tool fmt use.
These language constructs are handled statically, but not in a single pass of parsing:
FOO=bar declare a[x]=1. We make another pass with _SplitSimpleCommandPrefix().
  s=1 doesn't cause reparsing, but a[x+1]=y does.
echo {a,b}
echo ~bob, home=~bob
This is less problematic, since it doesn't affect error messages (ctx_SourceCode) or the lossless invariant.
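As a rough illustration of an extra pass over an already-parsed simple command, the sketch below splits leading NAME=value words from the rest. It is an assumption-laden toy: split_prefix and BINDING_RE are hypothetical names, it operates on plain strings rather than Oils word objects, and it is not the actual logic of _SplitSimpleCommandPrefix().

```python
import re

# Assumption: a prefix binding looks like NAME=value (no array subscripts here).
BINDING_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL)


def split_prefix(words):
    """Split leading NAME=value words from the rest of a simple command."""
    bindings = []
    i = 0
    for i, w in enumerate(words):
        m = BINDING_RE.match(w)
        if m is None:
            break  # first non-binding word starts the command proper
        bindings.append((m.group(1), m.group(2)))
    else:
        i = len(words)  # every word was a binding (or there were no words)
    return bindings, words[i:]


bindings, rest = split_prefix(["FOO=bar", "declare", "a[x]=1"])
assert bindings == [("FOO", "bar")]
assert rest == ["declare", "a[x]=1"]
```

In this toy, a[x]=1 stays in the argument list untouched, since it comes after the command name and is not a prefix binding.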
myfunc() { echo hi; } vs. myfunc=() # an array
shopt -s parse_equals: For x = 1 + 2*3
alias foo='ls | wc -l'. Aliases are like "lexical macros".
$PS1 and family first undergo \ substitution, and then the resulting strings are parsed as words, with $ escaped to \$.
eval
trap builtin
source: the filename is formed dynamically, but the code is generally static.
All of the cases above, plus:
(1) Recursive Arithmetic Evaluation:
$ a='1+2'
$ b='a+3'
$ echo $(( b ))
6
This also happens for the operands to [[ x -eq x ]].
Note that a='$(echo 3)' results in a syntax error. I believe this was
due to the ShellShock mitigation.
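The recursive evaluation in (1) can be sketched in a few lines. This toy uses Python's ast module as a stand-in for the shell arithmetic grammar, supports only +, - and *, and the name arith_eval is hypothetical. The key step is that a variable's string value is itself parsed as an expression at runtime.

```python
import ast
import operator

# Only a few operators, enough to illustrate the recursion.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def arith_eval(expr, variables):
    """Evaluate expr; variable values are themselves parsed as expressions."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            # The re-parse: the variable's string value is parsed at runtime.
            # Assumption: an unset variable evaluates to 0.
            return arith_eval(variables.get(node.id, "0"), variables)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported syntax: %r" % expr)
    return ev(ast.parse(expr.strip(), mode="eval").body)


variables = {"a": "1+2", "b": "a+3"}
print(arith_eval("b", variables))  # 6
```

Because each variable is evaluated as a complete sub-expression, a*2 with a='1+2' yields 6 in this sketch rather than 5. A self-referential value such as a='a' would recurse without bound here; the sketch makes no attempt to guard against that.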
(2) The unset builtin takes an LValue. (not yet implemented in OSH)
$ a=(1 2 3 4)
$ expr='a[1+1]'
$ unset "$expr"
$ argv "${a[@]}"
['1', '2', '4']
(3) printf -v takes an "LValue".
(4) Var refs with ${!x} take a "cell". (Not yet implemented in OSH.
Relied on by bash-completion, as discovered by Greg Price.)
$ a=(1 2 3 4)
$ expr='a[$(echo 2 | tee BAD)]'
$ echo ${!expr}
3
$ cat BAD
2
(5) test -v takes a "cell".
(6) ShellShock (removed from bash): export -f, all variables were checked for
a certain pattern.
test / [, e.g. [ -a -a -a ]