Why Sponsor Oils? | blog | oilshell.org
This is part one of "The Interactive Shell Needs a Principled Parser", which was mentioned in the January blog roadmap.
Last February, I described the interactive features in Oil. I also wrote How to Parse Shell Like a Programming Language to review how the parser works.
Implementing these features taught me that:
This post starts by showing three bugs in the bash ecosystem, which Oil avoids. I discuss another design issue with autocompletion. And the next post will discuss more coupling between the shell parser and the interactive shell.
Let's start with a bug in history expansion, because it's simple to see. I've reproduced this with the newest version of bash:
bash-5.0$ echo ${x:-a b c}
a b c
bash-5.0$ echo !$ # !$ is supposed to be the "last word"
echo c} # It splits the word incorrectly!
c}
Oil uses its own parser, so it isn't fooled by the spaces:
osh$ echo ${x:-a b c}
a b c
osh$ echo !$
! echo ${x:-a b c}
a b c
So the bash history mechanism uses a partial, incorrect parser for its own language. Another part of bash obviously knows how to parse words correctly, but it's not used here.
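To make the failure mode concrete, here is a minimal sketch in POSIX shell. It is not bash's history code; both function names are hypothetical. `last_word_naive` splits on whitespace only, which is one way to arrive at the `c}` result above. `last_word_aware` counts braces so that spaces inside `${...}` stay in the word. The brace counter is only a toy for this one input (it ignores quotes, tabs, and `$(...)`), which is the argument for reusing the real parser instead.

```shell
# Naive splitter: whitespace is the only word boundary (an assumed model
# of the bug, not bash's actual implementation).
last_word_naive() (
  set -f                 # disable globbing so words are not treated as patterns
  set -- $1              # unquoted on purpose: split on whitespace only
  [ "$#" -gt 0 ] || exit 0
  shift $(( $# - 1 ))    # drop everything except the final word
  printf '%s\n' "$1"
)

# Toy brace-aware splitter: a space ends a word only at brace depth 0.
last_word_aware() (
  rest=$1 depth=0 word='' last=''
  while [ -n "$rest" ]; do
    ch=${rest%"${rest#?}"}   # first character of the remaining input
    rest=${rest#?}
    case $ch in
      '{') depth=$((depth + 1)); word=$word$ch ;;
      '}') [ "$depth" -gt 0 ] && depth=$((depth - 1)); word=$word$ch ;;
      ' ') if [ "$depth" -gt 0 ]; then
             word=$word$ch            # space inside ${...}: same word
           else
             [ -n "$word" ] && last=$word
             word=''
           fi ;;
      *)   word=$word$ch ;;
    esac
  done
  [ -n "$word" ] && last=$word
  printf '%s\n' "$last"
)

line='echo ${x:-a b c}'
last_word_naive "$line"   # prints: c}
last_word_aware "$line"   # prints: ${x:-a b c}
```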
Bash will complete variable names and command names if you press TAB:
bash$ echo $HOM<TAB> # (1) completes $HOME
bash$ ech<TAB> # (2) completes echo
However, it doesn't perform these similar completions:
bash$ echo ${undef:-$HOM<TAB> # (3) does NOT complete $HOME
bash$ if true; then ech<TAB> # (4) does NOT complete echo
# ditto for commands in while loops,
# for, case, etc.
This gave me a hint that bash doesn't use its own parser for completion, which I verified by reading the source. Scanning the ~4300-line bashline.c file may give you a sense of how it works, e.g. starting at bash_forward_shellword. I call this style "groveling through backslashes and braces one-by-one".
In short, bash's duplicate, ad hoc parser for completion isn't accurate.
In Oil, examples (3) and (4) behave like examples (1) and (2), and there are no special cases involved. We can use the same parser for execution and autocompletion, because it now emits extra information on incomplete input.
I mentioned this confusion over completion in Dev Log #9, but I've since discovered more gory details.
bash-completion is a collection of autocompletion scripts for bash. Linux distros like Debian use it by default.
Because bash's second ad hoc parser isn't accurate, bash-completion makes a third attempt, but it also does poorly.
It tries to parse bash in bash!
Here's an example on Ubuntu 16.04:
andy@host:~$ bash --norc # start with a clean state
bash-4.3$ source /usr/share/bash-completion/bash_completion
bash-4.3$ echo $(readlink <TAB>bash: unexpected EOF while looking for matching `)'
bash: syntax error: unexpected end of file
It gives you a syntax error rather than completion candidates. (I've also reproduced this bug on a Debian machine with bash 4.4.)
In contrast, OSH does what you'd expect:
osh$ echo $(readlink <TAB>
baylisa/ demo/ logs/
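The bash error message above is what a parser reports when it treats incomplete input as if it were a complete program. The sketch below reproduces that class of failure outside of completion. It makes no claim about the code path inside bash-completion, and `parses_ok` is a hypothetical helper. The point is that a pass/fail answer carries no information a completer could use.

```shell
# Hypothetical helper: succeeds only if the text is accepted (and runs cleanly)
# as a complete program.
# Caution: eval also RUNS the text when it parses, so pass only harmless commands.
parses_ok() ( eval "$1" ) >/dev/null 2>&1

for text in 'echo $(echo hi)' 'echo $(readlink '; do
  if parses_ok "$text"; then
    printf 'accepted: %s\n' "$text"
  else
    # The unterminated $( is rejected before anything runs; all we learn
    # is that parsing failed, not what the user was in the middle of typing.
    printf 'rejected: %s\n' "$text"
  fi
done
```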
Briefly, Oil's parser has two outputs:
ParseError exception.
zsh does well
I tested zsh, and it behaves like Oil on all three of these examples. It also has an interesting feature where it prints the parse state in the prompt:
zsh% if
if> true
if> then
then> echo hi
then> fi
hi
zsh%
If you know how zsh or another shell solves these problems, please leave a comment. What confuses me is that zsh does not statically parse ${x:-} or $((1+2)). So I believe it must have a second, more accurate parser for completion?
Another thing I learned from implementing autocompletion is that bash's completion API has a fundamental confusion. It treats the command line as a string and does ad-hoc splitting into a COMP_WORDS array.
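This means that it conflates two separate problems:
1. Completing the shell language, e.g. a partial variable reference like $HOM or ${HOM.
2. Completing the argv language of a command, e.g. after grep --color= may come never or always.
Why should these problems be treated separately?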
One reason is that you'll have quoting and de-quoting bugs otherwise. From the perspective of grep, arguments like 'file with spaces.txt' and file\ with\ spaces.txt are identical. From the perspective of the line editor (the shell UI), they're different.
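A quick way to check the first half of that claim is to print the argv that a command actually receives. `show_argv` is a hypothetical helper standing in for grep. Both spellings produce the same single argument, so a completer that works only on argv cannot know which quoting style the user typed.

```shell
# Print each received argument in brackets, one per line.
show_argv() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

show_argv 'file with spaces.txt'    # prints: [file with spaces.txt]
show_argv file\ with\ spaces.txt    # prints: [file with spaces.txt]
```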
A second reason is that it makes it impossible to create shell-agnostic autocompletion. For example, in Elvish, the syntax for environment variables is $E:USER, but it's $USER in POSIX shell. But the syntax of grep is obviously the same when used from either shell.
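Here is a minimal sketch of what the conflation looks like in a bash completion function. The command `mygrep` and the function `_mygrep_complete` are made up for illustration. `COMP_WORDS`, `COMP_CWORD`, `COMPREPLY`, `compgen`, and `complete` are bash's real completion interface. The sketch assumes bash has split `--color=ne` into the three words `--color`, `=`, and `ne`, which is what I'd expect with the default COMP_WORDBREAKS. Notice that a single function ends up handling both the shell language and the argv language.

```shell
# Hypothetical completion function for a made-up command "mygrep" (bash only).
_mygrep_complete() {
  local cur=${COMP_WORDS[COMP_CWORD]}
  COMPREPLY=()
  if [[ $cur == '$'* ]]; then
    # Shell-language problem: a partial variable reference such as $HOM.
    COMPREPLY=( $(compgen -v -P '$' -- "${cur#?}") )
  elif (( COMP_CWORD >= 2 )) &&
       [[ ${COMP_WORDS[COMP_CWORD-1]} == '=' &&
          ${COMP_WORDS[COMP_CWORD-2]} == '--color' ]]; then
    # argv-language problem: the values that may follow --color=
    COMPREPLY=( $(compgen -W 'never always' -- "$cur") )
  fi
}
complete -F _mygrep_complete mygrep

# Simulate TAB after "mygrep --color=ne" by filling in the arrays by hand.
COMP_WORDS=(mygrep --color = ne)
COMP_CWORD=3
_mygrep_complete
printf '%s\n' "${COMPREPLY[@]}"   # prints: never
```

A shell-agnostic completer would only need the second branch. The first branch is specific to POSIX-style variable syntax.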
Although bash itself is confused, the bash-completion project "papers
over" this and attempts to separate the shell language from the argv
language. It's not perfect, but it succeeds enough that Oil can reuse most of
the project's code. I've forked oilshell/bash-completion and
run it in my interactive Oil sessions.
Aside: As part of cutting scope in 2020, I'm deferring
work on the Shellac Protocol for shell-agnostic completion. The goal of
this project is to help upstream authors target something other than bash's
flawed API. But I'd still like to see it happen, and there are shell authors
interested in it on #shell-autocompletion
on oilshell.zulipchat.com.
I've used bash on Ubuntu for about 15 years now, but I only recently became aware of these obvious bugs.
This is puzzling, but Operant Conditioning by Software Bugs is the most likely explanation. That is, users subconsciously train themselves to avoid bugs.
I can think of another example in the domain of GUIs and Windows. I avoid moving the mouse at certain times when the computer appears "busy". Subconsciously, I fear it may crash.
I was probably trained to do this 20 years ago by Windows 95. I don't know if such crash bugs exist in Ubuntu, but the subtle avoidance is still there.
If you've been using bash for many years, you may be unknowingly avoiding these bugs! To me, hitting TAB seems to carry a "downside risk", which I hope to avoid in Oil. You should be able to hit TAB all the time and bad things won't happen.
This post showed the benefits of reusing a shell's parser for interactive features like history and autocompletion. I was a stickler about parsing back in 2016, but I didn't realize it would pay off when implementing the interactive shell in 2018.
The next post will discuss how parsing interacts with the interactive prompt and alias expansion. Ideally, it shouldn't.
There are undoubtedly areas where Oil is less polished than bash. But I'm optimistic that the principled architecture will hold up after those fixes. I've fixed dozens of user-reported bugs in other areas of Oil, and they've largely fallen into the "right" place.
I believe that top-down parsing makes it easier to generate the "trail" for incomplete input mentioned above. This is in contrast to bash's use of bottom-up LR parsing via yacc.
The AOSA chapter on bash seems to support this:
The bash parser is derived from an early version of the Posix grammar, and is, as far as I know, the only Bourne-style shell parser implemented using Yacc or Bison. This has presented its own set of difficulties — the shell grammar isn't really well-suited to yacc-style parsing and requires some complicated lexical analysis and a lot of cooperation between the parser and the lexical analyzer.
...
One thing I've considered multiple times, but never done, is rewriting the bash parser using straight recursive-descent rather than using bison.