blog | oilshell.org
This is part two of "The Interactive Shell Needs a Principled Parser", which was mentioned in the January blog roadmap.
The last post argued that a shell should use its parser for history and autocompletion.
This post is something of the opposite: a shell parser shouldn't be concerned with alias expansion or the interactive prompt. Those are orthogonal concerns.
I show code "smells" rather than specific bugs, so the argument is more abstract. But I believe it's an important issue, especially when you want to expand the shell language. You'll also see that the author of the rc shell had essentially the same criticism back in 1991!
In a POSIX shell:
$PS1 is the prompt for the first line of input, like $
$PS2 is the prompt for continuation lines, like >
What I call the "$PS2 problem" is simply: When the user hits Enter, does the shell execute the line of text, or does it print $PS2 and wait for more input?
$ echo hi # Enter causes command to be executed
hi
$ if echo hi # prints > and prompts for more input
> then # more input needed after 'then'
>
Oil handles this problem outside the lexer and parser, in the InteractiveLineReader.
In contrast, all the shells I've looked at litter their parsers with references to the prompt. See the appendix for evidence of that in dash.
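To make the separation concrete, here is a minimal sketch of the idea. It is not Oil's actual code: the LineReader class, its reset and next_line methods, and the toy parse_command function are all hypothetical names, and the "grammar" only balances if against fi. The point is that the reader alone decides between $PS1 and $PS2, while the parser merely asks for another line when the command is incomplete.

```python
class LineReader:
    """Hands out input lines and tracks which prompt to show.

    Hypothetical stand-in for the idea behind InteractiveLineReader;
    this is not Oil's code.
    """

    def __init__(self, lines, ps1="$ ", ps2="> "):
        self._lines = iter(lines)   # stands in for the terminal
        self._ps1 = ps1
        self._ps2 = ps2
        self._prompt = ps1
        self.shown = []             # prompts that would have been printed

    def reset(self):
        """The main loop calls this before each new command."""
        self._prompt = self._ps1

    def next_line(self):
        self.shown.append(self._prompt)
        # Any further line requested for the same command is a continuation.
        self._prompt = self._ps2
        return next(self._lines, None)


def parse_command(reader):
    """Toy parser: every 'if' word must be balanced by a 'fi' word.

    It asks the reader for more lines when the command is incomplete,
    and never mentions a prompt.
    """
    lines = []
    depth = 0
    while True:
        line = reader.next_line()
        if line is None:
            if lines:
                raise SyntaxError("unexpected end of input")
            return None
        lines.append(line)
        words = line.split()
        depth += words.count("if") - words.count("fi")
        if depth <= 0:
            return lines


# Simulated session: one simple command, then a multi-line 'if'.
reader = LineReader(["echo hi", "if echo hi", "then echo yes", "fi"])
commands = []
for _ in range(2):
    reader.reset()
    commands.append(parse_command(reader))
```

After this runs, reader.shown records one $PS1 for each command and a $PS2 for each continuation line, even though parse_command contains no prompt logic at all.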
Over the last few years of implementing shell, I've found many times that a careful reading of the POSIX spec isn't sufficient.
Instead, I use two main techniques to determine the required behavior:
When implementing alias, I was surprised that shells implemented it by littering their parsers with reads and writes of global variables. Again, see the appendix for evidence of this in dash.
Does this matter? Let me make a more abstract argument first.
In my BayLISA presentation last year, I quoted a few 25-year-old complaints about shell, including this one:
... nobody really knows what the Bourne shell’s grammar is. Even examination of the source code is little help. The parser is implemented by recursive descent, but the routines corresponding to the syntactic categories all have a flag argument that subtly changes their operation depending on the context.
— Tom Duff in a paper on Plan 9's rc shell, 1991
Duff is lamenting the "flag arguments", but global variables are strictly worse.
If you agree that this was a problem 25 years ago, it's even more of a problem today. The POSIX spec was silent on such issues then, and it is now. Since then, shells have grown more ad hoc features.
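A small sketch may help show why a global flag is harder to reason about than a flag argument. Everything here is hypothetical and much simpler than dash: STATE stands in for a global like checkkwd, and the two classify functions are invented for illustration.

```python
ALIASES = {"ll": "ls -l"}

# Hidden state in the style of dash's checkkwd flag (hypothetical simplification).
STATE = {"check_alias": False}


def classify_with_global(word):
    """The result depends on whatever an earlier caller left in STATE."""
    if STATE["check_alias"] and word in ALIASES:
        return ("alias", ALIASES[word])
    return ("word", word)


def classify_with_flag(word, check_alias):
    """Same logic, but the context is visible at every call site."""
    if check_alias and word in ALIASES:
        return ("alias", ALIASES[word])
    return ("word", word)


before = classify_with_global("ll")
STATE["check_alias"] = True      # some distant routine flips the flag
after = classify_with_global("ll")
```

The two calls to classify_with_global look identical in the source, yet they return different results. With the flag argument, the difference at least shows up at the call site.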
Oil's behavior diverges slightly from other shells, but it's designed to be documentable. Global variables are extraneous state outside the grammar, and you don't need them to describe what Oil does. To implement alias expansion, it re-invokes the parser as a library, rather than changing global flags.
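As a rough illustration of "re-invoking the parser as a library", the sketch below expands an alias by calling the same parse function on the alias body. It is an assumption-laden toy, not Oil's implementation: parse is just shlex.split from the Python standard library, parse_with_aliases and its _active parameter are hypothetical, and details of real alias semantics (such as the trailing-space rule) are ignored.

```python
import shlex


def parse(text):
    """Toy 'parser': split a simple command into words. No global flags."""
    return shlex.split(text)


def parse_with_aliases(text, aliases, _active=frozenset()):
    """Parse text, expanding an alias in the first-word position.

    _active holds the aliases currently being expanded, so an alias
    that mentions itself is not expanded forever.
    """
    words = parse(text)
    if not words:
        return words
    name = words[0]
    if name in aliases and name not in _active:
        # Re-invoke the parser on the alias body as an ordinary function call.
        expanded = parse_with_aliases(aliases[name], aliases, _active | {name})
        return expanded + words[1:]
    return words


aliases = {"ll": "ls -l", "ls": "ls --color"}
result = parse_with_aliases("ll /tmp", aliases)
```

Here result is the word list for ls --color -l /tmp. All the state needed for expansion travels through function arguments, so the behavior can be described without reference to any flag outside the grammar.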
To be more concrete, let's look at the results of running spec/alias.test.sh. Here are the cases that shells disagree on:
So out of four shells (including mksh), none of them agree on all alias
test cases. To be honest, this isn't worse than any other shell feature.
Given all the global flags, I was surprised at the relative agreement!
Nonetheless I still prefer Oil's strict style, because it makes it easier to expand the language without thinking about prompts or aliases.
As mentioned in the last post, there are still more things to implement in Oil, and there are undoubtedly cases where it behaves worse than existing shells.
Help me polish it by testing it interactively and on real shell scripts. See Help Wanted and Where To Send Feedback.
This post explained that Oil's parser is not concerned with these interactive features: the prompt ($PS1 vs. $PS2) and alias expansion.
On the other hand, the last post showed that the parser can be used as a library to implement history and autocompletion.
Following the January blog roadmap, the next post will clarify my goals for the reduced Oil language.
dash Code Excerpts
Dash has a ~1500 line recursive-descent parser, and it deals with the prompt throughout. Other shells are implemented similarly.
In Oil, this knowledge is confined to the InteractiveLineReader.
~/dash-0.5.8/src$ grep -n prompt parser.c
86:int doprompt; /* if set, prompt the user */
87:int needprompt; /* true if interactive and at start of line */
112:STATIC void setprompt(int);
141: doprompt = interact;
142: if (doprompt)
143: setprompt(doprompt);
144: needprompt = 0;
662: if (needprompt) {
663: setprompt(2);
774: if (needprompt) {
775: setprompt(2);
790: if (doprompt)
791: setprompt(2);
798: needprompt = doprompt;
881: if (c == '\034' && doprompt
899: if (doprompt)
900: setprompt(2);
920: if (doprompt)
921: setprompt(2);
1078: needprompt = doprompt;
1298: int uninitialized_var(saveprompt);
1318: if (needprompt) {
1319: setprompt(2);
1328: if (doprompt)
1329: setprompt(2);
1352: needprompt = doprompt;
1375: saveprompt = doprompt;
1376: doprompt = 0;
1382: doprompt = saveprompt;
1489:setprompt(int which)
1494: needprompt = 0;
1495: whichprompt = which;
1504: out2str(getprompt(NULL));
1513: int saveprompt;
1518: saveprompt = doprompt;
1519: doprompt = 0;
1523: doprompt = saveprompt;
...
alias
Likewise, the parser has many checks for global flags, including a flag for alias expansion. In contrast, Oil invokes its parser as a library to expand aliases.
~/dash-0.5.8/src$ grep -i -n -C 1 alias parser.c
...
--
160-
161: checkkwd = CHKNL | CHKKWD | CHKALIAS;
162- if (nlflag == 2 && tokendlist[peektoken()])
--
203- }
204: checkkwd = CHKNL | CHKKWD | CHKALIAS;
205- if (tokendlist[peektoken()])
--
241- }
242: checkkwd = CHKNL | CHKKWD | CHKALIAS;
243- n2 = pipeline();
--
264- negate = !negate;
265: checkkwd = CHKKWD | CHKALIAS;
266- } else
--
278- lp = (struct nodelist *)stalloc(sizeof (struct nodelist));
279: checkkwd = CHKNL | CHKKWD | CHKALIAS;
280- lp->n = command();
--
363- n1->nfor.var = wordtext;
364: checkkwd = CHKNL | CHKKWD | CHKALIAS;
365- if (readtoken() == TIN) {
--
392- }
393: checkkwd = CHKNL | CHKKWD | CHKALIAS;
394- if (readtoken() != TDO)
--
409- n2->narg.next = NULL;
410: checkkwd = CHKNL | CHKKWD | CHKALIAS;
411- if (readtoken() != TIN)
--
472- /* Now check for redirection which may follow command */
473: checkkwd = CHKKWD | CHKALIAS;
474- rpp = rpp2;
--
512-
513: savecheckkwd = CHKALIAS;
514- savelinno = plinno;
--
556- n->type = NDEFUN;
557: checkkwd = CHKNL | CHKKWD | CHKALIAS;
558- n->ndefun.text = n->narg.text;
--
725-
726: if (checkkwd & CHKALIAS) {
...
727: struct alias *ap;
728: if ((ap = lookupalias(wordtext, 1)) != NULL) {
...