This post describes two new syntaxes that make Oil programs easier to read and write. Let me know what you think in the comments!
`...` Prefix
In Proposed Changes to Oil's Syntax (November 2020), I mentioned this problem with shell:
cat file.txt \
| sort \ # I can't put a comment here
| cut -f 1 \
# And I can't put one here
| grep foo
That is, documenting long commands is hard because you can't mix `\` line continuations and comments. I just released Oil 0.9.2, which solves this problem:
... cat file.txt
| sort # Comment to the right is valid
| cut -f 1
# Comment on its own line is valid
| grep foo
; # Explicit terminator required
In the multiline context started by the `...` prefix:
- [...] `;` terminator won't cause multi-line mode to "bleed" into the next command.
The appendix describes how this is implemented.
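To see concretely why the plain shell version fails, a POSIX shell can be asked to parse it without running it. This is a minimal sketch, assuming `sh` is a POSIX-compatible shell that supports the `-n` (parse only) option; the `status` variable is only for illustration.

```shell
# Ask sh to parse, but not run, a pipeline with a comment after a backslash.
# The backslash escapes the following space instead of continuing the line,
# so the command ends on that line and the next line starts with a stray `|`.
if sh -n <<'EOF' 2>/dev/null
cat file.txt \
  | sort \ # comment after the backslash
  | grep foo
EOF
then
  status=accepted
else
  status=rejected
fi
echo "$status"   # prints "rejected"
```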
I've tagged this post #real-problems, since this mechanism solves a problem that multiple shell users have encountered. For example, see this January Reddit thread on Shell Scripts Are Executable Documentation.
`"""` and `'''` and `$'''`
In June's post Recent Progress on the Oil Language, I wrote that Oil has Python-like multi-line string literals, but enhanced like the Julia language.
Here are examples from the Oil Language Tour.
Double-quoted multi-line strings allow interpolation with `$`:
sort <<< """
var sub: $x
command sub: $(echo hi)
expression sub: $[x + 3]
"""
# =>
# command sub: hi
# expression sub: 9
# var sub: 6
In single-quoted multi-line strings, every character is literal, including `$`:
sort <<< '''
$2.00 # literal $, no interpolation
$1.99
'''
# =>
# $1.99
# $2.00
C-style multi-line strings interpret character escapes:
sort <<< $'''
C\tD
A\tB
'''
# =>
# A B
# C D
(This section is long and relies on shell expertise. If you only care about using Oil, as opposed to understanding the design, feel free to skip it.)
These string literals are better than shell's here doc syntax in three ways:
(1) Leading whitespace is stripped in a more useful way.
- Oil uses the indentation of the closing `'''` to figure out what whitespace to strip. If you don't want this, then don't indent the closing quote. (This rule is similar but not identical to Julia's rule.)
- In contrast, shell has the `<<-EOF` syntax (as opposed to `<<EOF`), which strips leading tabs, but not spaces.
(2) Multi-line strings are consistent with regular strings with respect to `$var` interpolation and character escapes like `\n`.
- Triple the quotes of `"hello $name"`, `'single'`, or `$'\n'`, and it means the same thing.
- In contrast, a here doc with an unquoted delimiter like `EOF` allows `$var` interpolation, but here docs with quoted delimiters like `'EOF'` or `\EOF` don't interpret `$var`.
- Here docs also don't interpret character escapes like `\n` (at least not statically-parsed ones). That is, the design isn't orthogonal.
(3) Multi-line strings can be used in either commands or redirects. In contrast, here docs can't be used directly with commands like `echo`, and the alternative causes too much I/O.
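The here doc behaviors contrasted above can be checked in any POSIX shell. This is a minimal sketch: the variable names are illustrative, and the `<<-` script is built with `printf` only so that the tab characters are visible in the source.

```shell
name=world

# Unquoted delimiter: $name is interpolated.
unquoted=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: $name stays literal.
quoted=$(cat <<'EOF'
hello $name
EOF
)

# <<- strips leading tabs, but leaves leading spaces alone.
stripped=$(printf 'cat <<-EOF\n\t\ttabs\n  spaces\n\tEOF\n' | sh)

printf '%s\n' "$unquoted" "$quoted" "$stripped"
# prints:
# hello world
# hello $name
# tabs
#   spaces
```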
To elaborate, recall that this use of the `<<<` "here string" operator works in bash and OSH:
$ tr a-z A-Z <<< 'hello'
HELLO
And remember that the `sort` examples above used the `<<<` operator and not the `<<` "here doc" operator. This is because Oil's multi-line strings are actually string literals!
Another consequence of this is that you can use a multi-line string directly in a command, as part of `argv`:
echo '''
  one
  two
  three
  '''
# =>
# one
# two
# three
In shell, regular strings can span multiple lines, but there's no way to strip leading whitespace, which makes code hard to read:
echo 'one
two
three'
# =>
# one
# two
# three
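A small sketch of what goes wrong if you indent a regular shell string anyway: the indentation becomes part of the value. The variable names here are illustrative.

```shell
# Indenting the continuation lines keeps the code tidy but changes the string.
text='one
    two
    three'

# Pull out the second line to show that its leading spaces were kept.
second_line=$(printf '%s\n' "$text" | sed -n 2p)
printf '[%s]\n' "$second_line"   # prints "[    two]"
```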
You could use a here doc and `cat`:
# This does too much I/O for a simple task
cat <<EOF
one
two
three
EOF
For such a simple task, this is inefficient in two ways: it runs `cat` rather than using the `echo` builtin, and it does extra I/O to pass the here doc to that process.
To recap, I like this design because it's more orthogonal in at least 3 dimensions. For example, `$var` and `\n` are respected.
Also note:
- Multi-line strings are on by default in `bin/oil`, but not `bin/osh`. (You can also explicitly set `shopt --set parse_triple_quote` in `bin/osh`.)
- [...] `EOF` is useful.
However, Oil's string literal syntax still has a "wart": you can't put (statically-parsed) character escapes like `\n` in double quoted strings.
Unfortunately, this is not orthogonal design. (We even document the warts for you; most languages don't.)
I've lived with this for a while and think it's OK. I believe it's important to keep not just the Oil language small, but also the combined OSH+Oil "surface area". In other words, I'm happy with 6 kinds of string literal (3 x 2 for the multiline variants), but I would not like 8, 10, or 12 kinds.
As always, I welcome contributions in this direction. However, I'd also suggest that this isn't the issue to start with: it's one of the most difficult design issues.
This ugly example combines multi-line commands and multi-line strings, and gives our parsing algorithms a workout! There's no reason for this in production code, but it illustrates the principle.
var x = 'one'
# print 3 args without separators
... write --sep '' --end '' --
"""
$x
""" # 1. Double Quoted Multi-Line String
'''
two
three
''' # 2. Single Quoted Multi-Line String
$'four\n' # 3. C-style string with explicit newline
| tac # Reverse
| tr a-z A-Z # Uppercase
;
# =>
# FOUR
# THREE
# TWO
# ONE
###
I also described Oil's doc comment feature in November of last year:
The line below a `proc` can have a special `###` comment, and its value can be retrieved with `pp proc`.
proc restart(pid) {
### Restart server by sending it a signal
kill $pid
}
A Tour of the Oil Language describes both of these features, and it was discussed on Hacker News a few days ago.
A few familiar questions about the project came up, so I drafted Blog Backlog: FAQ, Project Review, and the Future.
But I might just cut to the chase with What To Expect From Oil in the Near Future.
Try this feature out and tell me if there are any bugs! That is the main purpose of these blog posts.
Oil version 0.9.2 - Source tarballs and documentation.
These notes are for contributors and people who want to reimplement the Oil language. I used our style of #parsing-shell to implement the subtle multi-line command syntax. It falls slightly outside what you'll see in textbooks on parsing.
First, here's an unusual fact: Oil has two levels of tokenization due to the inherent structure of the shell language.
- The `Lexer` outputs `Token` objects, and the `WordParser` consumes them.
  - For example, `--flag="val $x"` is a word consisting of multiple tokens.
- The `WordParser` outputs `word_t` objects (`compound_word` or `Token`), and the `CommandParser` consumes them.
  - For example, `mycommand --flag="val $x" > out.txt` is a command consisting of multiple words.
To parse multi-line commands, we look for the `...` prefix word at the start of an `AndOr` production in the shell grammar. This production handles chains like `cd / && ls | wc -l && echo OK`.
If we see `...`, then we use a Python context manager to flip a flag on the `WordParser` to enter multi-line mode. When it's in this mode, it treats newlines and blank lines differently. (Python context managers are translated to C++ constructors and destructors by mycpp.)
Because `...` is an unusual command prefix, I don't expect this to break existing shell code. So multi-line commands are valid in both `bin/osh` and `bin/oil`.
(Productivity note: I search the code for symbols like `WordParser` with `grep $name */*.py`.)
On the other hand, `'''foo'''` already has a meaning in shell. It's three string literals side by side using implicit concatenation: `''`, `'foo'`, and `''`.
We take advantage of this to parse multi-line string literals when `shopt --set parse_triple_quote` is on. That is, we do not have tokens for `'''`, `"""`, and `$'''`. Instead, we actually look for an empty string at the start of a word, then switch into another `WordParser` mode, and strip whitespace when we're done.
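The existing shell meaning of tripled quotes is easy to check in a plain POSIX shell, with no Oil options involved. This is a minimal sketch; `count_args` is a hypothetical helper written just for this example.

```shell
# '''foo''' is parsed as '' + 'foo' + '', which is one word equal to foo.
triple='''foo'''

# Illustrative helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Two words on the command line, each built from three adjacent literals.
nargs=$(count_args '''foo''' """bar""")

echo "$triple $nargs"   # prints "foo 2"
```

Because the leading `''` or `""` is already an ordinary empty string, treating it as the start of a multi-line literal only changes programs that opt in.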
This is unusual, but it means that OSH and Oil share the same command and word lexer modes. This is a desirable property for keeping the upgrade path from OSH to Oil smooth, and I think it will make syntax highlighters and other tools easier to write.
This post described two syntax features, which happen to be the first two in Proposed Changes to Oil's Syntax (November 2020).
What about the others?
- `${x|html}` and `${x %.2f}` for string formatting. I think these are important, and I just barely started implementing them.
- `$/ d+ /` for inline Egg expressions. I think we can do this now that we have `shopt --set strict_dollar`, which disallows `echo $/` because it's equivalent to `echo \$/` and `echo '$/'`. That is, we don't need another parsing option.
- `shopt --set parse_amp` for redirects. This is deferred. I believe it's OK to "memorize" a few idioms for redirects, and again I want to keep the combined OSH+Oil surface area small.