Why Sponsor Oils? | blog | oilshell.org
Update 10/2019: The Oil language has changed. This post is accurate in spirit but not in detail.
In Success with ASDL, I mentioned that a top project priority is to automatically translate shell programs to the oil language. The ability to express real programs is a test of the language's design, especially when they're written by others.
I've done perhaps 25% of the work, but the translations are starting to look accurate. Language features are apparently used in a Pareto or "long tail" distribution.
In this post, I'll show a translated program and explain the oil language features it uses.
Before looking at code, let's remind ourselves of the motivation. At first glance, this project seems similar to CoffeeScript. We want a better syntax for shell, in order to reveal its powerful semantics, e.g. Bernstein chaining and pipelines.
I believe this is important because syntax matters.
But more important is that shell syntax leaves no room for extension. New features will necessarily have a tortured syntax, such as the "^", "^^", "," and ",," operations to change the case of a string in bash. I plan to justify this further in a post called Declaring Syntax Bankruptcy on Shell.
So the bigger motivation for the oil language is to add features to shell, in particular borrowing some from awk and make, as well as providing a dialect for config files. I'm excited about these goals, but they require getting past some tedious work.
I plan to show two files from Aboriginal Linux and two files from the /etc/init.d directory on my Ubuntu machine. (Early blog posts: Aboriginal, init.d.)
I chose these files because they're short, use a variety of language features, and can now be translated automatically. We'll see the first file today, and the remaining three tomorrow.
Open sources/toys/make-hdb.sh in a new window to see the translation. Make sure to widen the window so that the two code panes appear side by side.
Notice that whitespace and comments are intentionally preserved. That is, if your style is to put then on its own line, the opening { in oil will also be on its own line. I'll describe the algorithm for style-preserving translation in a future post.
To repeat, the original code is:
make_hdb()
{
  # Some distros don't put /sbin:/usr/sbin in the $PATH for non-root users.
  if [ -z "$(which mke2fs)" ] || [ -z "$(which tune2fs)" ]
  then
    export PATH=/sbin:/usr/sbin:$PATH
  fi
  truncate -s ${HDBMEGS}m "$HDB" &&
  mke2fs -q -b 1024 -F "$HDB" -i 4096 &&
  tune2fs -j -c 0 -i 0 "$HDB"
  [ $? -ne 0 ] && exit 1
}
And here is the oil code, slightly reformatted by hand:
proc make_hdb {
  # Some distros don't put /sbin:/usr/sbin in the $PATH for non-root users.
  if test -z $[which mke2fs] || test -z $[which tune2fs] {
    export PATH = "/sbin:/usr/sbin:$PATH"
  }
  truncate -s $(HDBMEGS)m $HDB &&
  mke2fs -q -b 1024 -F $HDB -i 4096 &&
  tune2fs -j -c 0 -i 0 $HDB
  test $Status -ne 0 && exit 1
}
They look similar from a distance, which is good. But notice the following changes:
(1) The proc keyword. Oil will have both "procs" and functions, denoted with the keywords proc and func.
Procs are what we call shell "functions": they accept an argv array of strings, return an integer status, and have file descriptors. They resemble both processes and procedures.
Functions are like those in Python or JavaScript. They have typed arguments and return values.
One important use case for functions is user-defined interactive completion. Bash has a convention to mutate globals, e.g. COMPREPLY, but proper return values are preferable.
Another use case is string manipulation, e.g. to escape HTML or SQL. You can fake this by writing a "return value" to stdout and capturing it with a subshell, but this requires forking for every function call.
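To make the two workarounds concrete, here is a minimal sketch in today's shell. The names escape_html_stdout, escape_html_global, and REPLY_HTML are hypothetical, chosen only for illustration; they are not part of bash or oil.

```shell
# Workaround 1: "return" a string by printing it to stdout.
# The caller captures it with $(...), which forks a subshell on every call.
escape_html_stdout() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Workaround 2: mutate a global, in the style of bash's COMPREPLY convention.
# The caller must know the name of the variable that holds the result.
escape_html_global() {
  REPLY_HTML=$(escape_html_stdout "$1")
}

escaped=$(escape_html_stdout '<b>Tom & Jerry</b>')
echo "$escaped"

escape_html_global '<i>x</i>'
echo "$REPLY_HTML"
```

Neither form gives the caller a typed value: both results are plain strings, and the first costs a fork per call.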
So it makes sense to have proper functions, but procs are important too because they're isomorphic to an external process. I'll explain how they work together in a future post.
(2) if uses curly braces as block delimiters instead of then and fi. Reasons for this:
Consistency: In shell, function bodies are delimited by braces, while other blocks are delimited by keywords like do and done. In oil, all blocks use braces.
Huffman coding: Block delimiters are common, so they should be short, and braces are shorter than keywords. Python-style indented blocks are even shorter, but aren't suitable for a shell because the language is meant to be typed interactively.
Note that { is an operator in oil, but confusingly it isn't in shell. See discussion below.
(3) The conversion uses test instead of [. Oil will have C-style infix boolean expressions, but legacy code may use test.
Not only is the [ command an ugly syntactic pun, but the [ character is an operator in oil, so it requires quoting when in a command name.
The fact that [ and { aren't operators prevents the shell language from evolving. For example:
$ echo 'echo hi from script with funny name' > ]{
$ chmod +x ]{
$ ./]{
hi from script with funny name
In oil, you would just add single quotes like this: ']{'.
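The same point can be seen from the other direction in today's shell. This is only a sketch, and the variable name words is hypothetical. It shows that [ is looked up like any other command name, and that { and } are ordinary words outside of command position.

```shell
# '[' is resolved like any other command name, so 'command -v' can find it.
command -v [ >/dev/null && echo "[ is a command"

# '{' and '}' are only special as the first word of a command.
# As arguments they are plain words, and so is ']'.
words=$(echo { ] })
echo "$words"
```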
(4) Special variables look like $Status rather than $?. In oil code, we prefer readable names. A completion system that's configured well by default will make them easy to type.
The remaining observations require some background. Recall that shell is composed of four mutually recursive sublanguages:
Command language: for, if, functions, ...
Word language: ${}, $(), $(()), ...
Arithmetic language: a**2 + b**2
Boolean language: [[ a =~ b ]]
Roughly speaking, shell has a separate expression language for each type: strings, integers, and booleans. Oil does away with this complexity with a single expression language for all types, like C or Python.
As a result, it has just two sublanguages: commands and expressions.
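For reference, here is a small sketch, assuming bash, that touches all four of shell's sublanguages in a few lines. The variable names are hypothetical and exist only for illustration.

```shell
# Assumes bash: '**' and '[[ ]]' are not available in plain POSIX sh.
a=3 b=4 name='oil-shell'

# Command language: compound commands such as 'if'.
if true; then
  # Word language: ${} substitutes a variable, $() substitutes command output.
  greeting="hello ${name} from $(echo a subshell)"

  # Arithmetic language: only valid inside $(( )) or (( )).
  sum_sq=$(( a**2 + b**2 ))

  # Boolean language: only valid inside [[ ]].
  if [[ $name =~ ^oil ]]; then
    matched=yes
  fi
fi

echo "$greeting / $sum_sq / $matched"
```

Each sublanguage has its own delimiters and its own rules for what a bare word like a or name means, which is the complexity a single expression language removes.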
The [] characters are used for arrays, and the () characters are used for grouping expressions, as in most languages. So it makes sense for $[] to be command substitution and $() to be expression substitution. Commands are simply arrays of strings.
Keeping the two sublanguages in mind, notice:
(5) $(HDBMEGS) is a delimited variable substitution, in contrast to ${HDBMEGS}.
(6) $[which mke2fs] is command substitution, in contrast to $(which mke2fs).
Arithmetic substitution will be $(x + 1) instead of $((x + 1)). Strings and integers don't need different substitution syntax.
(7) Substitutions aren't quoted. Oil doesn't split words, because word splitting is a misfeature designed to simulate arrays. (Most shell implementations have arrays as an extension, but they're not in POSIX.)
Splitting can be done explicitly with @split(HDB) or @[which mke2fs]. The @ character is associated with arrays, e.g. for splitting and splicing.
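As a reminder of why the original script quotes "$HDB", here is a minimal sketch of word splitting in today's shell, assuming the default IFS. The helper count_args and the file name are hypothetical.

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

HDB='my disk.img'    # hypothetical value that contains a space

unquoted=$(count_args $HDB)      # unquoted: the value is split into two words
quoted=$(count_args "$HDB")      # quoted: the value stays one word

echo "unquoted: $unquoted, quoted: $quoted"
```

In oil, the unquoted form would behave like the quoted one here, and splitting would have to be requested explicitly.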
(8) In contrast, strings on the right-hand side of assignments must be quoted. In expression mode, strings must be quoted; and everything to the right of = is parsed in expression mode as opposed to command mode. This will be implemented with the lexer modes technique (formerly lexical state).
Examples:
echo foo bar            # command mode: command and two literal words
foo = bar               # expression mode: bar is a variable, as in C or Python
foo = 'bar'             # bar is a string
x = 1 + 2 * 3           # an integer expression
s = myStr or 'default'  # a string expression
Also notice that = is a proper operator and may have spaces around it.
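For contrast, today's shell has no expression mode, so spaces around = change the meaning entirely. The sketch below defines a hypothetical function named x purely to make the parse visible.

```shell
# Hypothetical function named 'x', defined only to show how the line is parsed.
x() { echo "command x called with: $*"; }

x=1              # no spaces: an assignment to the variable x
out=$(x = 1)     # spaces: runs the command x with arguments '=' and '1'

echo "variable x is $x"
echo "$out"
```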
The seven-line make-hdb.sh script showed us many features of the oil language. Among them: blocks delimited by {}, [] for arrays, = with spaces around it, and [, ], {, and } as proper operators.
Tomorrow, I'll reveal more language features with three more automatic translations.