Why Sponsor Oils? | blog | oilshell.org
This is one of several blog entries that dive into the minutiae of the shell language. But let me explain the bigger picture first.
When I worked on dev tools at Google, I was surprised one day to hear Guido van Rossum being dismissive of Lisp. He said something like "I believe in syntax", meaning that Lisp is a language with little syntax, and that this is unpalatable to him as a programmer.
I found this odd, since I was sold on the beauty of Lisp. But in the many years since, I've come to agree that syntax matters. It helps you produce good programs.
(You could also say that, empirically, "meaningful" syntax matters more than homoiconicity.)
Good syntax can shunt some of the cognitive load of programming onto a different part of your brain. But this requires at least a rough correspondence between syntax and semantics. In other words, constructs with wildly different meanings should not share the same syntax (e.g. static in C and C++).
In Lisp, everything looks the same, so syntax provides you with few hints about the semantics. Clojure took a step toward addressing this by introducing meaning for [] and {}.
The Unix shell is the opposite: everything looks different. It's baroque rather than plain, using operators like ## and %% for what would just be functions in other languages.
These two extremes share the same problem: your brain has to work harder to make sense of the program.
Now let's dive into the example of the # character.
Inside ${}, it can mean one of five things:

1. A prefix operator: the length of a string or array
2. A variable name: the number of arguments to a script or function
3. #: strip the minimum length glob prefix from a variable
4. ##: strip the maximal length glob prefix
5. A literal string argument

Now consider the expressions ${##}, ${##'#'}, and ${####}. The first uses meanings 1 and 2; the second uses meanings 2 and 3, while the third uses meanings 2, 4, and 5. Is that clear?
Here are some code snippets to illustrate this.

${#} means the same thing as $# -- the length of the arguments array.

    $ set -- 1 2 3
    $ echo ${#}
    3

${#var} is the length of a string variable, and ${#array[@]} is the length of an array.

    $ var=ab
    $ echo ${#var}
    2

    $ array=(x y z)
    $ echo ${#array[@]}
    3
These first two meanings are related, but unfortunately they compose in expressions like ${#@} and ${##}. For my taste, this is too much punctuation.
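One way to see how these two meanings compose is to give the argument count a name first. This is only a sketch: the variable names (argc, len_named, len_terse) are made up for illustration, and it assumes a POSIX-style shell.

```shell
# Assumption: a POSIX-style shell; all variable names are illustrative.
set -- a b c d e f g h i j k l   # 12 positional arguments

argc=$#               # meaning 2: the number of arguments
len_named=${#argc}    # meaning 1 applied to a named variable
len_terse=${##}       # meanings 1 and 2 fused: length of the string "12"

echo "$argc $len_named $len_terse"   # prints: 12 2 2
```

The named version costs one extra line, but each # now has only one job.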
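The # and ## operators (meanings 3 and 4) strip the shortest and the longest prefix that matches a glob:

    $ foo=aabbcc
    $ echo ${foo#a*b}
    bcc
    $ echo ${foo##a*b}
    cc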
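The difference between the shortest and longest match is easier to see on a file path. The following sketch uses a made-up path and illustrative variable names; it assumes nothing beyond POSIX parameter expansion.

```shell
# Assumption: path is an illustrative value, not taken from the post.
path=/usr/local/bin/oil

short=${path#*/}    # shortest prefix matching */ is just the leading "/"
long=${path##*/}    # longest prefix matching */ runs through the last "/"

echo "$short"   # prints: usr/local/bin/oil
echo "$long"    # prints: oil
```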
The # character can also be a literal string argument.

    $ echo ${undef:-#}
    #
Unfortunately it can be combined with the operators (meanings #3 and #4):
    $ var='####'
    $ echo ${var###}
    ###
Because we only stripped one #, the operator must be ## and not # (meaning 4). What if we want the # operator? Quoting works:
    $ var='####'
    $ echo ${var#'##'}
    ##
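Another way to sidestep the ambiguity, sketched here as a possibility rather than a recommendation from the post, is to keep the pattern in a variable so that the operator and its argument can't run together. The name pat is made up for illustration.

```shell
var='####'
pat='##'               # illustrative name for the pattern to strip

short=${var#"$pat"}    # operator is #, pattern is the two characters ##
greedy=${var###}       # operator is ##, pattern is a single #

echo "$short"    # prints: ##
echo "$greedy"   # prints: ###
```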
You might have a headache at this point. Certainly I've already gotten sick of reading the syntax in this blog post :)
Let's invent a syntax to distinguish the 5 meanings:

1. LengthOf: prefix operator
2. $len -- the number of args to a script or function
3. stripShortPrefix
4. stripLongPrefix
5. '#' -- a literal string

Using this syntax:
- ${##} is ${LengthOf $len}
- ${##'#'} is ${$len stripShortPrefix '#'}
- ${####} is ${$len stripLongPrefix '#'}
That is:
    $ set -- $(seq 25)  # 25 args
    $ echo ${#}
    25
    $ echo ${##}
    2
    $ echo ${##'#'}
    25
    $ echo ${####}
    25
I find this unintuitive and arbitrary. Though bash, dash, mksh, and zsh all behave this way, it isn't hard to come up with related cases involving # that they disagree on.
This also implies more lookahead or backtracking than I implied in the last post, but let's leave that for another time.
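To make the three readings concrete, the invented syntax can be approximated with ordinary shell functions. This is only a sketch: the function names length_of, strip_short_prefix, and strip_long_prefix are hypothetical helpers named after the invented syntax, not features of any shell.

```shell
# Hypothetical helpers named after the invented syntax; not part of any shell.
length_of()          { printf '%s\n' "${#1}"; }
strip_short_prefix() { printf '%s\n' "${1#$2}"; }
strip_long_prefix()  { printf '%s\n' "${1##$2}"; }

set -- a b c d e f g h i j k l m n o p q r s t u v w x y   # 25 args
len=$#

r1=$(length_of "$len")                 # same as ${##}
r2=$(strip_short_prefix "$len" '#')    # same as ${##'#'}
r3=$(strip_long_prefix "$len" '#')     # same as ${####}

echo "$r1 $r2 $r3"   # prints: 2 25 25
```

The last two calls return 25 unchanged because the string "25" has no leading # to strip.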
I've shown an example where syntax doesn't match semantics in shell. The # token is overloaded for too many things inside ${}.

Never mind that it's used for yet another unrelated purpose inside the arithmetic language, e.g. $(( 16#ff ))!
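For reference, here is what that arithmetic use of # does. The sketch assumes bash, since the base#digits form is not part of POSIX sh; the variable names are illustrative.

```shell
# Assumption: bash (the base#digits form is not in POSIX sh).
hex=$(( 16#ff ))     # ff read as a base-16 number
bin=$(( 2#1010 ))    # 1010 read as a base-2 number

echo "$hex $bin"   # prints: 255 10
```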
I'll show more examples of cryptic shell syntax in upcoming posts. In the more distant future I'll show how the Oil language improves upon this state of affairs.