
The Five Meanings of #. And What Does ${####} Mean?

2016-10-28 (Last updated 2019-02-06)

This is one of several blog entries that dive into the minutiae of the shell language. But let me explain the bigger picture first.

Table of Contents
Syntax and Semantics Should Correspond
An Operator, Variable Name, or a String?
Conclusion

Syntax and Semantics Should Correspond

When I worked on dev tools at Google, I was surprised one day to hear Guido van Rossum being dismissive of Lisp. He said something like "I believe in syntax", meaning that Lisp is a language with little syntax, and that this is unpalatable to him as a programmer.

I found this odd, since I was sold on the beauty of Lisp. But in the many years since, I've come to agree that syntax matters. It helps you produce good programs.

(You could also say that, empirically, "meaningful" syntax matters more than homoiconicity.)

Good syntax can shunt some of the cognitive load of programming onto a different part of your brain. But this requires at least a rough correspondence between syntax and semantics. In other words, constructs with wildly different meanings should not share the same syntax (e.g. static in C and C++).

In Lisp, everything looks the same, so syntax provides you with few hints about the semantics. Clojure took a step toward addressing this by introducing meaning for [] and {}.

The Unix shell is the opposite: everything looks different. It's baroque rather than plain, using operators like ## and %% for what would just be functions in other languages.

These two extremes share the same problem: your brain has to work harder to make sense of the program.
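
To make the shell side of this concrete, here is a small sketch of the ## and %% operators mentioned above, plus their single-character cousin #. The path is a made-up value used only for illustration; in most languages each of these lines would be a call to a named function.

```shell
# Hypothetical path, used only for illustration.
path=/tmp/archive.tar.gz

base=${path##*/}   # strip the longest prefix matching '*/'   -> archive.tar.gz
stem=${base%%.*}   # strip the longest suffix matching '.*'   -> archive
ext=${base#*.}     # strip the shortest prefix matching '*.'  -> tar.gz

echo "$base $stem $ext"
```

Nothing in the punctuation tells you which operator works on prefixes and which on suffixes; you simply have to remember it.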

An Operator, Variable Name, or a String?

Now let's dive into the example of the # character.

Inside ${}, it can mean one of five things:

  1. A prefix operator: length of a string or array
  2. A variable: the number of args to a script or function
  3. A binary operator #: strip the minimum length glob prefix from a variable
  4. The second character of the binary operator ##: strip the maximal length glob prefix
  5. A literal string (which doesn't have to be quoted)

Now consider the expressions ${##}, ${##'#'}, and ${####}. The first uses meanings 1 and 2; the second uses meanings 2 and 3, while the third uses meanings 2, 4, and 5. Is that clear?

Here are some code snippets to illustrate this.

  2. ${#} means the same thing as $# -- the length of the arguments array.
$ set -- 1 2 3
$ echo ${#}
3
  1. ${#var} is the length of a string variable, and ${#array[@]} is the length of an array.
$ var=ab
$ echo ${#var}
2
$ array=(x y z)
$ echo ${#array[@]}
3

These first two meanings are related, but unfortunately they compose in expressions like ${#@} and ${##}. For my taste, this is too much punctuation.
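
Here's a minimal sketch of that composition. The function name count_and_width is hypothetical; it only exists so that # refers to the function's own arguments rather than the script's.

```shell
# Hypothetical helper: prints the argument count, then the length of that count as a string.
count_and_width() {
  # ${#}  is meaning 2: the variable '#', i.e. the number of arguments.
  # ${##} is meaning 1 applied to meaning 2: the length of the string held in '#'.
  echo "${#} ${##}"
}

count_and_width a b c d e f g h i j k l   # 12 arguments -> prints "12 2"
```

With twelve arguments, the variable # holds the string 12, which is two characters long.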

  3. and 4. Here's how you strip the shortest or longest match for a glob prefix:
$ foo=aabbcc
$ echo ${foo#a*b}
bcc
$ echo ${foo##a*b}
cc
  5. The # character can also be a literal string argument.
$ echo ${undef:-#}
#

Unfortunately it can be combined with the operators (meanings #3 and #4):

$ var='####'
$ echo ${var###}
###

Because only one # was stripped, the operator must be ## (meaning 4) followed by the literal pattern #, and not # followed by the pattern ##. What if we want the # operator? Quoting works:

$ var='####'
$ echo ${var#'##'}
##
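
Another way to sidestep the ambiguity, sketched here as an assumption rather than anything the shell requires, is to keep the literal # characters away from the operator by putting the pattern in a variable or in double quotes. The name pat is only illustrative.

```shell
var='####'
pat='##'               # pattern kept in a variable, so no bare '#' follows the operator

short=${var#"$pat"}    # operator '#', literal pattern '##' -> strips two characters
one=${var#"#"}         # operator '#', quoted one-character pattern -> strips one character

echo "$short $one"     # prints "## ###"
```

Quoting the pattern also makes it literal rather than a glob, which is usually what you want when the pattern is punctuation.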

You might have a headache at this point. Certainly I've already gotten sick of reading the syntax in this blog post :)

Let's invent a syntax to distinguish the 5 meanings:

  1. LengthOf: prefix operator
  2. $len -- the number of args to a script or function
  3. stripShortPrefix
  4. stripLongPrefix
  5. '#' -- a literal string

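Using this syntax:

  ${#}      is  $len
  ${##}     is  LengthOf $len
  ${##'#'}  is  $len stripShortPrefix '#'
  ${####}   is  $len stripLongPrefix '#'

That is: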

$ set -- $(seq 25)  # 25 args
$ echo ${#}
25
$ echo ${##}
2
$ echo ${##'#'}
25
$ echo ${####}
25
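
As a rough cross-check, the invented names can be written as ordinary shell functions and compared against the punctuation forms. The helpers below are hypothetical (they are named after the invented syntax and are not part of any shell), and the sketch assumes seq is available, as in the transcript above.

```shell
# Hypothetical helpers named after the invented syntax; not part of any shell.
length_of()          { printf '%s\n' "${#1}"; }
strip_short_prefix() { printf '%s\n' "${1#$2}"; }    # $2 is used as a glob pattern
strip_long_prefix()  { printf '%s\n' "${1##$2}"; }   # $2 is used as a glob pattern

compare_forms() {
  len=$#                                             # $len
  a=${##};     b=$(length_of "$len")                 # LengthOf $len
  c=${##'#'};  d=$(strip_short_prefix "$len" '#')    # $len stripShortPrefix '#'
  e=${####};   f=$(strip_long_prefix "$len" '#')     # $len stripLongPrefix '#'
  echo "$a=$b $c=$d $e=$f"
}

compare_forms $(seq 25)   # 25 args -> prints "2=2 25=25 25=25"
```

Each pair agrees, which is consistent with the reading given above: the string 25 has length 2, and it has no # prefix to strip.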

I find this unintuitive and arbitrary. Though bash, dash, mksh, and zsh all behave this way, it isn't hard to come up with related cases involving # that they disagree on.

This also implies more lookahead or backtracking than I implied in the last post, but let's leave that for another time.

Conclusion

I've shown an example where syntax doesn't match semantics in shell. The # token is overloaded for too many things inside ${}.

Never mind that it's used for yet another unrelated purpose inside the arithmetic language, e.g. $(( 16#ff ))!
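
For reference, a short sketch of that arithmetic meaning, assuming bash (the base#number form is an extension found in bash, ksh, and zsh, and is not required by POSIX):

```shell
# Assumes bash: inside $(( )), '#' separates a numeric base from the digits.
hex=$(( 16#ff ))     # base 16 -> 255
bin=$(( 2#1010 ))    # base 2  -> 10

echo "$hex $bin"     # prints "255 10"
```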

I'll show more examples of cryptic shell syntax in upcoming posts. In the more distant future I'll show how the Oil language improves upon this state of affairs.