Update: I issued a correction on 2017/11/28. OSH no longer uses the algorithm described in this post, but the examples are still useful.
The shell scripts in the git source tree use all the bells and whistles of the here doc syntax:
    q_to_nul <<-\EOF | test-line-buffer >actual &&
    skip 2
    EOF
First, the E in the here terminator is escaped -- this is equivalent to <<'EOF' or <<"EOF", which makes it so that $vars aren't expanded in the body. That is, the body is treated like a single-quoted string rather than a double-quoted string.
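To make the quoting rule concrete, here is a minimal sketch that feeds the same body to cat with an unquoted, a quoted, and an escaped terminator. The variable names (name, unquoted, quoted, escaped) are made up for illustration and don't come from the git scripts.

```shell
name=world

# Unquoted terminator: the body behaves like a double-quoted string.
unquoted=$(cat <<EOF
hello $name
EOF
)

# Quoted terminator: the body behaves like a single-quoted string.
quoted=$(cat <<'EOF'
hello $name
EOF
)

# Escaping any character of the terminator has the same effect as quoting it.
escaped=$(cat <<\EOF
hello $name
EOF
)

echo "$unquoted"   # hello world
echo "$quoted"     # hello $name
echo "$escaped"    # hello $name
```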
Second, they use the <<- operator, which strips leading tabs, so as not to mess up the code's indentation.
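Here is a small sketch of the tab stripping. Since literal tabs tend to get lost in copy and paste, it builds the script with printf, where \t stands for a tab, and pipes it to sh. The body lines and the variable name stripped are invented for illustration.

```shell
# The generated script is:
#   cat <<-EOF
#   <TAB>one tab
#   <TAB><TAB>two tabs
#       four spaces
#   <TAB>EOF
stripped=$(printf 'cat <<-EOF\n\tone tab\n\t\ttwo tabs\n    four spaces\n\tEOF\n' | sh)

# All leading tabs are removed, including the one before the terminator.
# Leading spaces are left alone.
echo "$stripped"
```

Note that only tabs are stripped, so a script indented with spaces can't use this trick.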
Third, there is a pipe and command after the here terminator. How do we parse that? It looks weird to me, and Vim's syntax highlighting doesn't understand it.
Some months ago, I was reading through the POSIX spec for here docs (section 2.7.4) and I noticed a similarly odd example:
    cat <<eof1; cat <<eof2
    Hi,
    eof1
    Helene.
    eof2

    OUTPUT:
    Hi,
    Helene.
It has a single sentence mentioning the possibility of multiple here docs on a single line, but doesn't go into detail.
Well today the answer dawned on me when trying to get git scripts to parse. First, I created this example:
    if cat <<EOF1; then echo THEN; cat <<EOF2; fi
    here doc 1
    EOF1
    here doc 2
    EOF2
    echo

    OUTPUT:
    here doc 1
    THEN
    here doc 2
This is even weirder: there's a here doc in the if condition, and another one in the if body. All shells I tested run this correctly (bash, dash, mksh).
A useful thought experiment: can you take any shell script and write it on a single line? Yes, just replace all newlines with semicolons. (C has this property too, but Python doesn't.)
Except we can't put the here doc bodies on that one line. In that case, the bodies are just concatenated, each followed by its terminator, at the end of the script.
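As a rough check of this thought experiment, the sketch below runs the same three commands twice: once with one command per line, and once joined with semicolons, with the here doc bodies moved below the line. The variable names multi_line and one_line are just for illustration.

```shell
# One command per line; each here doc body directly follows its command.
multi_line=$(
cat <<EOF1
here doc 1
EOF1
echo THEN
cat <<EOF2
here doc 2
EOF2
)

# Same commands on one line; the bodies and their terminators are
# concatenated after it, in the order the here docs appear.
one_line=$(
cat <<EOF1; echo THEN; cat <<EOF2
here doc 1
EOF1
here doc 2
EOF2
)

echo "$one_line"
[ "$multi_line" = "$one_line" ] && echo "same output"
```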
Compound commands can have their own here docs:
    while read line; do echo "-> $line"; done <<EOF
    1
    2
    EOF

    OUTPUT:
    -> 1
    -> 2
That is, the here doc is for the entire while loop, and not for an individual statement. Putting these two things together, I realized that the rule is:
When parsing, save the here terminators you encounter in the AST (while_node, if_node, simple_command_node, etc.). After a newline, walk the AST, reading the lines associated with the terminators using a post-order traversal.
That is, the here docs for parent nodes come after their children. Siblings go in the expected order.
Here's an example:
    while cat <<EOF1; read line; do echo " -> read '$line'"; cat <<EOF2; done <<EOF3
    condition here doc
    EOF1
    body here doc
    EOF2
    while loop here doc 1
    while loop here doc 2
    EOF3

    OUTPUT:
    condition here doc
     -> read 'while loop here doc 1'
    body here doc
    condition here doc
     -> read 'while loop here doc 2'
    body here doc
    condition here doc
So there are three here docs -- one for the condition, one for the body, and one for the while loop. They go in that order: child, child, parent.
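To see which body lines belong to which terminator, here is a deliberately simplified sketch. It is not the AST walk described above: it skips parsing entirely and just collects the terminators from left to right on the command line, which for this while example happens to give the same child, child, parent order. The function read_here_docs and its variable names are hypothetical, and it assumes plain unquoted terminators made of letters, digits, and underscores (no <<-, no quoting, no << inside strings).

```shell
# Hypothetical helper, not OSH code. Reads a script on stdin: the first line
# is the command, the remaining lines are here doc bodies and terminators.
read_here_docs() {
  IFS= read -r first_line || return 1

  # Collect the terminators in the order they appear on the command line.
  terms=
  rest=$first_line
  while :; do
    case $rest in
      *'<<'*) ;;
      *) break ;;
    esac
    rest=${rest#*'<<'}                       # drop everything through the next <<
    terms="$terms ${rest%%[!A-Za-z0-9_]*}"   # keep the word right after it
  done

  # Read the bodies in that same order, labeling each line with its terminator.
  for term in $terms; do
    while IFS= read -r body_line && [ "$body_line" != "$term" ]; do
      printf '%s: %s\n' "$term" "$body_line"
    done
  done
}

read_here_docs <<'SCRIPT'
while cat <<EOF1; read line; do echo " -> read '$line'"; cat <<EOF2; done <<EOF3
condition here doc
EOF1
body here doc
EOF2
while loop here doc 1
while loop here doc 2
EOF3
SCRIPT
```

This prints each body line prefixed with EOF1, EOF2, or EOF3, so the condition's here doc comes first, then the body's, then the one attached to the whole loop.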
All shells I tested respect this subtle behavior, but I've never seen it documented, let alone used in a real shell script. (I didn't actually see it in git.)
In my shell design, I'm thinking about separating here docs into two concepts: multiline strings, and here strings (<<<), both of which already exist in sh. More on that later.