Why Sponsor Oils? | blog | oilshell.org
Consider this program:
$ for i in $(seq 10); do
> cat <<EOF
> here doc $i
> EOF
> done
here doc 1
here doc 2
here doc 3
here doc 4
here doc 5
here doc 6
here doc 7
here doc 8
here doc 9
here doc 10
To run it, bash does this 10 times:

1. fork() a child process
2. open() a temp file for write
3. write() the expanded here doc to it. The contents depend on the iteration.
4. close() it
5. open() it again read-only
6. unlink() it, so it will be deleted after it's closed
7. dup2(4, 0) the resulting descriptor so that the new process has the temp file as stdin
8. execve() the /bin/cat process
9. cat reads the file from disk, writing its contents to stdout
strace listing:

strace -ff -e open,close,unlink,read,write,execve,dup2 \
  -- $sh ./here_doc_disk.sh
Process 4090 attached
[pid 4090] open("/tmp/sh-thd-865008962", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC, 0600) = 3
[pid 4090] write(4, " here doc 1\n", 15) = 15
[pid 4090] close(4) = 0
[pid 4090] open("/tmp/sh-thd-865008962", O_RDONLY) = 4
[pid 4090] close(3) = 0
[pid 4090] unlink("/tmp/sh-thd-865008962") = 0
[pid 4090] dup2(4, 0) = 0
[pid 4090] close(4) = 0
[pid 4090] execve("/bin/cat", ["cat"], [/* 68 vars */]) = 0
...
[pid 4090] read(0, " here doc 1\n", 65536) = 15
[pid 4090] write(1, " here doc 1\n", 15 here doc 1
) = 15
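The same temp-file sequence can be mimicked by hand in plain shell, which makes the unlink-before-read trick easier to see. This is only a sketch: the function name here_doc_via_temp_file is made up for illustration, mktemp stands in for the exclusive open(), and the steps run in the current shell rather than in a forked child as bash does.

```shell
# Hypothetical helper: feed "$1" to cat through a temp file that is
# unlinked before the reader runs, roughly following the steps above.
here_doc_via_temp_file() {
  tmp=$(mktemp) || return 1      # create a temp file (stand-in for open() with O_CREAT|O_EXCL)
  printf '%s\n' "$1" > "$tmp"    # write() the expanded text; the redirect closes the file afterwards
  exec 4< "$tmp"                 # open() it again read-only, as descriptor 4
  rm -f "$tmp"                   # unlink(): the data stays readable until descriptor 4 is closed
  cat <&4                        # cat gets a copy of descriptor 4 as its stdin, like dup2(4, 0)
  exec 4<&-                      # close descriptor 4; the kernel can now free the file
}

for i in 1 2 3; do
  here_doc_via_temp_file "here doc $i"
done
```

The loop prints here doc 1 through here doc 3, and no file is left behind in the temp directory.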
zsh and mksh do the same thing, which surprised me.

dash does something more expected and elegant, which is to start cat with one end of a pipe() as stdin, rather than a temp file. Strings longer than PIPE_SIZE will cause write() to block, but I think that just requires a little extra care in the implementation.
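A rough shell-level picture of the pipe approach: the text goes into one end of a pipe and the command reads the other end as stdin, with no file on disk. Both function names below are hypothetical, and the stdin_kind check assumes a Linux-like system where /dev/stdin resolves to the process's descriptor 0. This is not dash's actual code, only an illustration of the idea.

```shell
# Hypothetical helper: report what kind of file stdin currently is.
# Assumes /dev/stdin resolves to this process's descriptor 0.
stdin_kind() {
  if [ -p /dev/stdin ]; then
    echo pipe
  elif [ -f /dev/stdin ]; then
    echo "regular file"
  else
    echo other
  fi
}

# Hypothetical helper: deliver the text in "$1" to the stdin of the
# command given by the remaining arguments, through a pipe.
here_doc_via_pipe() {
  text=$1
  shift
  # The writer is its own process, so if the text is larger than the
  # pipe's capacity it simply blocks until the reader drains the pipe.
  printf '%s\n' "$text" | "$@"
}

here_doc_via_pipe "here doc 1" cat
here_doc_via_pipe "unused" stdin_kind
```

Running stdin_kind under a here doc instead (stdin_kind <<EOF ... EOF) is a quick way to see which strategy a given shell uses; the answer depends on the shell and its version.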
Curiously, the "here string" construct in bash also uses temp files:
cat <<< "here string $i"
I don't see a reason to use temp files in either case, other than the fact that in ancient computing history people didn't want to hold entire "files" in memory. Compilers used to work a line at a time too.
Based on parsing real shell scripts, here docs are generally tiny, so I don't expect string size to be an issue.
I think my shell language will only have the here string operator, and implement it with pipes like dash does. From a programmer's perspective, here docs are just a weird kind of multiline string. These two cat invocations have the same output:
s="\
one
two"
cat <<< "$s"
cat <<EOF
one
two
EOF
That is, shell strings are already multiline. I guess I should allow some kind of line-based delimiter in the string literal syntax, because the \ is a bit ugly. But this special syntax for multiline strings doesn't need to be coupled with the notion of piping to stdin.
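To illustrate the decoupling, here is a small sketch in which an ordinary multiline string plus an ordinary pipe delivers exactly the same bytes as a here doc. The function names from_string and from_here_doc are invented for this example, and it sticks to POSIX sh features.

```shell
# An ordinary multiline string literal; single quotes need no backslash.
s='one
two'

# Hypothetical helper: print the string followed by a newline.
from_string() {
  printf '%s\n' "$s"
}

# Hypothetical helper: the same text written as a here doc.
from_here_doc() {
  cat <<EOF
one
two
EOF
}

# Command substitution strips trailing newlines, so compare via files.
a=$(mktemp) b=$(mktemp)
from_string | cat > "$a"     # string + pipe to stdin
from_here_doc > "$b"         # here doc
if cmp -s "$a" "$b"; then echo same; else echo different; fi
rm -f "$a" "$b"
```

The comparison prints same: how the string is spelled and how it reaches stdin are separate concerns.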
I showed in the last post that here doc syntax is unintuitive in other ways: quoted delimiters to eliminate expansion; the <<- variant to strip leading tabs; and the post-order traversal rule for multiple here docs on a line.
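As a reminder of the first of those quirks, the sketch below shows how quoting the delimiter changes the body's meaning even though the body itself is identical. The function names are made up for illustration.

```shell
name=world

# Unquoted delimiter: the body is expanded like a double-quoted string.
expanded() {
  cat <<EOF
hello $name
EOF
}

# Quoted delimiter: the body is taken literally, with no expansion.
literal() {
  cat <<'EOF'
hello $name
EOF
}

expanded   # prints: hello world
literal    # prints: hello $name
```

Nothing in the body signals which behavior applies; the reader has to look back at the delimiter on the first line.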
oil implements all of this in its sh parser. But now that I fully understand the traditional syntax, I want to design something nicer for the oil language, as well as improve its implementation.