blog | oilshell.org
I just optimized Oil's runtime by reducing the number of processes that it starts. Surprisingly, you can implement shell features like pipelines and subshells with more than one "process topology".
I described these optimizations on Zulip, and I want to write a post called Oil Starts Fewer Processes Than Other Shells.
That post feels dense, so let's first review some background knowledge, with the help of several great drawings from Julia Evans.
User space and kernel space are key concepts for understanding shell. Why?
- Commands like ls /bin and $(dirname $x) require support from the kernel to start processes.
- Shell has dedicated syntax and builtin commands to manipulate its own process state, and thus the inherited state of child processes. You can:
  - Set environment variables with export.
  - Change the current working directory with cd, pushd, and popd. Retrieve it with pwd or $PWD.
  - Get the current process ID from the $$ variable, and the parent PID from $PPID.
  - Use the trap builtin to register code to run when the process receives a signal (e.g. SIGINT, which Ctrl-C sends).
  - Wait for child processes to exit with the wait builtin.
  - Use time to ask the kernel how long a process has taken to run. (See the last pane of the first comic.)
In other words, shell is a thin layer over the process abstraction provided by the kernel. Processes used to be thought of as virtual machines, although that term now has a different connotation.
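Here is a minimal sketch of some of those builtins, assuming a POSIX sh on Linux. It changes state in the shell's own process, then starts child processes to show what they inherit. The GREETING variable and the /tmp directory are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: builtins that change the shell's own process state,
# which child processes then inherit.

export GREETING=hello         # export: add a variable to the environment
cd /tmp || exit 1             # cd: change this process's working directory

# sh -c starts a child process; it sees the inherited environment and directory.
child_view=$(sh -c 'echo "$GREETING $(pwd)"')
echo "child sees: $child_view"

# trap: run code when this process receives a signal.
got_signal=no
trap 'got_signal=yes' USR1
sh -c 'kill -USR1 "$PPID"'    # a child process signals its parent (this shell)
echo "trap ran: $got_signal"

# wait: block until a background child exits, then read its exit status.
sh -c 'exit 3' &
status=0
wait $! || status=$?
echo "background child exited with status $status"
```

Note that the trap doesn't interrupt anything: the shell records the signal and runs the trap code once the foreground child has finished.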
It's not obvious from the syntax, but there are two different kinds of processes in a shell program:
1. Those that run a different executable, i.e. machine code that's not in /bin/sh or /usr/local/bin/oil. Examples:
   $ ls # 1 new process and 1 executable (usually)
   $ ls | wc -l # 2 new processes and 2 executables (usually)
   After calling fork() to create a process, the shell also calls exec() to run code in /bin/ls or /usr/bin/wc. The exec() system call loads and starts a new "binary image" in the current process.
2. Those that run the same executable. For example, the left-hand side of this pipeline
   # at least 1 new process, but no new executables
   $ { echo a; echo b; } | read x
   denotes an independent copy of the shell interpreter, created with the fork() system call. No exec() call is needed.
Is it inefficient to start a process for those two statements? Not really, no. See this related comic: Copy On Write.
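A minimal sketch of the second kind of process, assuming a shell like bash or dash that implements ( ... ) with fork(). The subshell starts with a copy of the interpreter's state, but its changes disappear when it exits. The variable names are arbitrary.

```shell
#!/bin/sh
# Sketch: a subshell is a forked copy of the interpreter, with no exec().
start_dir=$PWD
x=parent
( x=child; cd / )     # runs in a separate process; no new executable
echo "x=$x, still in $PWD"

# POSIX specifies that $$ in a subshell expands to the PID of the
# parent shell, even though the subshell is a different process.
outer=$$
inner=$(echo $$)
echo "outer=$outer inner=$inner"
```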
The (usually) qualifiers above are what the next post is about. I optimized the usage of fork() and exec() syscalls in Oil, and I was surprised to learn that all shells already optimize them to some extent.
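As a rough illustration of the syscalls involved (not Oil's implementation), the exec builtin calls exec() without fork(), so the new program takes over the current process and keeps its PID. This sketch assumes a POSIX sh; the same_two_lines helper is a hypothetical function defined here for illustration.

```shell
#!/bin/sh
# Sketch: compare PIDs with and without the exec builtin.

# Hypothetical helper: succeed if the first two lines of $1 are equal.
same_two_lines() {
  [ "$(printf '%s\n' "$1" | sed -n 1p)" = "$(printf '%s\n' "$1" | sed -n 2p)" ]
}

# exec replaces the outer sh with the inner one: no fork(), same PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Without exec, the outer sh must fork() a child, because it still has to
# run the trailing ":" afterward. The child gets a new PID.
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; :')

same_two_lines "$with_exec" && echo "with exec: same PID"
same_two_lines "$without_exec" || echo "without exec: different PIDs"
```

The trailing ":" matters: when the inner sh is the last thing to run, a shell is free to skip the fork() and exec() directly, which is the kind of behavior the (usually) qualifiers hint at.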
I really like the red and blue dots in this drawing. It's an intuitive way of explaining the pipe() system call, which forces us to understand file descriptors.
I think of file descriptors as "pointers" from user space into the kernel. They can't literally be pointers, because user code isn't allowed to access kernel memory. So instead they're small integers that index into the process's file descriptor table in the kernel.
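A minimal sketch of both ideas, assuming Linux, where /proc/self/fd/N is a symlink describing what descriptor N of the process reading it refers to. The two sides of a pipeline report the same pipe, and a redirect like 3> hands a child an extra descriptor by number. The file name comes from mktemp and is arbitrary.

```shell
#!/bin/sh
# Sketch (Linux-only): inspect inherited descriptors via /proc/self/fd.

# In "A | B", the shell calls pipe() before forking, and each child inherits
# one end: A's stdout (fd 1) is the write end, B's stdin (fd 0) the read end.
ends=$(readlink /proc/self/fd/1 | { cat; readlink /proc/self/fd/0; })
write_end=$(printf '%s\n' "$ends" | sed -n 1p)
read_end=$(printf '%s\n' "$ends" | sed -n 2p)
echo "A's stdout: $write_end"   # pipe:[N] for some inode number N
echo "B's stdin:  $read_end"    # the same pipe:[N]

# Descriptors are small integers. The redirect 3> opens a file on descriptor 3
# in the child before exec(), and the new program writes to it by number.
log=$(mktemp)
sh -c 'echo "hello via fd 3" >&3' 3> "$log"
written=$(cat "$log")
rm -f "$log"
echo "$written"
```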
Related Comics:
The next post will discuss shopt -s lastpipe, a bash option that implements zsh-like pipeline behavior.
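To make the difference concrete, here's a minimal sketch assuming bash 4.2 or later, the version that added lastpipe. By default, bash runs every part of a pipeline in a forked subshell, so read x sets x only in a process that then exits. With lastpipe enabled and job control off, as it is in scripts, bash runs the last part in the current shell, as zsh does.

```shell
#!/usr/bin/env bash
# Sketch: where does the last part of a pipeline run?

x=unset
{ echo a; echo b; } | read -r x
default_x=$x          # still "unset": read ran in a forked subshell

shopt -s lastpipe     # only takes effect when job control is off
{ echo a; echo b; } | read -r x
lastpipe_x=$x         # "a": read ran in the current shell process

echo "default: x=$default_x"
echo "lastpipe: x=$lastpipe_x"
```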
I still want to get more done in 2020 by cutting scope, but I'd like to illustrate these related concepts:
- There are three processes involved in ls | wc -l:
  - The shell, which calls pipe(). Importantly, the file descriptors returned by pipe() are inherited by children.
  - ls, which uses the red end of the pipe it inherits.
  - wc -l, which uses the blue end.
- The difference between fork() and exec(). I explained this above, but a drawing would make it clearer. If you know of one, let me know in the comments.
- What happens between fork() and exec(), like system calls that manipulate file descriptors.
I'll show that Oil starts fewer processes than other shells for snippets like:
date; date
(date)
date | wc -l
echo $(date)
And I'll describe how I measured this with strace.
These comics are also related to Oil, and I may reference them in future posts:
For what it's worth, I just bought a printed set of zines, Your Linux Toolbox, which have related but different content. You can also buy e-books on wizardzines.com.