Why Sponsor Oils? | blog | oilshell.org
At the end of Shell Has a Forth-Like Quality, I mentioned that ssh and su have odd interfaces. Instead of using part of their own argv array as the argv array for the child, which would allow Bernstein Chaining, they accept shell strings.
This interface requires a shell to interpret, so the process tree looks like this:
su andy -c 'ls /'
  /bin/sh -c 'ls /'
    /bin/ls /
rather than this:
sudo ls /  # sudo uses its argv array
  /bin/ls /
On Reddit, anacrolix asked if a wrapper can fix this. Though I've hit this problem while using scp to copy files whose names contain spaces, I hadn't thought about a fix until then.
This problem can be solved with Python 2's commands module (note 5):
#!/usr/bin/python
# argv_to_sh.py
import commands
import sys

# strategy: double quote if it has a single quote; otherwise single quote
for arg in sys.argv[1:]:
    sys.stdout.write(commands.mkarg(arg))
This script is invoked like this:
ssh localhost "$(./argv_to_sh.py touch 'filename with spaces')"
which is equivalent to:
ssh localhost "touch 'filename with spaces'"
In other words, it eliminates the need for double quoting, which is hard to read and write, especially when dealing with whitespace, quotes, and backslashes.
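On a system without Python 2, roughly the same thing can be sketched with Python 3's shlex.quote(). This is an illustrative port, not the script from this post: the function name argv_to_sh is made up here, and shlex.quote() leaves "safe" words such as touch unquoted, where mkarg() always adds quotes.

```python
import shlex


def argv_to_sh(argv):
    """Return one shell string that a POSIX shell parses back into argv."""
    # shlex.quote() single-quotes any word containing characters the shell
    # would otherwise interpret (spaces, quotes, $, backslashes, ...).
    return " ".join(shlex.quote(arg) for arg in argv)


command = argv_to_sh(["touch", "filename with spaces"])
print(command)  # touch 'filename with spaces'
```

The result is meant to be passed as a single argument, as in ssh localhost "$(...)" above.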
For comparison, if you assumed the composable argv interface:
ssh localhost touch 'filename with spaces'
you would get three files in your home directory, instead of one.
$ ls -1 ~/filename ~/with ~/spaces
/home/andy/filename
/home/andy/with
/home/andy/spaces
Even more oddly, ssh does accept an array, but its elements are joined with spaces before being passed to the shell. For example, this command has two double-quoted arguments, but it behaves like the single-string command shown earlier and creates one file:
ssh localhost "touch" "'filename with spaces'"
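The joining behavior can be modeled in a few lines of Python. This is only a sketch of the idea described above, not ssh's implementation: remote_words is a hypothetical helper, and shlex.split() stands in for the remote shell's word splitting.

```python
import shlex


def remote_words(ssh_command_args):
    """Model: join the arguments with spaces, then let a shell split the result."""
    shell_string = " ".join(ssh_command_args)
    # shlex.split() approximates how a POSIX shell breaks a string into words.
    return shlex.split(shell_string)


# ssh localhost touch 'filename with spaces'
# (the local shell has already removed the single quotes)
unquoted = remote_words(["touch", "filename with spaces"])

# ssh localhost "touch" "'filename with spaces'"
# (the inner single quotes survive and are seen by the remote shell)
quoted = remote_words(["touch", "'filename with spaces'"])

print(unquoted)  # ['touch', 'filename', 'with', 'spaces']
print(quoted)    # ['touch', 'filename with spaces']
```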
The help for ssh is misleading, implying that it takes a single command:
usage: ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] ... ... [-w local_tun[:remote_tun]] [user@]hostname [command]
I would notate it like this:
ssh [user@]hostname [SHELL_STRING_PART]...
It's conceivable that ssh could spawn touch or ls without a remote shell. Likewise for su.
One possible reason for accepting a shell string is that you can do remote evaluation of environment variables:
ssh localhost 'echo $HOME' # evaluate in remote environment
ssh localhost "echo $HOME" # evaluate in local environment
ssh localhost echo $HOME # ditto, local environment
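The difference comes down to which shell sees the text $HOME. Here is a local simulation, with assumptions stated up front: run_remote is a hypothetical stand-in for ssh that runs /bin/sh with a made-up "remote" environment, and both home directories are invented values.

```python
import os
import subprocess

# Hypothetical environments; only HOME matters for this demonstration.
remote_env = {
    "HOME": "/home/remote-user",
    "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
}
local_home = "/home/local-user"


def run_remote(shell_string):
    """Stand-in for ssh: a shell in the 'remote' environment interprets the string."""
    result = subprocess.run(
        ["/bin/sh", "-c", shell_string],
        env=remote_env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# ssh localhost 'echo $HOME': the literal text $HOME reaches the remote shell.
remote_value = run_remote("echo $HOME")

# ssh localhost "echo $HOME": the local shell substituted the value beforehand.
local_value = run_remote("echo " + local_home)

print(remote_value)  # /home/remote-user
print(local_value)   # /home/local-user
```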
A more fundamental reason is that they both run processes under a different user (uid), and the Unix convention is that the shell sets up a user's environment, e.g. setting $HOME and $USER. Though this isn't entirely convincing, because many Unix daemons run directly under init as a different uid, without a shell as a parent.
There's less justification for argument joining, but ssh probably does this so you can leave off the quotes in the common case:
ssh localhost 'touch filename_without_spaces' # explicit
ssh localhost touch filename_without_spaces # for convenience
I consider this a misfeature because it causes confusion about what the input syntax is. A different flag for each syntax would be a nicer interface:
ssh -c 'echo $HOME' # shell string interface
ssh -a touch 'filename with spaces' # argv array interface
(1) Python's commands module was deprecated in favor of subprocess, and the useful mkarg() function was lost. If you use a system without Python 2, you can copy the function from commands.py into argv_to_sh.py (and perhaps call it argv-to-sh).
# Make a shell command argument from a string.
# Return a string beginning with a space followed by a shell-quoted
# version of the argument.
# Two strategies: enclose in single quotes if it contains none;
# otherwise, enclose in double quotes and prefix quotable characters
# with backslash.
#
def mkarg(x):
    if '\'' not in x:
        return ' \'' + x + '\''
    s = ' "'
    for c in x:
        if c in '\\$"`':
            s = s + '\\'
        s = s + c
    s = s + '"'
    return s
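One way to gain confidence in a copied mkarg() is to let a real shell parse its output and compare the result with the input. The sketch below repeats the function so that it runs on its own under Python 3. shell_roundtrip is a hypothetical helper, and it assumes /bin/sh is a POSIX shell with a printf builtin or binary.

```python
import subprocess


def mkarg(x):
    # Same logic as commands.mkarg(): single quotes if possible, otherwise
    # double quotes with \, $, " and ` escaped by a backslash.
    if "'" not in x:
        return " '" + x + "'"
    s = ' "'
    for c in x:
        if c in '\\$"`':
            s = s + '\\'
        s = s + c
    return s + '"'


def shell_roundtrip(arg):
    """Ask /bin/sh to parse the quoted form and print the argument it saw."""
    result = subprocess.run(
        ["/bin/sh", "-c", "printf '%s'" + mkarg(arg)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


for original in ["filename with spaces", "it's $HOME", 'say "hi" `date` \\ it\'s']:
    assert shell_roundtrip(original) == original
```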
(2) Compare:
In Unix, the argv array interface is more fundamental because system() is defined in terms of exec(). That is, system(const char* command) is defined as exec(['sh', '-c', command]).
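That relationship can be written out in Python. This is a simplified sketch: the real system() also forks, waits, and manages signals, and it returns a wait status rather than a plain exit code. The path /bin/sh is an assumption.

```python
import subprocess


def system(command):
    """Sketch of system(): hand the string to sh -c and return the exit code."""
    # The string interface is built on top of the argv interface, not vice versa.
    return subprocess.call(["/bin/sh", "-c", command])


status = system("exit 3")
print(status)  # 3
```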
(3) In contrast, the Windows CreateProcess() API takes a string, not an array of strings. Making each application responsible for quoting leads to inconsistency.
(4) In Python, this is the difference between shell=True and shell=False:
import subprocess

# Works as intended; lists $HOME directory
subprocess.call('ls ~', shell=True)
# Tries to execute binary named `ls ~`
subprocess.call('ls ~', shell=False)
(4) Julia has a unique API that allows hygienic string interpolation and avoids the shell.
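(5) 2018/02/05 update: Tom Most pointed out that Python 3 has pipes.quote(), which is similar to commands.mkarg(). The documented name for this function is shlex.quote(), and the pipes module was removed in Python 3.13.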
It's important to be precise about code that deals with strings vs. code that deals with argv arrays. Using arrays not only saves a shell process, but also removes surface area for command injection attacks.
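Here is a small demonstration of that injection surface. The hostile filename is made up for illustration, and echo stands in for any command that receives untrusted input.

```python
import subprocess

# Hypothetical untrusted input, e.g. a filename supplied by a user.
hostile = "notes.txt; echo INJECTED"

# String interface: the shell treats ';' as a command separator,
# so the second echo runs as a separate command.
as_string = subprocess.run(
    "echo " + hostile, shell=True, capture_output=True, text=True
).stdout

# argv interface: the whole string is one argument; no shell parses it.
as_argv = subprocess.run(
    ["echo", hostile], capture_output=True, text=True
).stdout

print(repr(as_string))  # 'notes.txt\nINJECTED\n'
print(repr(as_argv))    # 'notes.txt; echo INJECTED\n'
```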
You can use the argv_to_sh.py script with ssh and su to eliminate errors caused by incorrect double quoting.
As always, the blog-code repository on GitHub contains code and demonstrations from this article.
Working on the osh to oil converter has prompted a lot of thinking and research on ML. I borrowed the model of algebraic data types via ASDL, but now I want pattern matching too!
I may write about this, but unfortunately I think I have to bootstrap Oil in Python first, without pattern matching.
There are still many topics left on the blog TODO stack.
seschwar on Lobsters makes a good point about the su interface. You can avoid double quoting with -c 'exec "$0" "$@"'.
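The trick works because of how sh -c treats arguments after the script: the first one becomes $0 and the rest become "$@". The sketch below shows it with plain /bin/sh rather than su, so how su forwards extra arguments is not verified here. run_via_shell_string and the argv-printing child are illustrative names.

```python
import subprocess
import sys

# Child program that prints the arguments it received.
show_argv = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]


def run_via_shell_string(argv):
    """Run argv through `sh -c`, using a fixed script that re-executes it."""
    # sh -c SCRIPT NAME ARGS...: NAME becomes $0 and ARGS become "$@",
    # so the shell string never needs to contain the (quoted) arguments.
    cmd = ["/bin/sh", "-c", 'exec "$0" "$@"'] + argv
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


output = run_via_shell_string(show_argv + ["filename with spaces", "it's"])
print(output)  # ['filename with spaces', "it's"]
```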