Why Sponsor Oils? | blog | oilshell.org
This is another short post based on a Hacker News comment.
Many shell users are confused by the find command, so I show a way to remember its usage. Like many things in shell, it's awkward, but powerful once you know it.
Why do long options start with two dashes? (2019) (djmnet.org via Hacker News)
349 points, 241 comments - 1 day ago
find is weird anyway. The stuff after the arguments aren't really flags, they're a tiny filter language, with significant ordering and operator precedence and all that stuff.
Yes exactly, find is like test, a.k.a. [.
test -f foo -a -f bar -o -z spam
[ -f foo -a -f bar -o -z spam ] # same thing!
can be read
isfile('foo') && isfile('bar') || emptystring('spam')
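To check that reading, here is a minimal sketch that compares the one-command form with the same logic spelled out using the shell's own && and ||. The temp directory and the helper names one_command and shell_operators are made up for illustration. Only bar exists, so the result can be true only if -a binds tighter than -o.

```shell
# Sketch: does [ A -a B -o C ] group as (A && B) || C ?
tmp=$(mktemp -d)
touch "$tmp/bar"                     # bar exists; foo deliberately does not

# $1 is the string handed to -z (hypothetical helper names).
one_command() {
  [ -f "$tmp/foo" -a -f "$tmp/bar" -o -z "$1" ]
}
shell_operators() {
  { [ -f "$tmp/foo" ] && [ -f "$tmp/bar" ]; } || [ -z "$1" ]
}

results=""
for s in spam ''; do
  one_command "$s"     && r1=true || r1=false
  shell_operators "$s" && r2=true || r2=false
  results="$results[$s:$r1,$r2]"
  echo "string='$s' one_command=$r1 shell_operators=$r2"
done
rm -rf "$tmp"
```

If the grouping were A && (B || C), the empty-string case would come out false, because foo is missing. The second form is also the one POSIX tends to recommend, since each [ then sees only a few arguments.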
Likewise
find . -name '*.py' -a -executable -a -printf '%s %P\n'
can be read
Traverse '.' and evaluate at every node F:
nameMatches(F, '*.py') && isExecutable(F) && printf('%s %P\n', F)
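A small sketch of that evaluation on a throwaway directory, assuming GNU find (-executable and -printf are GNU extensions, not POSIX). The file names are invented for the example.

```shell
# Assumes GNU find: -executable and -printf are GNU extensions.
tmp=$(mktemp -d)
printf 'print(1)\n' > "$tmp/run.py"      # 9 bytes, made executable below
printf 'print(2)\n' > "$tmp/lib.py"      # name matches, but not executable
printf 'echo hi\n'  > "$tmp/run.sh"      # executable, but name does not match
chmod +x "$tmp/run.py" "$tmp/run.sh"

# Evaluated left to right at every node; -printf runs only if both tests passed.
listing=$(find "$tmp" -name '*.py' -a -executable -a -printf '%s %P\n')
echo "$listing"
rm -rf "$tmp"
```

Only run.py passes both tests, so it is the only node for which -printf is evaluated. %s is the size in bytes and %P is the path with the starting point stripped.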
Both tools respect -a and -o for AND / OR, ! for NOT, and ( ) for precedence.
Confusingly, you must quote ( and ). This is because they're shell operators, and the language is embedded in the argv array. Example:
find . -executable -a '(' -name '*.py' -o -name '*.sh' ')' -a -print
is like
isExecutable(F) &&
(nameMatches(F, '*.py') || nameMatches(F, '*.sh')) &&
print()
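As a rough illustration of the quoting rule, the sketch below runs the grouped expression with quoted parens and with backslash-escaped parens, then hands an unquoted version to a child sh to see whether it even parses. It assumes GNU find for -executable; the file names are invented.

```shell
# Assumes GNU find for -executable.
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.sh" "$tmp/c.txt"
chmod +x "$tmp/a.py" "$tmp/b.sh" "$tmp/c.txt"

# Quoting and backslash-escaping both deliver a literal ( and ) to find's argv.
quoted=$(cd "$tmp" && find . -executable -a '(' -name '*.py' -o -name '*.sh' ')' -a -print | sort)
escaped=$(cd "$tmp" && find . -executable -a \( -name '*.py' -o -name '*.sh' \) -a -print | sort)

# Unquoted parens never reach find: the shell rejects the command line itself.
if sh -c 'find . ( -name "*.py" )' 2>/dev/null; then
  unquoted=accepted
else
  unquoted=rejected
fi

echo "$quoted"
echo "unquoted parens: $unquoted"
rm -rf "$tmp"
```

The two working forms are interchangeable; which one to use is a matter of taste.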
The way I remember this abuse is to think that the language designer was too lazy to write a lexer! In contrast, DSLs like awk and jq do not use this pattern. They have lexers, and hence lexical syntax. (Related: posts tagged #lexing).
Trivia: the expr tool for arithmetic also uses this pattern:
# * must be quoted so the shell doesn't expand it as a glob
$ expr 1 + 2 '*' 3
7
But you shouldn't use it, as POSIX shell arithmetic is now universal:
$ echo $((1 + 2*3))
7
The difference between expr and $(( )) is exactly the difference between [ and [[: external/builtin vs. language, dynamic vs. static parsing.
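One way to see the "no lexer" point is to drop the spaces. In this small sketch, expr only recognizes operators that arrive as separate argv entries, while $(( )) is tokenized by the shell, so whitespace is optional.

```shell
a=$(expr 1 + 2 '*' 3)    # five argv entries: 1, +, 2, *, 3
b=$(expr 1+2)            # one argv entry: expr sees a plain string, not an addition
c=$((1+2*3))             # parsed by the shell itself

echo "a=$a b=$b c=$c"
```

With a single argument, expr has nothing to split on, so it treats 1+2 as a string and prints it back unchanged.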
Oil aims to do away with all this silliness with Python-like, statically parsed expressions.
find Usage Tips
-a
The parser tries hard to add -a automatically. This abbreviation:
find . -name '*.py' -executable -printf '%s %P\n' # missing -a!
is the same as our longer version above:
find . -name '*.py' -a -executable -a -printf '%s %P\n'
But I prefer the latter style. I think this confusing abbreviation is another reason that people have a hard time learning the syntax and execution model of find.
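Here is a sketch of why the explicit style can be easier to reason about. The first pair shows the implicit and explicit forms agreeing. The second part is an illustration that is not from the original comment: once -o is involved, the implicit -a binds tighter than -o, so a trailing -print attaches only to the last test. It uses only POSIX primaries, and the file names are invented.

```shell
tmp=$(mktemp -d)
touch "$tmp/a.py" "$tmp/b.sh"

# Implicit and explicit -a are the same expression.
short=$(cd "$tmp" && find . -name '*.py' -type f -print | sort)
long=$(cd "$tmp" && find . -name '*.py' -a -type f -a -print | sort)

# With -o in the mix, the hidden -a sits between the last -name and -print.
implicit=$(cd "$tmp" && find . -name '*.py' -o -name '*.sh' -print | sort)
explicit=$(cd "$tmp" && find . -name '*.py' -o '(' -name '*.sh' -a -print ')' | sort)

# What was probably intended: group the -o first, then print.
grouped=$(cd "$tmp" && find . '(' -name '*.py' -o -name '*.sh' ')' -a -print | sort)

echo "implicit: $implicit"
echo "grouped:  $grouped"
rm -rf "$tmp"
```

Writing the -a out makes it visible that -print belongs to only one branch of the -o in the implicit form.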
-prune with .git
Another interesting aspect of the find language is that it has side effects like -print, -printf, -exec, and -prune.
The -prune command alters the file system traversal in the middle of it, which can make it more efficient. For example, this command avoids even statting nodes under the .git subtree (not just printing them):
find . -name .git -a -prune -o -print
Or with the fictional syntax:
find . (nameMatches(F, '.git') && prune() || print())
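A minimal sketch of that command on a fake repository layout (the directory and file names are invented), with a plain traversal alongside for contrast. It uses only POSIX primaries.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.git/objects" "$tmp/src"
touch "$tmp/.git/objects/blob" "$tmp/src/main.py"

# Left side of -o: at a node named .git, prune it (do not descend) and stop there.
# Right side of -o: every other node gets printed.
pruned=$(cd "$tmp" && find . -name .git -a -prune -o -print | LC_ALL=C sort)

# For contrast: a plain traversal visits and prints everything under .git too.
everything=$(cd "$tmp" && find . -print | LC_ALL=C sort)

echo "$pruned"
rm -rf "$tmp"
```

The .git directory itself is not printed either: -prune evaluates to true, so the -o short-circuits and -print is never reached for that node.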
I started using this to optimize Oil's own scripts, like the ones that parse one million lines of shell. Now I feel comfortable using this style interactively. It takes some getting used to.
A couple years ago, someone helped me implement a better find without this wonky syntax for Oil. But it isn't done and needs some love. If anyone wants to help, feel free to join Zulip :-)
I do think that find is more like a language than a command line tool. It's pretty powerful; e.g. I just used it to sort through 20 years of haphazard personal backups.
The lr tool pretty much has the syntax I explained here!
lr is a new tool for generating file listings, which includes the best features of ls(1), find(1), stat(1) and du(1).