Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.
Oil is UTF-8 centric, unlike bash and other shells.
That is, its Unicode support is like that of Go, Rust, Julia, and Swift, as opposed to JavaScript and Python (despite its Python heritage). The former languages use UTF-8, and the latter have the notion of "multibyte characters".
Shell programs should be encoded in UTF-8 (or its ASCII subset). Unicode characters can be encoded directly in the source:
echo 'μ'
or denoted in ASCII with C-escaped strings:
echo $'[\u03bc]'
(Such strings are preferred over echo -e because they're statically parsed.)
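As a rough check of the two spellings above, the sketch below compares a literal μ with the same character built from explicit bytes. It assumes the script file is saved as UTF-8. Octal escapes passed to printf stand in for the \u03bc form here, because some shells expand \u according to the current locale. The variable names are illustrative only.

```shell
# Assumes this script file is saved as UTF-8.
mu_literal='μ'

# Count the bytes of the literal; wc -c counts bytes regardless of locale.
mu_bytes=$(printf '%s' "$mu_literal" | wc -c | tr -d ' ')
echo "bytes in literal: $mu_bytes"

# Build the UTF-8 encoding of U+03BC (bytes 0xCE 0xBC) from octal escapes,
# which POSIX printf supports.
mu_from_octal=$(printf '\316\274')

# [ ... = ... ] compares the strings byte for byte.
if [ "$mu_literal" = "$mu_from_octal" ]; then
  echo 'literal and escaped forms are the same bytes'
fi
```

If the file were saved in another encoding, the literal would hold different bytes and the comparison would fail, which is one reason to keep shell programs in UTF-8.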
Strings in OSH are arbitrary sequences of bytes, which may be valid UTF-8. Details:
- The NUL ('\0') byte is a special case: strings handed to C and Unix interfaces end at the first NUL byte. This is a consequence of how Unix and C work.
- ${#s} and slicing ${s:1:3} require the string to be valid UTF-8. Decoding errors are fatal if shopt -s strict_word_eval is on.

Operations that respect Unicode:

- ${#s} -- length in code points
  - Note: len(s) counts bytes.
- ${s:1:2} -- offsets in code points
- ${x#?} -- a glob for a single character

Where bash respects it:
- Globs: ? for a single character, character classes like [[:alpha:]], etc.
  - echo my?glob
  - case $x in ?) echo 'one char' ;; esac
  - [[ $x == ? ]]
  - ${s#?} (remove one character)
  - ${s/?/x} (note: this uses our glob to ERE translator for position)
- Regexes, as in [[ $x =~ $pat ]], which also have character classes
- printf '%d' \'c where c is an arbitrary character. This is an obscure syntax for ord(), i.e. getting an integer from an encoded character.

Locale-aware operations:
- printf also has time formatting.

Other:
- wcswidth() doesn't just count code points. It calculates the display width of characters, which is different in general.
- mystr ~ / [ \xff ] / depends on ERE semantics.
- The iconv program converts text from one encoding to another.

Unlike bash and CPython, Oil doesn't call setlocale(). (Although GNU readline may call it.)
It's expected that your locale will respect UTF-8. This is true on most distros. If not, then some string operations will support UTF-8 and some won't.
For example:
- ${#s} is implemented in Oil code, not libc, so it will always respect UTF-8.
- [[ $s =~ $pat ]] is implemented with libc, so it is affected by the locale settings. Same with Oil's (x ~ pat).

TODO: Oil should support LANG=C for some operations, but not LANG=X for other X.
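Two of the points above can be sketched in portable shell: the difference between byte length and code point length, and a check of whether the current locale is UTF-8. This is only an illustration, not how Oil implements ${#s}. The code point count works by dropping UTF-8 continuation bytes, so it is only meaningful for valid UTF-8 input. The function name charmap_is_utf8, the variable names, and the accepted charmap spellings are assumptions made for this example.

```shell
s='aμb'   # assumes this file is saved as UTF-8

# Byte length: wc -c counts bytes regardless of locale.
byte_len=$(printf '%s' "$s" | wc -c | tr -d ' ')

# Code point length: delete UTF-8 continuation bytes (0x80-0xBF, octal
# 200-277) and count what is left. LC_ALL=C makes tr work on raw bytes.
codepoint_len=$(printf '%s' "$s" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' ')

echo "bytes=$byte_len code_points=$codepoint_len"   # bytes=4 code_points=3

# Hypothetical helper: succeed if the given charmap name denotes UTF-8.
charmap_is_utf8() {
  case $1 in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# `locale charmap` is a POSIX utility; fall back to "unknown" if it is missing.
current_charmap=$(locale charmap 2>/dev/null || echo unknown)

if charmap_is_utf8 "$current_charmap"; then
  echo 'locale encoding is UTF-8'
else
  echo "locale encoding is $current_charmap; libc-based operations may not respect UTF-8"
fi
```

The byte-level count gives the same answer under any locale, which mirrors the reason given above for ${#s} always respecting UTF-8. Operations that go through libc follow whatever the second check reports.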