Oil documentation, version 0.8.pre8 (oilshell.org)
Warning: Work in progress! Leave feedback on Zulip or GitHub if you'd like this doc to be updated.
Oil's Unicode support is unlike that of other shells because it's UTF-8-centric.
In other words, it resembles newer languages such as Go, Rust, Julia, and Swift, as opposed to JavaScript and Python (despite Oil's Python heritage). The latter languages use the notion of "multibyte characters".
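The unit you count in matters. This Python snippet measures one string three ways: UTF-8 bytes, UTF-16 code units (the unit that JavaScript's String.length counts), and code points. The sample string is arbitrary and is only an illustration.

```python
# One string measured in three different units.
s = "h\u00e9llo\U0001F600"  # "héllo" followed by U+1F600 (an emoji outside the BMP)

utf8_bytes = len(s.encode("utf-8"))            # bytes in the UTF-8 encoding
utf16_units = len(s.encode("utf-16-le")) // 2  # 16-bit units, as JavaScript's String.length counts
code_points = len(s)                           # Python 3 str length counts code points

print(utf8_bytes, utf16_units, code_points)  # 10 7 6
```

A UTF-8-centric design keeps the 10 bytes as the representation and reports 6 when asked for a length in code points.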
In particular, Oil doesn't have global variables like LANG for libc or a notion of "default encoding". In my experience, these types of globals cause correctness problems.
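Without a LANG-style global, string operations have to interpret UTF-8 themselves. The following is a minimal Python sketch. It assumes strings are stored as valid UTF-8 bytes. The helpers utf8_length, utf8_slice, and utf8_ord are hypothetical names, not Oil's implementation. They mirror the shell operations ${#s}, ${s:offset:length}, and printf '%d' \'c. They find code-point boundaries by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx) and never consult a locale.

```python
def is_continuation(b: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (b & 0xC0) == 0x80


def utf8_length(data: bytes) -> int:
    # Like ${#s}: count the bytes that start a code point.
    # Assumes valid UTF-8; no locale or default encoding is consulted.
    return sum(1 for b in data if not is_continuation(b))


def utf8_slice(data: bytes, offset: int, length: int) -> bytes:
    # Like ${s:offset:length}, with offsets in code points.
    # This sketch only handles non-negative offset and length.
    starts = [i for i, b in enumerate(data) if not is_continuation(b)]
    starts.append(len(data))  # sentinel: end of the string
    n = len(starts) - 1       # number of code points
    begin = starts[min(offset, n)]
    end = starts[min(offset + length, n)]
    return data[begin:end]


def utf8_ord(data: bytes) -> int:
    # Like printf '%d' \'c in a UTF-8-aware shell: the code point of the
    # first character, not the value of its first byte. Assumes non-empty input.
    return ord(utf8_slice(data, 0, 1).decode("utf-8"))


s = "h\u00e9llo".encode("utf-8")  # 6 bytes, 5 code points
print(utf8_length(s))              # 5
print(utf8_slice(s, 1, 2))         # b'\xc3\xa9l' (the UTF-8 bytes of "él")
print(utf8_ord("\u00e9".encode("utf-8")))  # 233
```

Counting non-continuation bytes avoids converting the whole string to a wide-character form, so the representation can stay UTF-8 throughout.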
Operations that Oil handles:
- ${#s} -- length in code points
- ${s:1:2} -- offsets in code points
- ${x#?} and family (not yet implemented)
Where bash respects it:
This is a list of operations that SHOULD be aware of Unicode characters. OSH doesn't implement all of them yet, e.g. the globbing stuff.
- ${#s}
- ${s:0:1}
- Glob patterns: ? for a single character, character classes like [[:alpha:]], etc. These appear in:
  - case $x in ?) echo 'one char' ;; esac
  - [[ $x == ? ]]
  - ${s#?} (remove the first character)
  - ${s/?/x} (replace the first character with x; note: this uses our glob-to-ERE translator for position)
- printf '%d' \'c, where c is an arbitrary character. This is an obscure syntax for ord(), i.e. getting an integer from an encoded character.
List of operations that depend on the locale (not implemented):
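- [[ $a < $b ]] -- should this use the current locale? TODO: compare with the sort command.
- ${s^} and ${s,} (uppercase or lowercase the first character)
- printf, which also has locale-dependent time formatting via %(fmt)T.
Other: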
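- wcswidth(), which doesn't just count code points: it calculates the display width of characters, which in general differs from the number of code points (for example, East Asian wide characters occupy two terminal columns and combining marks occupy none).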
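The following is a rough Python approximation of the difference, using the standard unicodedata module. The display_width function is hypothetical and simplified. It gives combining marks zero columns and East Asian Wide or Fullwidth characters two columns. Unlike libc's wcswidth(), it does not return -1 for non-printable characters.

```python
import unicodedata


def display_width(s: str) -> int:
    # Hypothetical, simplified stand-in for wcswidth(): sums terminal columns.
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks (e.g. U+0301) attach to the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters occupy two columns
        else:
            width += 1
    return width


for text in ["abc", "\u65e5\u672c", "e\u0301"]:
    print(len(text), display_width(text))
# 3 3   (ASCII: code points == columns)
# 2 4   (two CJK characters, two columns each)
# 2 1   ("e" plus a combining acute accent renders as one column)
```

A real implementation would follow the libc or Unicode width tables more closely. This sketch only shows that column count and code-point count diverge.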