Warning: Work in progress! Leave feedback on Zulip or Github if you'd like this doc to be updated.
This chapter in the Oils Reference describes JSON, and its J8 Notation superset.
See the J8 Notation doc for more background. This doc is a quick reference, not the official spec.
J8 strings are an upgrade of JSON strings that solve the JSON-Unix Mismatch.
That is, Unix deals with byte strings, but JSON can't represent byte strings.
"hi"
All JSON strings are valid J8 strings!
This is important. Encoders often emit JSON-style "" strings rather than u'' or b'' strings.
Example:
"hi μ \n"
\" \n \u1234
As a reminder, the backslash escapes valid in JSON strings are:
\" \\
\b \f \n \r \t
\u1234
Additional J8 escapes are valid in u'' and b'' strings, described below.
\ud83e\udd26
JSON's \u1234 escapes can't represent code points at or above U+10000 (2^16), so JSON also has a "surrogate pair hack".
That is, there are special code points in the "surrogate range" that can be paired to represent larger numbers.
See the Surrogate Pair Blog Post for an example:
"\ud83e\udd26"
Because JSON strings are valid J8 strings, surrogate pairs are also part of J8 notation. Decoders must accept them, but encoders should avoid them.
You can emit u'\u{1f926}' or b'\u{1f926}' instead of "\ud83e\udd26".
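The pairing arithmetic can be sketched as follows. This is a minimal illustration of the standard UTF-16 surrogate formula, not code taken from Oils; the function names `combine_surrogates` and `to_surrogates` are hypothetical.

```python
from typing import Tuple


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each half carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point at or above U+10000 into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# The pair from the example above maps to U+1F926 and back.
assert combine_surrogates(0xD83E, 0xDD26) == 0x1F926
assert to_surrogates(0x1F926) == (0xD83E, 0xDD26)
```

A decoder would apply the first function when it sees two adjacent \uXXXX escapes in a "" string. An encoder that follows the advice above never needs the second one.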
u'hi'
A type of J8 string.
u'hi μ \n'
It's never necessary to emit, but it can be used to express that a string is valid Unicode. JSON strings can represent strings that aren't Unicode because they may contain surrogate halves.
In contrast, u'' strings can only have escapes like \u{1f926}, with no surrogate pairs or halves.
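To see why a JSON string may not be valid Unicode, here is a small sketch using Python's standard `json` module, which happens to accept lone surrogate halves when decoding. The helper `is_valid_unicode` is a hypothetical name for illustration, not part of Oils.

```python
import json


def is_valid_unicode(text: str) -> bool:
    """Return True if text can be encoded as UTF-8, i.e. has no lone surrogates."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# A surrogate half on its own decodes, but is not a Unicode string.
half = json.loads(r'"\ud83e"')
# A complete pair decodes to a single valid code point.
pair = json.loads(r'"\ud83e\udd26"')

assert not is_valid_unicode(half)
assert is_valid_unicode(pair)
```

A u'' string rules out the first case by construction, because \u{...} escapes in it never denote surrogates.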
Escaping:
u'' strings may not contain \u1234 escapes. They must be written as \u{1234} or \u{1f926}.
They may not contain \yff escapes, because those would represent a string that's not UTF-8 or Unicode.
Surrogate pairs are never needed in u'' or b'' strings. Use the longer form \u{1f926}.
Literal UTF-8 can always be emitted, so \u{1f926} escapes aren't strictly necessary. Decoders must accept these escapes.
A single quote is escaped as \'. Decoders also accept \", but encoders don't emit it.
b'hi'
Another J8 string. These b'' strings are identical to u'' strings, but they can also contain \yff escapes.
Examples:
b'hi μ \n'
b'this isn\'t a valid unicode string \yff\yfe \u{3bc}'
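As a rough sketch of how an encoder might choose between string styles, the hypothetical function below emits a JSON-style "" string when the input bytes are valid UTF-8 and falls back to a b'' string otherwise. It is not the Oils implementation. It assumes that the JSON single-character escapes (\\, \n, and so on) are also valid inside b'' strings, and that other control characters can be written as \u{...}.

```python
import json

# Assumption: these single-character escapes are accepted inside b'' strings.
_SIMPLE_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def _encode_b_string(data: bytes) -> str:
    # surrogateescape maps each undecodable byte 0xNN to the code point U+DCNN,
    # so invalid bytes can be told apart from valid UTF-8 text.
    text = data.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            parts.append("\\y%02x" % (code - 0xDC00))  # raw byte escape
        elif ch in _SIMPLE_ESCAPES:
            parts.append(_SIMPLE_ESCAPES[ch])
        elif code < 0x20:
            parts.append("\\u{%x}" % code)  # remaining control characters
        else:
            parts.append(ch)  # literal UTF-8
    return "b'" + "".join(parts) + "'"


def encode_j8_string(data: bytes) -> str:
    """Hypothetical encoder: prefer a JSON-style string, fall back to b''."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return _encode_b_string(data)
    return json.dumps(text, ensure_ascii=False)


print(encode_j8_string("hi μ \n".encode("utf-8")))
print(encode_j8_string(b"isn't \xff\xfe"))
```

The fallback is only taken for byte strings that JSON cannot represent, which matches the observation that encoders often emit "" strings.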
\u{1f926} \yff
To summarize, the valid J8 escapes are:
\'
\yff # only valid in b'' strings
\u{3bc} \u{1f926} etc.
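The escape rules can be illustrated with a small decoder for the text between the quotes of a u'' or b'' string. This is a sketch under stated assumptions, not the Oils decoder: `decode_j8_body` is a hypothetical name, the body is assumed to be already delimited, the JSON single-character escapes are assumed to carry over, and \u{...} escapes in the surrogate range are rejected.

```python
import re

# Single-character escapes: the JSON ones (assumed to carry over) plus \'.
_SIMPLE = {
    "'": b"'", '"': b'"', "\\": b"\\",
    "b": b"\b", "f": b"\f", "n": b"\n", "r": b"\r", "t": b"\t",
}

# One escape: \yHH, \u{H...}, or a backslash followed by any one character.
_ESCAPE = re.compile(
    r"\\(?:y([0-9a-fA-F]{2})|u\{([0-9a-fA-F]{1,6})\}|(.))", re.DOTALL
)


def decode_j8_body(body: str, allow_bytes: bool) -> bytes:
    """Decode the inside of a u'' (allow_bytes=False) or b'' (True) string."""
    out = bytearray()
    pos = 0
    for match in _ESCAPE.finditer(body):
        out += body[pos:match.start()].encode("utf-8")  # literal text
        pos = match.end()
        byte_hex, code_hex, simple = match.groups()
        if byte_hex is not None:
            if not allow_bytes:
                raise ValueError("\\yHH escapes are only valid in b'' strings")
            out.append(int(byte_hex, 16))
        elif code_hex is not None:
            code = int(code_hex, 16)
            if code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
                raise ValueError("not a valid code point: %x" % code)
            out += chr(code).encode("utf-8")
        elif simple in _SIMPLE:
            out += _SIMPLE[simple]
        else:
            # Includes JSON-style \u1234, which is not allowed here.
            raise ValueError("invalid escape: \\" + simple)
    tail = body[pos:]
    if "\\" in tail:
        raise ValueError("dangling backslash")
    out += tail.encode("utf-8")
    return bytes(out)
```

For example, `decode_j8_body(r"\yff \u{3bc}", allow_bytes=True)` returns the byte 0xff, a space, and the two UTF-8 bytes of μ, while the same call with `allow_bytes=False` raises `ValueError`.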
'hi'
Single-quoted strings without a u or b prefix are implicitly u''.
u'hi μ \n'
'hi μ \n' # same as above, no \yff escapes accepted
They should be avoided in contexts where "" strings may also appear, because it's easy to confuse single quotes and double quotes.
JSON8 is JSON with 4 rules:
Decoding detail, specific to Oils:
If there's a decimal point or an exponent suffix such as e-10, then it's decoded into a YSH Float. Otherwise it's a YSH Int.
42 # decoded to Int
42.0 # decoded to Float
42e1 # decoded to Float
42.0e1 # decoded to Float
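The Int versus Float rule can be sketched in a few lines. Python's `int` and `float` stand in for the YSH types here, and `decode_number` is a hypothetical helper that assumes the token has already been validated as a JSON number.

```python
from typing import Union


def decode_number(token: str) -> Union[int, float]:
    """Pick Float if the token has a decimal point or exponent, else Int."""
    if any(ch in token for ch in ".eE"):
        return float(token)
    return int(token)


for token in ["42", "42.0", "42e1", "42.0e1"]:
    value = decode_number(token)
    print(token, type(value).__name__, value)
```

Note that 42.0 stays a Float even though its value is integral. The decision is based on the spelling of the token, not on its value.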
JSON8 strings are exactly J8 strings:
"hi 🤦 \u03bc" u'hi 🤦 \u{3bc}' b'hi 🤦 \u{3bc} \yff'
Like JSON lists, but can have a trailing comma. Examples:
[42, 43]
[42, 43,] # same as above
Like JSON "objects", but keys can be unquoted and a trailing comma is allowed.
Examples:
{"json8": "message"}
{json8: "message"} # same as above
{json8: "message",} # same as above
End-of-line comments in the same style as JavaScript and C++:
{"json8": "message"} // comment
These are the J8 Primitives (Bool, Int, Float, Str), separated by tabs.
!tsv8 name age
!type Str Int
!other x y
Alice 42
Bob 25
The primitives:
Note: Can null be in all cells? Maybe except Bool? It can stand in for NA?