
Oils 0.20.0 - Eggex, JSON, and Android

2024-02-21

This is the latest version of Oils, a Unix shell. It's our upgrade path from bash to a better language and runtime:

Oils version 0.20.0 - Source tarballs and documentation

We're moving toward the fast C++ implementation, so there are two tarballs:

If you're new to the project, see the Oils 2023 FAQ and posts tagged #FAQ.

Table of Contents
Intro
Contributions
Eggex Improvements
JSON / J8 Notation
API for Encoding and Decoding
YSH String Literals
Strings Printed Everywhere
Error Handling Changes
Zulip: Why am I working on JSON?
Dev Build Automated on Ubuntu, Debian, Alpine
Closed Issues
What's Next?
Invite me to speak?
Slogans for Oils
Appendix: Metrics for the 0.20.0 Release
Spec Tests
Benchmarks
Code Size

Intro

This is a big release!

Contributions

Before describing the new features in detail, let's review contributions.

Thanks for responding to last month's call for contributions in Oils 0.19.0! We can still use more help. I will also mention improvements to the dev process later in this post.

Adam Bannister:

Matthew Davidson:

Samuel Hierholzer:

Aidan Olsen:

Doc fixes:

Testing the shell. This work is as important as, or even more important than, code contributions:


You can also view the full changelog.

Eggex Improvements

Samuel tried doing Advent of Code in YSH, which revealed that the Eggex API wasn't done. So I took a couple weeks to improve it, doing "doc-driven development" with this new doc:

YSH Regex API - Convenient and Powerful

Please try it, and let us know what you think! Our goal is for YSH to be as convenient as Perl, and as powerful as Python.

This possible tweak could make it more convenient, at the cost of being more implicit:

var date_eggex = / <capture d+ as year> '/' <capture d+ as month>/
if (date_eggex ~ '2024/02') {
  echo $[_group('month')]  # => 02
                           # this current way is a bit long
  echo $month              # so it could become a var?

}
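For comparison, here's a rough Python analogue of the named captures above, using the standard re module. Here, match.group('month') plays the role of _group('month'). It's only an analogy: the pattern is translated by hand (d+ written as [0-9]+), and parse_date is a hypothetical helper, not part of any Oils API.

```python
import re

# Hand translation of the eggex / <capture d+ as year> '/' <capture d+ as month> /
DATE_RE = re.compile(r'(?P<year>[0-9]+)/(?P<month>[0-9]+)')


def parse_date(s):
    """Return the named captures as a dict, or None if s doesn't match."""
    m = DATE_RE.search(s)  # search anywhere in s
    if m is None:
        return None
    # m.group('month') corresponds to _group('month') in YSH
    return {'year': m.group('year'), 'month': m.group('month')}


print(parse_date('2024/02'))   # {'year': '2024', 'month': '02'}
print(parse_date('Feb 2024'))  # None
```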

Aidan followed up by implementing Str => replace(), which turned out very nicely with YSH reflection. Python and JavaScript each invent a separate mini-language for replacement strings, like \g<month> or $1. Instead, we reuse the shell's existing string interpolation:

var s = '2024/02'
                                 # looks like string literal
var t = s => replace(date_eggex, ^"$month-$year")
echo $t  # => 02-2024

YSH is simpler because it avoids that extra syntax and parsing. An expression like ^"hi $x" is the unevaluated form of "hi $x", which we call a value.Expr.
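Here's a Python sketch of that idea, assuming string.Template as a stand-in for ^"$month-$year". The template stays unevaluated, and each match's named groups are bound as variables only when it is expanded. replace_with_template is a hypothetical helper, not how Oils implements Str => replace(); Python's own \g<...> mini-language is shown last for contrast.

```python
import re
from string import Template

DATE_RE = re.compile(r'(?P<year>[0-9]+)/(?P<month>[0-9]+)')


def replace_with_template(pattern, s, template):
    """Replace each match in s by expanding template with that match's named groups."""
    def expand(m):
        # The template is only interpolated here, once per match
        return template.substitute(m.groupdict())
    return pattern.sub(expand, s)


# Ordinary interpolation syntax, evaluated lazily per match
print(replace_with_template(DATE_RE, '2024/02', Template('$month-$year')))  # 02-2024

# Python's separate replacement mini-language, for comparison
print(DATE_RE.sub(r'\g<month>-\g<year>', '2024/02'))  # 02-2024
```

The design point is that the replacement is just a delayed string expression, so there's no second escaping scheme to learn.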


In addition to the API doc above, there are new help topics in the Oils reference, like the one for Str => replace.

Other Eggex changes:

JSON / J8 Notation

I rewrote and replaced the JSON library, which has 2 big benefits:

  1. The new library can be translated to C++. This means almost all our spec tests now pass in C++! (Numbers in the appendix.)
  2. The new library is being upgraded to J8 Notation, which we have a new doc on:

J8 Notation - Fixing the JSON-Unix Mismatch

I want to explain the design and motivation for J8 in many different ways. But right now, the important message is that it's 100% backward compatible with JSON, and looks familiar:

# J8-style string, which can co-exist with JSON strings
u'hi 🙂 \u{1F642}'
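The \u{1F642} escape is more than cosmetic. JSON's \uXXXX escape holds exactly four hex digits, so a character outside the Basic Multilingual Plane, like 🙂 (U+1F642), must be written as a UTF-16 surrogate pair. This Python sketch computes that pair by hand and checks it against the standard json module; json_escape and j8_escape are hypothetical names for illustration.

```python
import json


def json_escape(cp):
    """Escape one code point the JSON way: \\uXXXX, with a surrogate pair above U+FFFF."""
    if cp <= 0xFFFF:
        return '\\u%04x' % cp
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)   # top 10 bits of v
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits of v
    return '\\u%04x\\u%04x' % (high, low)


def j8_escape(cp):
    """Escape one code point the J8 way: a single \\u{...} holding the whole value."""
    return '\\u{%X}' % cp


smiley = 0x1F642
print(json_escape(smiley))      # \ud83d\ude42
print(j8_escape(smiley))        # \u{1F642}
print(json.dumps(chr(smiley)))  # "\ud83d\ude42"
```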

API for Encoding and Decoding

You can use JSON and J8 notation with the existing builtin commands:

json read < myfile  # sets _reply var
json write (obj)    # if a string has binary, this is lossy

json8 read < myfile
json8 write (obj)   # able to losslessly encode binary

Or you can use these new functions:

= toJson({x: 42})
= fromJson('{}')

= toJ8([5, 6])
= fromJ8('[5, 6]')

(It now occurs to me that these functions should be called toJson8() and fromJson8(). Sorry, there are still breakages to come.)

YSH String Literals

You no longer need bash's C-escaped strings, which look like $'foo\n', in YSH code. The $ sigil is confusing because it's unrelated to string substitution, and the syntax carries other legacy baggage.

Instead, we encourage J8-style strings in source code, which are identical to the format that json8 read accepts:

var x = u'foo\n'       # valid unicode
var y = b'foo\n \yff'  # can also contain binary \yff escapes

So this part of J8 Notation can be used in both code and data! (The Shape of Data is a good post on this topic.)
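Because code and data share this syntax, one decoder can serve both. Below is a minimal Python sketch, with the hypothetical name decode_j8_string, that turns a u'' or b'' literal into bytes. It handles only a few escapes (\n, \t, \r, \\, \', \yHH, and \u{...}) and is not the full grammar from the J8 Notation doc. It does enforce that \y escapes appear only in b'' strings, since u'' strings must be valid Unicode.

```python
from string import hexdigits

# Single-character escapes this sketch accepts after a backslash
SIMPLE_ESCAPES = {'n': '\n', 't': '\t', 'r': '\r', '\\': '\\', "'": "'"}


def decode_j8_string(literal: str) -> bytes:
    """Decode a u'...' or b'...' literal into raw bytes (simplified sketch)."""
    if len(literal) < 3 or literal[0] not in 'ub' or literal[1] != "'" or literal[-1] != "'":
        raise ValueError("expected u'...' or b'...'")
    prefix, body = literal[0], literal[2:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            raise ValueError('unescaped quote inside string')
        if ch != '\\':
            out += ch.encode('utf-8')  # literal characters are stored as UTF-8
            i += 1
            continue
        esc = body[i + 1:i + 2]
        if esc in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[esc].encode('utf-8')
            i += 2
        elif esc == 'y':
            # \yHH: exactly two hex digits giving one raw byte, b'' strings only
            digits = body[i + 2:i + 4]
            if prefix != 'b':
                raise ValueError("\\y escapes are only allowed in b'' strings")
            if len(digits) != 2 or any(c not in hexdigits for c in digits):
                raise ValueError('bad \\y escape')
            out.append(int(digits, 16))
            i += 4
        elif esc == 'u' and body[i + 2:i + 3] == '{':
            # \u{...}: a whole code point, stored as UTF-8
            end = body.find('}', i + 3)
            digits = body[i + 3:end]
            if end == -1 or not digits or any(c not in hexdigits for c in digits):
                raise ValueError('bad \\u{...} escape')
            out += chr(int(digits, 16)).encode('utf-8')
            i = end + 1
        else:
            raise ValueError('unsupported escape: \\' + esc)
    return bytes(out)


print(decode_j8_string("u'foo\\n'"))        # b'foo\n'
print(decode_j8_string("b'foo\\n \\yff'"))  # b'foo\n \xff'
print(decode_j8_string("u'\\u{1F642}'"))    # b'\xf0\x9f\x99\x82'
```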

Strings Printed Everywhere

Misc changes related to string notation:

The pp formats are in contrast to = myobj, which will be an even prettier format, similar to how the browser or NodeJS prints values.

These changes are breaking:

Future work:

Error Handling Changes

JSON serialization involves error handling, so I enhanced YSH error handling.

Zulip: Why am I working on JSON?

Let's take a moment to reflect on how we're working. In September's release of Oils 0.18.0, I posted a job ad, seeking help with JSON serialization.

I ended up working on it mostly myself. I feel bad about that, since one of my goals is to spread knowledge of the codebase. I wrote a thread on Zulip that reflects on why:

To summarize, a big issue is that the design changed while I was implementing it. There's a big puzzle of constraints to solve, often having to do with compatibility and our Language Design Principles.

For example, the strings used to look like j"foo", but that couldn't be "harmonized" with JSON well enough. I switched from double quotes to single quotes, and added the b'' and u'' prefixes. (By the way, these prefixes were inspired by feedback from Zack Weinberg last year.)

Issues like this take tinkering and testing to figure out. Sometimes it's easier to play with Python code than to write a doc up front.

This interview with Grant Sanderson explains a similar point — sometimes it's easier to play with code than to put a design into words, especially in the early stages.

In other words, we use Python precisely because it's high-level enough to be a spec. And we have a separate C++ translation, which keeps us honest about the spec.

Other reasons I worked on it myself:

To conclude, we now have a great foundation for data notation in Oils, but I still need to work on getting more people involved in the project.

Dev Build Automated on Ubuntu, Debian, Alpine

We made some progress on this front. To work on Oils, you often need to install a bunch of tools like MyPy and its dependencies. This is now automated in our Soil CI:

Oils Dev Setup in Soil CI

I'll elaborate on this in another post. I still want to get rid of the requirement to install packages as root, and maybe create an online demo with services like GitPod.

I also had some package build problems on Fedora (with a sourcehut image). So if you use Fedora, and are interested in working on Oils, please reach out.

Closed Issues

A subset of what's in this release:

#1795 `command` built-in does not support `-p` option
#1782 source --builtin 'stdlib/math.ysh' failed: No such builtin file
#1776 second operator after and/or should be lazy
#1775 str slice out of range error in native version
#1773 Can't serialize type List_ to JSON
#1767 echo builtin should disallow typed args
#1426 Implement J8 Strings and shopt, for `b''` and `u''`
#1146 Round trip of Oil data structures to text and back
#838 JSON in oil-native

What's Next?

I already started making plans for the next release, Oils 0.21.0. I think we can finish the C++ translation, which has been a slightly embarrassing pain point. The result is good, but I feel like it's taken too long.

I want to batch up more breaking changes to YSH in the next release. We have a plan on GitHub:

I should turn that into a blog post!

Invite me to speak?

I've been invited to speak about Oils to Houston Functional Programmers, online this May. I think it could be a good group for attracting contributors.

Most people wouldn't call our code functional, but we do use exhaustive reasoning with sets, via re2c and Zephyr ASDL. And there are functional idioms in both Bourne shell and YSH that I'd like to bring up.

Do you know of similar groups, with members who may have time to work on open source languages and systems? Let me know in the comments.


I've also been talking about #blog-ideas > Oils vs. Crafting Interpreters for several months. An interesting parallel is that Lox is implemented twice in the book: in Java and then in C.

Oils is also implemented twice: in typed Python and in C++!

I don't really know what these talks could look like, but there's a ton of material. The challenge would really be to cut it down to a reasonable amount of time. I could speak for hours about this project!

Slogans for Oils

I continually want to remind readers what Oils is. Here are two recent slogans:

This sounds like it must be big and complex, but the Oils source code is paradoxically small. There are around 56K lines of hand-written code, which expands to 112K lines of mostly-generated C++.

I want to turn these slogans into blog posts with demos, and elaborate on how the "middle-out" style leads to short, spec-driven code. For now, see A Tour of YSH!

Appendix: Metrics for the 0.20.0 Release

These metrics help me keep track of the project. Let's compare this release with the previous one, version 0.19.0.

Spec Tests

We made reasonable progress on OSH, though we have a backlog of failing tests to fix:

The fix to disallow typed args to echo exposed a couple C++ translation errors (already fixed for the next release):

There are 74 new tests passing in YSH, due to the overhaul of both Eggex and JSON:

JSON / J8 Notation is the last major part of the C++ translation, making 79 more tests pass. This is the highlight of this release!

Benchmarks

Not much changed in terms of performance during this release. The parser is the same speed:

And uses the same amount of memory:

The synthetic Fibonacci benchmark is stable:

I/O bound workloads remain the same speed:

Code Size

Oils is still a small program in terms of source code:

And generated C++:

The compiled binary got a bit bigger: