Why Should a Unix Shell Have Objects?

2024-12-17

In November's release notes, I mentioned that YSH now has objects.

But I said that we're not using PowerShell-like objects, which live in a .NET virtual machine. Instead, YSH uses plain data over pipes, like JSON or TSV.

So why does YSH have objects? This post explains seven use cases:

Methods
Flag Parsing
Modules
Polymorphism
ENV

Pure Functions
Pure Config

If the "shell" framing doesn't make sense to you, another way to phrase the question is:

Why should a multi-process glue language have objects?

If you're new to the project, see posts tagged #FAQ, and What Oils Looks Like in 2024.

Two Things That Aren't Objects

Let's first see what we can do without objects. You can imagine a better shell that has structured data, like Awk or Scheme, but doesn't have objects.

For context, that's essentially what I wrote about in 2020. Unlike POSIX shell and bash, YSH has reliable error handling, safe serialization, and arrays, and it's statically parsed.

`argv` Arrays

... are not objects. Here's a simple example:

ysh$ var packages = ['gcc', 'sqlite3']  # create args

ysh$ sudo apt-get install @packages     # splice into a command

So you can represent command arguments with simple data structures, not objects.

(By the way, the @splice operation is hard in shell!)

JSON-like Data Over Pipes

... are not objects. Here's an example.

First, any shell can create a JSON text file:

bash$ echo '{"counts": [42, 43]}' > my.json

But OSH and YSH can also read it:

ysh$ json read (&x) < my.json  # &x is a "Place" to put the data structure

ysh$ = x                # print resulting type and contents
(Dict)  {counts: [42, 43]}

And calculate on it:

ysh$ var sum = x.counts[0] + x.counts[1]

ysh$ = sum
(Int)  85

So data can live in memory, or it can be serialized to pipes and files. But there are still no objects involved.

That brings us to the question ...

What Is a YSH Object?

Technically, a value of type Obj is a linked list of Dicts.

In Oils 0.24.0, you create one with Obj.new(), which is similar to JavaScript's Object() constructor:

ysh$ var obj = Obj.new({x: 42}, null)  # null signifies the end
ysh$ = obj
(Obj)   (x: 42)                     # parens () rather than braces {}

You can put this obj instance "at the end" of another object, creating another link in the chain:

ysh$ var two = Obj.new({name: 'foo'}, obj)
ysh$ = two
(Obj)   (name: 'foo') --> (x: 42)   # looks like a linked list
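To make "a linked list of Dicts" concrete, here is a small Python model. It's a sketch, not how Oils implements `Obj`. The class name `Obj`, its `lookup()` method, and the `first()` and `rest()` helpers are stand-ins chosen to mirror the names in this post. Attribute lookup walks the chain from front to back, and `repr()` mimics the `-->` rendering in the REPL session above.

```python
from typing import Any, Optional


class Obj:
    """Toy model of a YSH Obj: a dict of properties plus a link to the rest of the chain."""

    def __init__(self, props: dict, proto: Optional["Obj"]) -> None:
        self.props = props  # the dict at this link
        self.proto = proto  # the next link, or None at the end of the chain

    def lookup(self, name: str) -> Any:
        """Walk the chain front to back and return the first match."""
        node: Optional[Obj] = self
        while node is not None:
            if name in node.props:
                return node.props[name]
            node = node.proto
        raise AttributeError(name)

    def __repr__(self) -> str:
        """Render each link as (k: v) and join the links with -->."""
        links = []
        node: Optional[Obj] = self
        while node is not None:
            fields = ", ".join(f"{k}: {v!r}" for k, v in node.props.items())
            links.append(f"({fields})")
            node = node.proto
        return " --> ".join(links)


def first(obj: Obj) -> dict:
    """Analogue of first(obj): the dict at the head of the chain."""
    return obj.props


def rest(obj: Obj) -> Optional[Obj]:
    """Analogue of rest(obj): the remaining chain, or None at the end."""
    return obj.proto


obj = Obj({"x": 42}, None)       # None marks the end, like null
two = Obj({"name": "foo"}, obj)  # obj becomes the next link

print(repr(two))         # (name: 'foo') --> (x: 42)
print(two.lookup("x"))   # 42, found one link down the chain
print(first(two))        # {'name': 'foo'}
print(rest(two) is obj)  # True
```

In this model, a key in an earlier dict hides the same key further down the chain, because `lookup()` stops at the first match.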

Functions that are on the prototype chain become methods. Their first argument is self, and you invoke them with the . and -> operators:

echo $[myObj.method()]        # regular methods use the . operator

call myObj->mutatingMethod()  # mutating methods use the -> operator

Quick comparison:

Unlike JavaScript, there's no special obj.prototype name. Instead, you use the functions first(obj) and rest(obj) to navigate the chain.
Unlike Python, there's no notion of class, and no inheritance.
- You're not supposed to write big object-oriented shell scripts!

I think of these objects as a minimal mechanism for polymorphism, which means that different data can present the same interface.
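Here's a Python sketch of the method rule described above: a function found on the prototype chain is called with the object itself as the first argument. It also illustrates the polymorphism point, since two objects holding different data answer the same `speak` call. Everything here is hypothetical and simplified; in particular, it doesn't model the difference between the `.` and `->` operators.

```python
from typing import Any, Optional


class Obj:
    """Toy model of a YSH Obj: a dict of properties plus a link to a prototype."""

    def __init__(self, props: dict, proto: Optional["Obj"] = None) -> None:
        self.props = props
        self.proto = proto

    def call_method(self, name: str, *args: Any) -> Any:
        """Find a function on the prototype chain and call it with self first."""
        node = self.proto  # this sketch looks for methods on the prototype chain only
        while node is not None:
            func = node.props.get(name)
            if callable(func):
                return func(self, *args)
            node = node.proto
        raise AttributeError(f"no method {name!r}")


def dog_speak(self: Obj) -> str:
    return f"{self.props['name']} says woof"


def robot_speak(self: Obj) -> str:
    return f"unit {self.props['serial']} says beep"


# Two hypothetical prototypes. There are no classes, just more Objs.
Dog = Obj({"speak": dog_speak})
Robot = Obj({"speak": robot_speak})

# Different data, same interface: the caller doesn't care which one it has.
things = [Obj({"name": "Rex"}, Dog), Obj({"serial": 7}, Robot)]
for t in things:
    print(t.call_method("speak"))
# Rex says woof
# unit 7 says beep
```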

Objects Are For Language and Library Authors

You may have noticed that Obj.new() is a low-level operation, and that the API is minimal.

This is intentional, and it's mostly because there's an asymmetry between creating libraries and using them.

In the examples below, you'll see that we don't directly create objects. They're "behind the scenes".

A YSH script is often a simple list of commands, or a declaration of JSON-like data. That's why I started this post with two examples that don't use objects!

Reasons to Have Them

Here are seven reasons we have objects, ordered roughly from concrete to abstract.

Objects are Namespaces for Methods

I like the style of an endsWith() method:

if (filename.endsWith('.py')) {
  echo 'Python'
}

more than a free function:

if (endsWith(filename, '.py')) {   # INVALID YSH
  echo 'Python'
}

This is arguably an abuse of "objects", because there's no encapsulation or state.

But Python and JavaScript both use this idiom, and YSH aims to be familiar to Python and JavaScript users.

(Related: YSH Language Influences.)

Flag Parsing: Types are Objects

In March, I described our flag parsing API, and compared it to Python:

Oils 0.21.0 - Flags, Integers, Starship Bug, and Speed

Instead of using strings to specify the type of a flag value:

parser (&spec) {
  flag --count ('Int')    # 'Int' is a string
  flag --dest ('Str')
}

The Oils 0.24.0 release uses type objects:

flag --count (Int)        # Int is a type object (no quotes)
flag --dest (Str)

We also have type expressions with []:

flag --path (List[Str])   # A flag that can appear multiple times

This API is unusual, but I like it because it's more declarative than Python's API. You just say what type of data you want.
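As a rough analogue in Python, this sketch drives parsing entirely from type objects. The `parse_flags()` helper is hypothetical, not an Oils or standard-library API. The types `int` and `str` convert a single value, and a parameterized type like `list[str]` means the flag may repeat. It uses the real `typing.get_origin()` and `typing.get_args()` functions to take the type expression apart.

```python
from typing import Any, get_args, get_origin


def parse_flags(spec: dict[str, Any], argv: list[str]) -> dict[str, Any]:
    """Parse '--flag value' pairs, converting each value with a declared type object.

    spec maps a flag name to a type object: int, str, or list[...].
    A list[...] type means the flag may be repeated, and values accumulate.
    """
    result: dict[str, Any] = {}
    i = 0
    while i < len(argv):
        flag = argv[i]
        if flag not in spec:
            raise ValueError(f"unknown flag {flag!r}")
        if i + 1 >= len(argv):
            raise ValueError(f"flag {flag!r} expects a value")
        raw = argv[i + 1]
        typ = spec[flag]
        name = flag.lstrip("-")
        if get_origin(typ) is list:          # e.g. list[str]
            (elem_type,) = get_args(typ)     # the element type, e.g. str
            result.setdefault(name, []).append(elem_type(raw))
        else:                                # e.g. int or str
            result[name] = typ(raw)
        i += 2
    return result


spec = {"--count": int, "--dest": str, "--path": list[str]}
argv = ["--count", "3", "--dest", "/tmp", "--path", "a", "--path", "b"]
print(parse_flags(spec, argv))
# {'count': 3, 'dest': '/tmp', 'path': ['a', 'b']}
```

For comparison, Python's `argparse` also accepts `type=int`, but a repeated flag needs a separate `action='append'` argument. In this sketch, the `list[str]` type carries that information by itself.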

(Thanks to Will Clardy for implementing this!)

Modules are Objects

Back in 2021, the way to put YSH code in multiple files was to use the source builtin, inherited from POSIX shell. This meant that all functions lived in a single namespace.

Oils 0.24.0 introduces Python-like modules, with separate namespaces. The use builtin imports a module, and you can get its attributes with the . operator:

use math.ysh  # math is now an object

# A func is an attribute on the module object
echo $[math.abs(-42)]  # => 42

Modules are also invokable like procs. If you have this file:

# util.ysh
const __provide__ = :| log |  # TODO: shorten this syntax

proc log (msg) {
  echo $msg >& 2
}

Then you can invoke the log proc with util log:

use util.ysh

util log 'hi there'   # invoke with util module namespace, then proc name

So:

funcs are called by expressions, and you access them with the . operator
procs are invoked by commands, and you access them with words separated by a space

Related: Guide to Procs and Funcs
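The following Python model sketches how a module can be "just an object". The `use()` function, the `Obj` class, and the `ysh_abs()` and `log()` functions are hypothetical stand-ins, and the sketch assumes that only names listed in `__provide__` are exported. Funcs are reached with the `.` operator, while a command like `util log 'hi there'` is modeled as a call to the object's `__invoke__` method with the remaining words.

```python
import sys
from typing import Any, Callable


class Obj:
    """Toy model of an object whose attributes are reached with the . operator."""

    def __init__(self, props: dict[str, Any]) -> None:
        self.props = props

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, i.e. for module members.
        try:
            return self.props[name]
        except KeyError:
            raise AttributeError(name) from None


def use(namespace: dict[str, Callable[..., Any]], provide: list[str]) -> Obj:
    """Hypothetical stand-in for the `use` builtin.

    Exports only the names in `provide` (like __provide__), and adds an
    __invoke__ method so that `mod name args...` can run as a command.
    """
    exported = {name: namespace[name] for name in provide}

    def invoke(argv: list[str]) -> Any:
        name, *args = argv
        if name not in exported:
            raise LookupError(f"module doesn't provide {name!r}")
        return exported[name](*args)

    return Obj({**exported, "__invoke__": invoke})


def ysh_abs(n: int) -> int:  # plays the role of a func in math.ysh
    return -n if n < 0 else n


def log(msg: str) -> None:   # plays the role of `proc log` in util.ysh
    print(msg, file=sys.stderr)


math_module = use({"abs": ysh_abs}, provide=["abs"])
util_module = use({"log": log}, provide=["log"])

print(math_module.abs(-42))                  # like $[math.abs(-42)], prints 42
util_module.__invoke__(["log", "hi there"])  # like: util log 'hi there'
```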

Meta-Object Protocol with `__invoke__`

On Zulip, I wrote a terse description of how util log works:

I added module_ysh.InvokeModule() as the __invoke__ method of the value.Obj returned by use!

#language-design > Modules are now invokable - mymodule myproc

This description takes a moment to unpack, and understanding the details isn't crucial right now. The general points are:

There is no Module type in YSH - there are only objects.
Objects are polymorphic: different kinds of data can present the same interface.
- The polymorphic methods can be "special", like util.__invoke__, or user-defined, like dog.speak().
YSH objects give us a meta-object protocol, like Python's __str__ and __eq__.
- We'll add __call__, and a few more special methods. But our object model won't be as elaborate as Python's.

ENV is an Object (linked list as stack)

Here's another example of where objects are useful.

In shell, environment variables automatically become global variables:

echo $PYTHONPATH        # shell / OSH code

In Oils 0.24.0, YSH now has a separate ENV object:

echo $[ENV.PYTHONPATH]  # YSH code

Why is it an object? In this case, it's not for polymorphism. It's because shell already has a stack of environment bindings:

myproc() {
  env | grep FOO

  FOO=42 env | grep FOO  # new binding pushed on the stack

  env | grep FOO
}

FOO=z myproc
# => FOO=z
# => FOO=42
# => FOO=z

Remember that a YSH object is a linked list of dicts. And a linked list can be used as a stack.

We're arguably abusing objects again, but using fewer concepts in a language makes it smaller and more compositional. For example, you can pretty-print ENV like any other object, and you can use the same first() and rest() functions on it.
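Python's standard library happens to have the same "stack of dicts" structure: `collections.ChainMap`. The sketch below uses it to model the shell session above. The `run_with_bindings()` helper is hypothetical, and this is an analogy, not how YSH implements `ENV`. `new_child()` pushes a frame without modifying the original, `maps[0]` plays the role of `first()`, and `parents` plays the role of `rest()`.

```python
from collections import ChainMap


def run_with_bindings(env: ChainMap, bindings: dict) -> dict:
    """Model `FOO=42 cmd`: push a frame, and flatten the stack for the child."""
    frame = env.new_child(bindings)  # new dict at the front; env is not modified
    return dict(frame)               # a child process sees one flat mapping


# Bottom of the stack, as in `FOO=z myproc`
env = ChainMap({"FOO": "z", "HOME": "/home/alice"})

print(env["FOO"])                                    # z
print(run_with_bindings(env, {"FOO": "42"})["FOO"])  # 42
print(env["FOO"])                                    # z again: the pushed frame is gone

# Navigating the chain, roughly like first() and rest()
frame = env.new_child({"FOO": "42"})
print(frame.maps[0])         # {'FOO': '42'}
print(frame.parents["FOO"])  # z
```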

By the way, security is one reason ENV is separate. We want a syntactic distinction between external inputs and internal variables.

It's also related to a design bug in ksh, bash, and zsh.

That bug is similar to the 2014 ShellShock bug, which involved env vars in bash: attackers who controlled the value of an env var could execute arbitrary shell code.

Still In Progress

To review, we saw five uses of objects that are already in YSH:

Methods
Flag Parsing / Type Objects
Modules
Polymorphism and the Meta-Object Protocol
ENV

Now let's look at two more ideas, which we're still implementing.

Pure Functions / IO Capability?

In January, the Oils 0.19.0 release introduced the renderPrompt(io) hook. It's arguably a nicer way to configure the prompt than $PS1: a pure function that's passed the io object, and returns a string.

We've continued designing APIs in this style, like io.stdin. And we may classify more APIs as impure, e.g. by moving glob() to io.glob().

Why? Because the call glob('*.txt') doesn't just depend on its arguments — it also depends on what's on the file system, and what the current directory is.

So in the future:

io may be "ambient" inside shell-like procs, but
it must be passed to Python-like funcs.

So we can use objects like io to support pure functions. Objects can be a mechanism for access control.

Another object that would be restricted is the vm object, which is for runtime introspection.
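Here's a Python sketch of the capability style, loosely modeled on `renderPrompt(io)`. The `IO` protocol, its `glob()` and `getcwd()` methods, the `RealIO` and `FakeIO` classes, and `render_prompt()` are all hypothetical. The prompt function reaches the file system only through the object it's handed, so a caller can pass a fake and get a deterministic answer.

```python
import fnmatch
import glob
import os
from typing import Protocol


class IO(Protocol):
    """Hypothetical capability interface: everything impure goes through it."""

    def glob(self, pattern: str) -> list[str]: ...
    def getcwd(self) -> str: ...


class RealIO:
    """Capability backed by the real file system."""

    def glob(self, pattern: str) -> list[str]:
        return sorted(glob.glob(pattern))

    def getcwd(self) -> str:
        return os.getcwd()


class FakeIO:
    """Capability with canned answers, e.g. for tests."""

    def __init__(self, cwd: str, files: list[str]) -> None:
        self.cwd = cwd
        self.files = files

    def glob(self, pattern: str) -> list[str]:
        return sorted(f for f in self.files if fnmatch.fnmatch(f, pattern))

    def getcwd(self) -> str:
        return self.cwd


def render_prompt(io: IO) -> str:
    """Its only access to the outside world is the io object it's handed."""
    n = len(io.glob("*.txt"))
    return f"{io.getcwd()} ({n} txt) $ "


print(render_prompt(FakeIO("/home/alice", ["a.txt", "b.txt", "c.py"])))
# prints "/home/alice (2 txt) $ "
```

Python can't enforce this discipline, since any function can still call `os.getcwd()` directly. The idea described above is for YSH itself to enforce it, by not making `io` available inside funcs unless it's passed in.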

Pure Config - like Tcl

The issue of purity also relates to Hay configuration. (I mentioned that Hay is a "straggler" in September, i.e. it needs a round of revisions.)

Here's how you use the syntax of YSH to declare data:

Package cpython {
  url = 'https://python.org/'
  version = '3.11'
}

But what if you put a command in the middle?

Package sqlite3 {
  read --all < /etc/passwd  # Probably shouldn't be allowed
}

As a good mix of purity and flexibility, we might disallow the example above, but allow this:

func createPackages(io) {   # io must be passed to run any commands
  Package sqlite3 {
    var cmd = ^(read --all < /etc/passwd)  # value of type Command
    call io->eval(cmd)                     # Run it
  }
}

call createPackages(io)

This design issue is related to:

Safe Interpreters in Tcl - i.e. sandboxed interpreters
Evaluating Config Files with Lua
Deno vs. Node JS.
- Unlike Node, Deno has APIs for sandboxed v8 "isolates".

Python doesn't have sandboxed interpreters, but I think it's a compelling feature for Oils. Purity and namespaces will be a big chunk of our work in 2025.

I'm interested in comments from anyone who wants to use such a feature, or who has experience with it in other languages.

Is YSH a Big Language?

I described seven parts of YSH that can use support from objects. This was a bit surprising! YSH is not like Awk; it's more like Python or JavaScript.

And it's also a shell, which reminds me that YSH is a rich language.

But even though it's like shell+Python+YAML+more, squished together, the implementation is not big:

After 8 Years, Oil is Still Small and Flexible. We have about 64K lines of code, and a single optional dependency (GNU readline). The binary is now 2.3 MB.

The Ultimate Glue Language

I think systems will be simpler if we can use one glue language rather than two; or two glue languages rather than three. The boundaries between components tend to be where bugs arise.

Slogan:

I don't want to glue my glue together

I think this problem is still getting worse:

We've built up Unix sludge like shell+make+awk+autoconf, and the xz attack in April reminded us of this.
We still have that sludge, and now we also have cloud sludge like shell+YAML+Go templates!

Conclusion

I showed seven reasons that YSH has objects. A few years ago, I thought YSH might be more like Awk or Scheme, without objects. But based on using it, it's clear that we need objects.

I welcome more feedback! The language feels more solid, but it can still be changed.

#oil-discuss-public > Real YSH code in the wild!

Thanks to Aidan Olsen for nudging me in the right direction on the design of objects. In September and October, YSH evolved rapidly, which you'll see in the next post:

Oils 0.24.0 - Closures, Objects, Modules, ENV

Appendix

Why Should a Unix Shell have Closures?

On Zulip, Aidan also pushed closures in the right direction! Originally, I had wanted only one of the two: objects or closures.

But he posted a convincing example that relates to Hay and its staged evaluation model:

for page in index.html page.html {
  task "Build $page" {
    cp $page /home/www/mysite   # $page should be captured here
  }
}

I found that this wasn't too hard to implement, and it didn't slow down the interpreter. So in Oils 0.24.0, block arguments are closures.

That is, the block argument to task has a stack frame attached to it, and that frame can point to an "enclosing" frame.

Two Security Issues

I see that security came up twice in this post:

The separate ENV object means that external input (which can be controlled by an attacker) looks different from internal variables.
The separate Dict and Obj types mean that we don't have issues like prototype pollution.
- We also don't have a special obj.prototype name.

Review of YSH in 2024

This was a big year for YSH! Last year, it became "real":

2023-08 - Oils 0.17.0 - YSH Is Becoming Real
- Thanks to Melvin Walls for translating our pgen2-based YSH parser to C++, and for helping to remove the "metacircular hack".

Before that, it was basically a prototype! We can now iterate quickly on the design, and mycpp makes the shell fast:

2024-01 - Oils 0.19.0 - Dicts, Procs, Funcs, and Places
2024-02 - Oils 0.20.0 - Eggex, JSON, and Android
2024-03 - Oils 0.21.0 - Flags, Integers, Starship Bug, and Speed
2024-06 - Oils 0.22.0 - Docs, Pretty Printing, Nix, and Zsh
2024-11 - Oils 0.23.0 - Writing YSH Code, User Feedback, and Bug Bounty
- Released in September, but the blog is behind the code!
Next - Oils 0.24.0 - Closures, Objects, Modules, ENV
- Released in November

If you want to read more, I reviewed the project "from scratch" in September: