Why Sponsor Oils? | blog | oilshell.org

Why Should a Unix Shell Have Objects?

2024-12-17

In November's release notes, I mentioned that YSH now has objects.

But I said that we're not using PowerShell-like objects, which live in a .NET virtual machine. Instead, YSH uses plain data over pipes, like JSON or TSV.

So why does YSH have objects? This post explains seven use cases:

  1. Methods
  2. Flag Parsing
  3. Modules
  4. Polymorphism
  5. ENV

  6. Pure Functions
  7. Pure Config

If the "shell" framing doesn't make sense to you, another way to phrase the question is:

Why should a multi-process glue language have objects?

If you're new to the project, see posts tagged #FAQ, and What Oils Looks Like in 2024.

Table of Contents
Two Things That Aren't Objects
argv Arrays
JSON-like Data Over Pipes
What Is a YSH Object?
Objects Are For Language and Library Authors
Reasons to Have Them
Objects are Namespaces for Methods
Flag Parsing: Types are Objects
Modules are Objects
Meta-Object Protocol with __invoke__
ENV is an Object (linked list as stack)
Still In Progress
Pure Functions / IO Capability?
Pure Config - like Tcl
Is YSH a Big Language?
The Ultimate Glue Language
Conclusion
Appendix
Why Should a Unix Shell have Closures?
Two Security Issues
Review of YSH in 2024

Two Things That Aren't Objects

Let's first see what we can do without objects. You can imagine a better shell that has structured data, like Awk or Scheme, but doesn't have objects.

For context, that's essentially what I wrote about in 2020. Unlike POSIX shell and bash, YSH has reliable error handling, safe serialization, and arrays, and it is statically parsed.

argv Arrays

... are not objects. Here's a simple example:

ysh$ var packages = ['gcc', 'sqlite3']  # create args

ysh$ sudo apt-get install @packages     # splice into a command

So you can represent command arguments with simple data structures, not objects.

(By the way, the @splice operation is hard in shell!)

JSON-like Data Over Pipes

... are not objects. Here's an example.

First, any shell can create a JSON text file:

bash$ echo '{"counts": [42, 43]}' > my.json

But OSH and YSH can also read it:

ysh$ json read (&x) < my.json  # &x is a "Place" to put the data structure

ysh$ = x                       # print resulting type and contents
(Dict)  {counts: [42, 43]}

And calculate on it:

ysh$ var sum = x.counts[0] + x.counts[1]

ysh$ = sum
(Int)  85

So data can live in memory, or it can be serialized to pipes and files. But there are still no objects involved.
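For readers more at home in Python, here's roughly the same round trip. It's only an analogy to the YSH session above, not a claim about how YSH implements json read: the data is a plain dict, and nothing about it is an "object" in the sense this post means.

```python
import json

# Same JSON text as in the shell example above.
text = '{"counts": [42, 43]}'

data = json.loads(text)   # a plain dict: data only, no behavior attached
total = data["counts"][0] + data["counts"][1]

# Serialize it again, as it would travel over a pipe or into a file.
wire = json.dumps(data)

print(type(data).__name__, total, wire)
```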

That brings us to the question ...

What Is a YSH Object?

Technically, a value of type Obj is a linked list of Dicts.

In Oils 0.24.0, you create one with Obj.new(), which is similar to JavaScript's Object.create():

ysh$ var obj = Obj.new({x: 42}, null)  # null signifies the end
ysh$ = obj
(Obj)   (x: 42)                     # parens () rather than braces {}

You can put this obj instance "at the end" of another object, creating another link in the chain:

ysh$ var two = Obj.new({name: 'foo'}, obj)
ysh$ = two
(Obj)   (name: 'foo') --> (x: 42)   # looks like a linked list

Functions that are on the prototype chain become methods. Their first argument is self, and you invoke them with the . and -> operators:

echo $[myObj.method()]        # regular methods use the . operator

call myObj->mutatingMethod()  # mutating methods use the -> operator

Quick comparison:

I think of these objects as a minimal mechanism for polymorphism, which means that different data can present the same interface.
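To make "a linked list of dicts" concrete, here's a toy model in Python. It is a sketch of the idea, not the YSH implementation: the class Obj, its get() method, and the describe function are all names I made up for illustration, and the binding rule (functions found further along the chain receive the original object as self) is my simplified reading of the description above.

```python
class Obj:
    """Toy model: one dict of properties plus a link to the next Obj (or None)."""

    def __init__(self, props, prototype=None):
        self.props = props
        self.prototype = prototype

    def get(self, name):
        # Walk the chain from this link toward the end.
        link = self
        while link is not None:
            if name in link.props:
                value = link.props[name]
                # A function found on a later link acts as a method:
                # bind the original object as its first argument (self).
                if callable(value) and link is not self:
                    return lambda *args: value(self, *args)
                return value
            link = link.prototype
        raise AttributeError(name)


def describe(self):
    return "x is %d" % self.get("x")


methods = Obj({"describe": describe})   # shared behavior, end of the chain
a = Obj({"x": 42}, methods)             # data, linked to the methods
b = Obj({"x": 7}, methods)

print(a.get("describe")())
print(b.get("describe")())
```

Both a and b hold different data but answer to the same describe method, which is the "same interface" part of polymorphism.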

Objects Are For Language and Library Authors

You may have noticed that Obj.new() is a low level operation, and the API is minimal.

This is intentional, and it's mostly because there's an asymmetry between creating libraries and using them.

In the examples below, you'll see that we don't directly create objects. They're "behind the scenes".

A YSH script is often a simple list of commands, or a declaration of JSON-like data. That's why I started this post with two examples that don't use objects!

Reasons to Have Them

Here are seven reasons we have objects, ordered roughly from concrete to abstract.

Objects are Namespaces for Methods

I like the style of an endsWith() method:

if (filename.endsWith('.py')) {
  echo 'Python'
}

more than a free function:

if (endsWith(filename, '.py')) {   # INVALID YSH
  echo 'Python'
}

This is arguably an abuse of "objects", because there's no encapsulation or state.

But Python and JavaScript both use this idiom, and YSH aims to be familiar to Python and JavaScript users.

(Related: YSH Language Influences.)

Flag Parsing: Types are Objects

In March, I described our flag parsing API, and compared it to Python:

Instead of using strings to specify the type of a flag value:

parser (&spec) {
  flag --count ('Int')    # 'Int' is a string
  flag --dest ('Str')
}

The Oils 0.24.0 release uses type objects:

flag --count (Int)        # Int is a type object (no quotes)
flag --dest (Str)

We also have type expressions with []:

flag --path (List[Str])   # A flag that can appear multiple times

This API is unusual, but I like it because it's more declarative than Python's API. You just say what type of data you want.

(Thanks to Will Clardy for implementing this!)
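Here's a small Python sketch of why type objects are convenient in a flag spec. It is not the YSH flag parser: SPEC and parse_flags are hypothetical names, and the parsing rules (every flag takes exactly one value) are deliberately minimal. The point is only that a type object can double as the converter, and that a parameterized type like list[str] can tell the parser that a flag repeats.

```python
from typing import get_args, get_origin

# Hypothetical spec: flag name -> type object (not a string like 'Int').
SPEC = {"--count": int, "--dest": str, "--path": list[str]}


def parse_flags(argv, spec):
    result = {}
    i = 0
    while i < len(argv):
        name = argv[i]
        if name not in spec:
            raise ValueError("unknown flag: " + name)
        if i + 1 >= len(argv):
            raise ValueError("missing value for " + name)
        raw = argv[i + 1]
        typ = spec[name]
        key = name.lstrip("-")
        if get_origin(typ) is list:
            # list[T]: the flag may repeat; convert each value with T.
            (elem_type,) = get_args(typ)
            result.setdefault(key, []).append(elem_type(raw))
        else:
            # The type object doubles as the converter.
            result[key] = typ(raw)
        i += 2
    return result


flags = parse_flags(
    ["--count", "3", "--path", "/bin", "--path", "/usr/bin", "--dest", "out"],
    SPEC,
)
print(flags)
```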

Modules are Objects

Back in 2021, the way to put YSH code in multiple files was to use the source builtin, inherited from POSIX shell. This meant that all functions lived in a single namespace.

Oils 0.24.0 introduces Python-like modules, with separate namespaces. The use builtin imports a module, and you can get its attributes with the . operator:

use math.ysh  # math is now an object

# A func is an attribute on the module object
echo $[math.abs(-42)]  # => 42

Modules are also invokable like procs. If you have this file:

# util.ysh
const __provide__ = :| log |  # TODO: shorten this syntax

proc log (msg) {
  echo $msg >& 2
}

Then you can invoke the log proc with util log:

use util.ysh

util log 'hi there'   # invoke with util module namespace, then proc name

So:

Related: Guide to Procs and Funcs
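As a rough mental model of a module that is both a namespace and something you can invoke, here's a Python sketch. It is an analogy only: the Module class and the funcs/procs split are my own illustration, and Python's __call__ stands in for whatever hook the shell uses when a module name appears as the first word of a command.

```python
class Module:
    """Toy module object: attributes for funcs, plus a call hook for procs."""

    def __init__(self, name, funcs, procs):
        self.name = name
        self._procs = procs
        for func_name, func in funcs.items():
            setattr(self, func_name, func)   # reachable as module.attr

    def __call__(self, proc_name, *args):
        # The first word after the module name selects the proc to run.
        if proc_name not in self._procs:
            raise LookupError("%s has no proc %r" % (self.name, proc_name))
        return self._procs[proc_name](*args)


def log(msg):
    return "log: " + msg


# Hypothetical module that provides one func and one proc.
mod = Module("mod", funcs={"abs": abs}, procs={"log": log})

print(mod.abs(-42))             # attribute access, like math.abs(-42)
print(mod("log", "hi there"))   # invocation, like: util log 'hi there'
```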

Meta-Object Protocol with __invoke__

On Zulip, I wrote a terse description of how util log works:

I added module_ysh.InvokeModule() as the __invoke__ method of the value.Obj returned by use!

This description takes a moment to unpack, and understanding the details isn't crucial right now. The general points are:

ENV is an Object (linked list as stack)

Here's another example of where objects are useful.

In shell, environment variables automatically become global variables:

echo $PYTHONPATH        # shell / OSH code

In Oils 0.24.0, YSH now has a separate ENV object:

echo $[ENV.PYTHONPATH]  # YSH code

Why is it an object? In this case, it's not for polymorphism. It's because shell already has a stack of environment bindings:

myproc() {
  env | grep FOO

  FOO=42 env | grep FOO  # new binding pushed on the stack

  env | grep FOO
}

FOO=z myproc
# => FOO=z
# => FOO=42
# => FOO=z

Remember that a YSH object is a linked list of dicts. And a linked list can be used as a stack.

We're arguably abusing objects again, but using fewer concepts in a language makes it smaller and more compositional. For example, you can pretty-print ENV like any other object, and you can use the same first() and rest() functions on it.
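Python's standard library happens to have the same "linked list of dicts used as a stack" structure, collections.ChainMap, so it makes a handy model of the FOO example above. This is an analogy, not how ENV is implemented, and the HOME value is made up.

```python
from collections import ChainMap

# Bindings in effect for `FOO=z myproc`.
outer = ChainMap({"FOO": "z", "HOME": "/home/me"})

# `FOO=42 env` pushes a new frame in front; the outer frame is untouched.
inner = outer.new_child({"FOO": "42"})

seen = [
    outer["FOO"],          # before the temporary binding
    inner["FOO"],          # the new binding shadows the old one
    inner["HOME"],         # lookups fall through to the rest of the chain
    inner.parents["FOO"],  # "popping" the frame restores the old value
]
print(seen)
```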


By the way, security is one reason ENV is separate. We want a syntactic distinction between external inputs and internal variables.

It's related to this design bug in ksh, bash, and zsh:

And that bug is similar to the 2014 ShellShock bug, which involved env vars in bash. Attackers who controlled the value of env vars could execute arbitrary shell code.

Still In Progress

To review, we saw five uses of objects that are already in YSH:

  1. Methods
  2. Flag Parsing / Type Objects
  3. Modules
  4. Polymorphism and the Meta-Object Protocol
  5. ENV

Now let's look at two more ideas, which we're still implementing.

Pure Functions / IO Capability?

In January, the Oils 0.19.0 release introduced the renderPrompt(io) hook. It's arguably a nicer way to configure the prompt than $PS1: a pure function that's passed the io object, and returns a string.

We've continued designing APIs in this style, like io.stdin. And we may classify more APIs as impure, e.g. by moving glob() to io.glob().

Why? Because the call glob('*.txt') doesn't just depend on its arguments — it also depends on what's on the file system, and what the current directory is.


So in the future:

So we can use objects like io to support pure functions. Objects can be a mechanism for access control.

Another object that would be restricted is the vm object, which is for runtime introspection.
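Here's the capability idea sketched in Python. Everything in it is hypothetical (FakeIO, count_words, list_notes), and it says nothing about the eventual YSH API. It only shows the shape: a pure function depends on its arguments alone, while anything that touches the file system has to be handed an io object, so the dependency is visible in the signature and easy to restrict or fake.

```python
import fnmatch


class FakeIO:
    """Hypothetical stand-in for an io capability; holds a fixed file listing."""

    def __init__(self, filenames):
        self._filenames = filenames

    def glob(self, pattern):
        return sorted(fnmatch.filter(self._filenames, pattern))


def count_words(text):
    # Pure: the result depends only on the argument.
    return len(text.split())


def list_notes(io):
    # Impure work is visible in the signature: the caller must hand over io.
    return io.glob("*.txt")


io = FakeIO(["b.txt", "a.txt", "main.py"])
print(count_words("plain data over pipes"))
print(list_notes(io))
```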

Pure Config - like Tcl

The issue of purity also relates to Hay configuration. (I mentioned that Hay is a "straggler" in September, i.e. it needs a round of revisions.)

Here's how you use the syntax of YSH to declare data:

Package cpython {
  url = 'https://python.org/'
  version = '3.11'
}

But what if you put a command in the middle?

Package sqlite3 {
  read --all < /etc/passwd  # Probably shouldn't be allowed
}

As a good mix of purity and flexibility, we might disallow the example above, but allow this:

func createPackages(io) {   # io must be passed to run any commands
  Package sqlite3 {
    var cmd = ^(read --all < /etc/passwd)  # value of type Command
    call io->eval(cmd)                     # Run it
  }
}

call createPackages(io)

This design issue is related to:

Python doesn't have sandboxed interpreters, but I think it's a compelling feature for Oils. Purity and namespaces will be a big chunk of our work in 2025.

I'm interested in comments from anyone who wants to use such a feature, or who has experience with it in other languages.

Is YSH a Big Language?

I described seven parts of YSH that benefit from objects. This was a bit surprising! YSH is not like Awk; it's more like Python or JavaScript.

And it's also a shell, which reminds me that YSH is a rich language.

But even though it's like shell+Python+YAML+more, squished together, the implementation is not big:

The Ultimate Glue Language

I think systems will be simpler if we can use one glue language rather than two; or two glue languages rather than three. The boundaries between components tend to be where bugs arise.

Slogan:

I don't want to glue my glue together

I think this problem is still getting worse:

Conclusion

I showed seven reasons that YSH has objects. A few years ago, I thought YSH might be more like Awk or Scheme, without objects. But based on using it, it's clear that we need objects.

I welcome more feedback! The language feels more solid, but it can still be changed.

Thanks to Aidan Olsen for nudging me in the right direction on the design of objects. In September and October, YSH evolved rapidly, which you'll see in the next post:

 

Appendix

Why Should a Unix Shell have Closures?

On Zulip, Aidan also pushed closures in the right direction! Originally I had wanted just one of objects and closures.

But he posted a convincing example that relates to Hay, and its staged evaluation model:

for page in index.html page.html {
   task "Build $page" {
     cp $page /home/www/mysite   # $page should be captured here
   }
}

I found that this wasn't too hard to implement, and it didn't slow down the interpreter. So in Oils 0.24.0, block arguments are closures.

That is, the block argument to task has a stack frame attached to it, and that frame can point to an "enclosing" frame.
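The same capture problem shows up in Python, so here's a sketch of the staged model there. It is an analogy rather than the YSH mechanism, and make_task is a made-up helper: each call creates a frame that holds its own page, and the task defined inside keeps that frame alive until it runs.

```python
def make_task(page):
    # `page` lives in make_task's frame, which the inner function keeps alive.
    def task():
        return "cp %s /home/www/mysite" % page
    return task


# Stage 1: declare the tasks.
tasks = [make_task(page) for page in ["index.html", "page.html"]]

# Stage 2: run them later, after the loop has finished.
commands = [task() for task in tasks]
print(commands)
```

In Python, a bare lambda written directly inside the loop would see only the last value of page when it finally runs, which is why the sketch goes through a factory function.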

Two Security Issues

I see that security came up twice in this post:


More on dicts/data vs. objects:

I think of objects as interior, and dicts as exterior. These posts on the design of YSH are still very relevant:

Review of YSH in 2024

This was a big year for YSH! Last year, it became "real":

Before that, it was basically a prototype! We can now iterate quickly on the design, and mycpp makes the shell fast:


If you want to read more, I reviewed the project "from scratch" in September: