In November's release notes, I mentioned that YSH now has objects.
But I said that we're not using PowerShell-like objects, which live in a .NET virtual machine. Instead, YSH uses plain data over pipes, like JSON or TSV.
So why does YSH have objects? This post explains seven use cases:
If the "shell" framing doesn't make sense to you, another way to phrase the question is:
Why should a multi-process glue language have objects?
If you're new to the project, see posts tagged #FAQ, and What Oils Looks Like in 2024.
Let's first see what we can do without objects. You can imagine a better shell that has structured data, like Awk or Scheme, but doesn't have objects.
For context, that's essentially what I wrote about in 2020. Unlike POSIX shell and bash, YSH has reliable error handling, safe serialization, arrays, and is statically parsed.
argv arrays are not objects. Here's a simple example:
ysh$ var packages = ['gcc', 'sqlite3'] # create args
ysh$ sudo apt-get install @packages # splice into a command
So you can represent command arguments with simple data structures, not objects.
(By the way, the @splice operation is hard in shell!)
JSON-like data structures are also not objects. Here's an example.
First, any shell can create a JSON text file:
bash$ echo '{"counts": [42, 43]}' > my.json
But OSH and YSH can also read it:
ysh$ json read (&x) < my.json # &x is a "Place" to put the data structure
ysh$ = x # print resulting type and contents
(Dict) {counts: [42, 43]}
And calculate on it:
ysh$ var sum = x.counts[0] + x.counts[1]
ysh$ = sum
(Int) 85
So data can live in memory, or it can be serialized to pipes and files. But there are still no objects involved.
That brings us to the question ...
Technically, a value of type Obj is a linked list of Dicts.
In Oils 0.24.0, you create one with Obj.new(), which is similar to JavaScript's Object() constructor:
ysh$ var obj = Obj.new({x: 42}, null) # null signifies the end
ysh$ = obj
(Obj) (x: 42) # parens () rather than braces {}
You can put this obj instance "at the end" of another object, creating another link in the chain:
ysh$ var two = Obj.new({name: 'foo'}, obj)
ysh$ = two
(Obj) (name: 'foo') --> (x: 42) # looks like a linked list
Functions that are on the prototype chain become methods. Their first argument is self, and you invoke them with the . and -> operators:
echo $[myObj.method()] # regular methods use the . operator
call myObj->mutatingMethod() # mutating methods use the -> operator
Quick comparison:
- There's no special obj.prototype name. Instead, you use the functions first(obj) and rest(obj) to navigate the chain.

I think of these objects as a minimal mechanism for polymorphism, which means that different data can present the same interface.
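To make the "linked list of Dicts" idea concrete, here is a small Python model of it. This is only a sketch of the concept, not how Oils implements Obj: the Python class, lookup(), call_method(), and the dog/cat data are all hypothetical names chosen for illustration, and first() and rest() here merely mimic the YSH functions of the same name.

```python
class Obj:
    """Toy model of a YSH Obj: one dict of properties plus a link to the next Obj."""

    def __init__(self, props, rest=None):
        self.props = props   # this link's own Dict
        self.rest = rest     # next link in the chain, or None at the end


def first(obj):
    """Return the properties of the first link (model of YSH first())."""
    return obj.props


def rest(obj):
    """Return the remainder of the chain, or None at the end (model of YSH rest())."""
    return obj.rest


def lookup(obj, name):
    """Walk the chain front to back and return the first binding for name."""
    node = obj
    while node is not None:
        if name in first(node):
            return first(node)[name]
        node = rest(node)
    raise AttributeError(name)


def call_method(obj, name, *args):
    """Find a function on the chain and pass the receiver as its first argument."""
    return lookup(obj, name)(obj, *args)


def speak(self):
    return '%s says %s' % (lookup(self, 'name'), lookup(self, 'sound'))


# Shared behavior sits at the end of the chain; per-instance data sits at the front.
animal = Obj({'speak': speak})
dog = Obj({'name': 'Rex', 'sound': 'woof'}, animal)
cat = Obj({'name': 'Tom', 'sound': 'meow'}, animal)

print(call_method(dog, 'speak'))
print(call_method(cat, 'speak'))
```

The two instances hold different data but answer to the same speak interface, which is the sense of polymorphism used above.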
You may have noticed that Obj.new() is a low level operation, and the API is minimal.
This is intentional, and it's mostly because there's an asymmetry between creating libraries and using them.
In the examples below, you'll see that we don't directly create objects. They're "behind the scenes".
A YSH script is often a simple list of commands, or a declaration of JSON-like data. That's why I started this post with two examples that don't use objects!
Here are seven reasons we have objects, ordered roughly from concrete to abstract.
I like the style of an endsWith() method:
if (filename.endsWith('.py')) {
  echo 'Python'
}
more than a free function:
if (endsWith(filename, '.py')) { # INVALID YSH
  echo 'Python'
}
This is arguably an abuse of "objects", because there's no encapsulation or state.
But Python and JavaScript both use this idiom, and YSH aims to be familiar to Python and JavaScript users.
(Related: YSH Language Influences.)
In March, I described our flag parsing API, and compared it to Python:
Instead of using strings to specify the type of a flag value:
parser (&spec) {
  flag --count ('Int') # 'Int' is a string
  flag --dest ('Str')
}
The Oils 0.24.0 release uses type objects:
flag --dest (Str) # Str is a type object (no quotes)
flag --count (Int)
We also have type expressions with []:
flag --path (List[Str]) # A flag that can appear multiple times
This API is unusual, but I like it because it's more declarative than Python's API. You just say what type of data you want.
(Thanks to Will Clardy for implementing this!)
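For readers who want the Python side of that comparison, here is a minimal argparse sketch. It is not taken from the earlier post; the program name and the argument values are made up for illustration. Python also passes type objects such as int rather than strings, but a repeatable flag is expressed with an action instead of a type expression like List[Str].

```python
import argparse

# Hypothetical parser mirroring the three flags shown above.
parser = argparse.ArgumentParser(prog='demo')
parser.add_argument('--count', type=int)                    # int is a type object
parser.add_argument('--dest', type=str)
parser.add_argument('--path', action='append', default=[])  # may appear many times

args = parser.parse_args(['--count', '3', '--path', 'a', '--path', 'b'])
print(args.count, args.dest, args.path)
```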
Back in 2021, the way to put YSH code in multiple files was to use the source builtin, inherited from POSIX shell. This meant that all functions lived in a single namespace.
Oils 0.24.0 introduces Python-like modules, with separate namespaces. The use builtin imports a module, and you can get its attributes with the . operator:
use math.ysh # math is now an object
# A func is an attribute on the module object
echo $[math.abs(-42)] # => 42
Modules are also invokable like procs. If you have this file:
# util.ysh
const __provide__ = :| log | # TODO: shorten this syntax
proc log (msg) {
  echo $msg >& 2
}
Then you can invoke the log proc with util log:
use util.ysh
util log 'hi there' # invoke with util module namespace, then proc name
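So funcs in a module are accessed with the . operator, and procs are invoked with the module name followed by the proc name.
Related: Guide to Procs and Funcs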
__invoke__
On Zulip, I wrote a terse description of how util log works:
I added module_ysh.InvokeModule() as the __invoke__ method of the value.Obj returned by use!
This description takes a moment to unpack, and understanding the details isn't crucial right now. The general points are:
- There's no Module type in YSH - there are only objects.
- Methods can be builtin, like util.__invoke__, or user-defined, like dog.speak().
- ... __str__ and __eq__.
- ... __call__, and a few more special methods. But our object model won't be as elaborate as Python's.

Here's another example of where objects are useful.
In shell, environment variables automatically become global variables:
echo $PYTHONPATH # shell / OSH code
In Oils 0.24.0, YSH now has a separate ENV object:
echo $[ENV.PYTHONPATH] # YSH code
Why is it an object? In this case, it's not for polymorphism. It's because shell already has a stack of environment bindings:
myproc() {
  env | grep FOO
  FOO=42 env | grep FOO # new binding pushed on the stack
  env | grep FOO
}
FOO=z myproc
# => FOO=z
# => FOO=42
# => FOO=z
Remember that a YSH object is a linked list of dicts. And a linked list can be used as a stack.
We're arguably abusing objects again, but using fewer concepts in a language makes it smaller and more compositional. For example, you can pretty-print ENV like any other object, and you can use the same first() and rest() functions on it.
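The stack-of-bindings idea has a close analogue in Python's standard library: collections.ChainMap is also a chain of dicts where lookups go from front to back. The sketch below replays the FOO example with it. This is an analogy only, and it doesn't claim anything about how Oils implements ENV.

```python
from collections import ChainMap

# Outer binding, as in: FOO=z myproc
env = ChainMap({'FOO': 'z'})
print(env['FOO'])

# Push a new binding on the stack, as in: FOO=42 env
inner = env.new_child({'FOO': '42'})
print(inner['FOO'])   # the innermost binding wins

# The outer chain is untouched once the inner one is dropped.
print(env['FOO'])

# The chain itself is ordinary data that can be inspected.
print(inner.maps)
```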
By the way, security is one reason ENV is separate. We want a syntactic distinction between external inputs and internal variables.
It's related to this design bug in ksh, bash, and zsh:
TIL: Some surprising code execution sources in bash (yossarian.net via lobste.rs), 120 points, 47 comments on 2024-11-20
And that bug is similar to the 2014 ShellShock bug, which involved env vars in bash. Attackers that controlled the value of env vars could execute arbitrary shell code.
To review, we saw five uses of objects that are already in YSH:
Now let's look at two more ideas, which we're still implementing.
In January, the Oils 0.19.0 release introduced the renderPrompt(io) hook. It's arguably a nicer way to configure the prompt than $PS1: a pure function that's passed the io object, and returns a string.
We've continued designing APIs in this style, like io.stdin. And we may classify more APIs as impure, e.g. by moving glob() to io.glob().
Why? Because the call glob('*.txt') doesn't just depend on its arguments; it also depends on what's on the file system, and what the current directory is.
So in the future:
- io may be "ambient" inside shell-like procs, but ...

So we can use objects like io to support pure functions. Objects can be a mechanism for access control.
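Here is a rough Python sketch of that access-control idea, under the assumption that a function may only reach the file system through an object it is handed. RealIO, FakeIO, and count_matches are hypothetical names for this illustration; they are not YSH APIs.

```python
import glob as _glob


class RealIO:
    """Hypothetical capability object that can see the real file system."""

    def glob(self, pattern):
        return sorted(_glob.glob(pattern))


class FakeIO:
    """Stand-in with canned results, e.g. for tests or a restricted context."""

    def __init__(self, listing):
        self.listing = listing

    def glob(self, pattern):
        return list(self.listing.get(pattern, []))


def count_matches(io, pattern):
    """Touches the file system only through the io object it was handed."""
    return len(io.glob(pattern))


fake = FakeIO({'*.txt': ['a.txt', 'b.txt']})
print(count_matches(fake, '*.txt'))
```

A caller that never receives an io object has no way to glob at all, and a caller that receives FakeIO gets the same answer every time, regardless of the current directory.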
Another object that would be restricted is the vm object, which is for runtime introspection.
The issue of purity also relates to Hay configuration. (I mentioned that Hay is a "straggler" in September, i.e. it needs a round of revisions.)
Here's how you use the syntax of YSH to declare data:
Package cpython {
  url = 'https://python.org/'
  version = '3.11'
}
But what if you put a command in the middle?
Package sqlite3 {
  read --all < /etc/passwd # Probably shouldn't be allowed
}
As a good mix of purity and flexibility, we might disallow the example above, but allow this:
func createPackages(io) { # io must be passed to run any commands
  Package sqlite3 {
    var cmd = ^(read --all < /etc/passwd) # value of type Command
    call io->eval(cmd) # Run it
  }
}
call createPackages(io)
This design issue is related to:
Python doesn't have sandboxed interpreters, but I think it's a compelling feature for Oils. Purity and namespaces will be a big chunk of our work in 2025.
I'm interested in comments from anyone who wants to use such a feature, or who has experience with it in other languages.
I described seven parts of YSH that can use support from objects. This was a bit surprising! YSH is not like Awk; it's more like Python or JavaScript.
And it's also a shell, which reminds me that YSH is a rich language.
But even though it's like shell+Python+YAML+more, squished together, the implementation is not big:
I think systems will be simpler if we can use one glue language rather than two; or two glue languages rather than three. The boundaries between components tend to be where bugs arise.
Slogan:
I don't want to glue my glue together
I think this problem is still getting worse:
I showed seven reasons that YSH has objects. A few years ago, I thought YSH might be more like Awk or Scheme, without objects. But based on using it, it's clear that we need objects.
I welcome more feedback! The language feels more solid, but it can still be changed.
Thanks to Aidan Olsen for nudging me in the right direction on the design of objects. In September and October, YSH evolved rapidly, which you'll see in the next post:
On Zulip, Aidan also pushed closures in the right direction! Originally I had wanted just one of objects and closures.
But he posted a convincing example that relates to Hay, and its staged evaluation model:
for page in index.html page.html {
  task "Build $page" {
    cp $page /home/www/mysite # $page should be captured here
  }
}
I found that this wasn't too hard to implement, and it didn't slow down the interpreter. So in Oils 0.24.0, block arguments are closures.
That is, the block argument to task has a stack frame attached to it, and that frame can point to an "enclosing" frame.
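The same staging pattern can be sketched with Python closures: tasks are defined inside a loop, run later, and each one must remember the page it was defined with. This is only an analogy for the behavior described above; make_task is a hypothetical helper, and the sketch returns the command as a list instead of running cp.

```python
def make_task(page):
    """Return a task whose body remembers the page from its enclosing frame."""
    def run():
        # page is read from the enclosing call's frame when the task runs later
        return ['cp', page, '/home/www/mysite']
    return run


# Stage 1: define the tasks, one per loop iteration.
tasks = [make_task(page) for page in ['index.html', 'page.html']]

# Stage 2: run them after the loop has finished.
for task in tasks:
    print(task())
```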
I see that security came up twice in this post:
- The ENV object means that external input (which can be controlled by an attacker) looks different than internal variables.
- Separate Dict and Obj types mean that we don't have issues like prototype pollution.
  - There's no special obj.prototype name.

More on dicts/data vs. objects:
I think of objects as interior, and dicts as exterior. These posts on the design of YSH are still very relevant:
This was a big year for YSH! Last year, it became "real":
Before that, it was basically a prototype! We can now iterate quickly on the design, and mycpp makes the shell fast:
If you want to read more, I reviewed the project "from scratch" in September: