Why Sponsor Oils? | blog | oilshell.org
This post shows C++ snippets from our garbage-collected data structures: Str, List<T>, and Dict<K, V>. I hope it will catch the interest of programmers with experience in compilers and language runtimes.
Recent posts in the series:
(If you're new to the project, see Why Create a New Unix Shell?.)
I've never worked on a garbage collector before, but I think this code is fun. It's simple, small, and does something useful. One or two people should be able to handle it — it's not a huge industrial project.
I'm sure there are many ways to do better, and I look forward to working on it with experienced engineers! Most people who work on language runtimes don't read this blog, so please share this post. This is a critical part of the project.
If you want to jump straight to the code, see mycpp/gc_heap.h. I started this post by updating the comments in that file.
Like all Unix shells, Oil is an AST interpreter. I once described it as "the first half of Crafting Interpreters, plus 20 or so syscalls"! It's a thin layer over the kernel.
Unlike all Unix shells, it's written in statically-typed Python. We translate this Python to C++ code that uses garbage-collected data structures, which currently use the Cheney semi-space algorithm.
Goals: Speed up the interpreter, and break the dependency on the CPython runtime.
Let's recap an appendix of the last post. The two resources I remember being most helpful are:
Other resources:
I'd appreciate more references! This list is biased toward languages I'm familiar with. For example, I've never used Smalltalk, and since it's not statically typed, I don't expect it to be that similar.
(Update: I just stumbled across this Cheney collector by the Google v8 authors: two_space_heap.cc.)
These data structures are used by both hand-written and generated code. Here's some code generated from Mem::GetAllVars() in state.py:
Dict<Str*, Str*>* result = nullptr;
StackRoots _roots({&result});
result = Alloc<Dict<Str*, Str*>>();
// ...
result->set(name, str_val->s);
Equivalent Python:
result = {}
result[name] = str_val.s
So gc_heap::Alloc<> is like new, except that unreferenced objects are periodically collected. The manual registration of stack roots is OK for our mostly generated code, but could probably be improved for hand-written bindings. (I think I backed out of that experiment, and it might be worth another try.)
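To make the rooting protocol concrete, here's a simplified, hypothetical sketch of what StackRoots-style registration does. Oil's actual class is in mycpp/gc_heap.h and differs in detail; this version uses void** and a global root list purely for illustration.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Global list of *addresses* of local variables that may hold heap
// pointers, so a moving collector can find live pointers and also
// update them after objects move.
std::vector<void**> g_roots;

class StackRoots {
 public:
  explicit StackRoots(std::initializer_list<void**> roots) {
    for (void** p : roots) {
      g_roots.push_back(p);
      n_++;
    }
  }
  ~StackRoots() {  // unregister when the C++ scope exits
    for (int i = 0; i < n_; ++i) {
      g_roots.pop_back();
    }
  }

 private:
  int n_ = 0;
};
```

Because the registered values are addresses of locals, an assignment made after registration is still visible to the collector; that's why the generated code roots result before calling Alloc<>.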
You can also look at the mycpp-examples benchmark, published with every release. It links to small Python examples and measures the speedup after translation to C++. (Unfortunately it doesn't show the C++ code.)
Here's another view of what we're talking about:
~/git/oilshell/oil$ wc -l mycpp/gc_heap.*
303 mycpp/gc_heap.cc
1403 mycpp/gc_heap.h
1706 total
So it's small! More at Line Counts for Translation.
Direct Links: gc_heap.h and gc_heap.cc
I suppose you can think of any garbage collector as "two graphs overlaid", but I think it's particularly true here. C++ templates and the garbage collector interact in an interesting way.
The data model is simple — it's shaped like statically typed Python!
Str*
List<T>*
Dict<K, V>*
But each of these typed nodes can be viewed as one or more Obj*
instances,
with an 8 byte header:
#define OBJ_HEADER() \
uint8_t heap_tag_; \
uint8_t type_tag_; \
uint16_t field_mask_; \
uint32_t obj_len_;
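A quick sanity check: these four fields pack into exactly 8 bytes with no padding (1 + 1 + 2 + 4), which a static_assert can verify.

```cpp
#include <cassert>
#include <cstdint>

// The same four fields as OBJ_HEADER(), gathered into a struct so we
// can check the size.
struct Header {
  uint8_t heap_tag_;
  uint8_t type_tag_;
  uint16_t field_mask_;
  uint32_t obj_len_;
};

static_assert(sizeof(Header) == 8, "object header should be 8 bytes");
```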
What are these fields? Their roles are explained below, with List and Dict as examples.
Here are the 5 heap tags:
namespace Tag {
const int Forwarded = 1; // For the Cheney algorithm.
const int Global = 3; // Neither copy nor scan.
const int Opaque = 5; // Copy but don't scan. List<int> and Str
const int FixedSize = 7; // Fixed size headers: consult field_mask_
const int Scanned = 9; // Copy AND scan for non-NULL pointers.
}
(The tags are odd numbers because I ran into an issue with vtables, explained below.)
In reading about garbage collectors, one thing I didn't find emphasized is that the GC metadata for Cheney can be smaller than for mark and sweep, especially on 64-bit machines.
For example, CPython has next and prev pointers for its sweep phase. That's 16 bytes alone, compared to our 8 bytes, which also include type info and field masks for static types.
This per-object overhead is important in all language runtimes, but especially so in Oil, because it has many small records in its fine-grained code representation.
(Update: Aidenn0 mentioned the obvious issue that Cheney has 2x space overhead, which is very true. Though I still think this difference is worth mentioning and understanding.)
- Str is a variable-length subtype of Obj that has no pointers, so it's Tag::Opaque.
- Slab<T> is also a variable-length subtype of Obj. It helps with lists and dicts:
  - Slab<int> is Tag::Opaque.
  - Slab<T*> requires scanning for pointers, so it's Tag::Scanned.
- List<T> is a fixed-size structure with int fields and a pointer to a Slab<T> (the items).
- Dict<K, V> is also fixed-size with int fields, with pointers to 3 slabs: Slab<int>, Slab<K>, and Slab<V>.
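As a rough sketch of those shapes (the field names here are guesses for orientation, not necessarily the ones in mycpp/gc_heap.h):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layouts.  A Slab is a variable-length run of items; a
// List or Dict is a small fixed-size record pointing at slabs.
template <typename T>
struct Slab {
  uint64_t header;  // stand-in for the 8-byte Obj header
  T items_[1];      // actually variable-length, like Str
};

template <typename T>
struct List {
  uint64_t header;
  int len_;
  int capacity_;
  Slab<T>* slab_;  // items live out-of-line, so append() can realloc
};

template <typename K, typename V>
struct Dict {
  uint64_t header;
  int len_;
  int capacity_;
  Slab<int>* entry_;  // hash table indices
  Slab<K>* keys_;
  Slab<V>* values_;
};
```

The fixed-size records get Tag::FixedSize, while each slab is independently Tag::Opaque or Tag::Scanned depending on whether it holds pointers.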
The following types are never instantiated. After examining the heap tag, the garbage collector does a reinterpret_cast<> from Obj* to LayoutX* for convenience.

- LayoutFixed - for tracing up to ~16 fields of a user-defined type.
- LayoutForwarded - for storing the forwarding pointer that the Cheney algorithm uses.

// for Tag::FixedSize
class LayoutFixed : public Obj {
public:
// only entries denoted in field_mask will be valid
Obj* children_[16];
};
// for Tag::Forwarded
class LayoutForwarded : public Obj {
public:
Obj* new_location;
};
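Here's a sketch of how a collector can use field_mask_ to trace a LayoutFixed. The copy/forward step is elided; this version just counts the children a real collector would visit.

```cpp
#include <cassert>
#include <cstdint>

// Same header shape as OBJ_HEADER(), plus the LayoutFixed view.
struct Obj {
  uint8_t heap_tag_;
  uint8_t type_tag_;
  uint16_t field_mask_;
  uint32_t obj_len_;
};

struct LayoutFixed : Obj {
  Obj* children_[16];  // only entries named by field_mask_ are valid
};

// Walk the mask: bit i set means slot i holds an Obj pointer.
int CountTraceableChildren(LayoutFixed* fixed) {
  int n = 0;
  for (int i = 0; i < 16; ++i) {
    if (fixed->field_mask_ & (1 << i)) {
      Obj* child = fixed->children_[i];
      if (child != nullptr) {
        n++;  // a real collector would copy/forward 'child' here
      }
    }
  }
  return n;
}
```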
So GC graph nodes can be either Tag::Scanned or Tag::Opaque: Tag::Scanned means the object has possibly-NULL pointers to trace, while Tag::Opaque means it contains no pointers.

Here are some more facets of the design.
- List::append() and extend() can realloc. These are literally Python methods.
- Dict::set(K key, V val) can realloc and rehash (still TODO). It implements mydict[key] = val.
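As an illustration of that growth behavior, here's a toy append with a doubling policy. The factor and initial capacity are assumptions, not Oil's exact numbers, and a std::vector stands in for the reallocated Slab<int>.

```cpp
#include <cassert>
#include <vector>

// Toy model of List::append(): when the slab is full, allocate a
// bigger one and keep going.
struct IntList {
  int len_ = 0;
  int capacity_ = 0;
  std::vector<int> slab_;  // stand-in for a realloc'd Slab<int>

  void append(int x) {
    if (len_ == capacity_) {
      capacity_ = (capacity_ == 0) ? 4 : capacity_ * 2;  // assumed policy
      slab_.resize(capacity_);  // models allocating a bigger slab
    }
    slab_[len_++] = x;
  }
};
```

Amortized over many appends, this gives the same O(1) cost profile as CPython's list.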
Tuples, on the other hand, are value types:
Tuple2<A, B>
Tuple3<A, B, C>
Tuple4<A, B, C, D>
They're only used for destructuring function return values. For example:
s, num_chars = single_f.GetRaw()
becomes:
Tuple2<Str*, int> tup2 = single_f->GetRaw(); // value, not pointer
s = tup2.at0();
num_chars = tup2.at1();
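A minimal sketch of the value-type tuple shape (illustrative only; the generated class lives in mycpp's runtime):

```cpp
#include <cassert>

// Two fields, two accessors -- just enough to destructure a return
// value by copying.
template <typename A, typename B>
class Tuple2 {
 public:
  Tuple2(A a, B b) : a_(a), b_(b) {}
  A at0() const { return a_; }
  B at1() const { return b_; }

 private:
  A a_;
  B b_;
};

// Returned by value: no heap allocation, nothing for the GC to track.
Tuple2<const char*, int> GetRaw() {
  return Tuple2<const char*, int>("hi", 2);
}
```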
So "mycpp" is a fairly different language than Python! And again, the generated code is meant to be readable and stepped through in a debugger.
In my experience, the more "exotic" C++ features you use, the more likely it is that the DWARF source location data is confusing or wrong.
In my mind, the generated C++ code is about the performance model, and the Python source is about the application logic.
I mentioned above that the heap tags are odd numbers. This was the result of a fun debugging session that reminded me that the compiler reserves a vtable pointer on objects with virtual methods :-)
Vtables are stored at aligned addresses, so if the collector sees an odd number in an Obj* header, it assumes it's a heap tag. If it sees an even number, then it advances 4 or 8 bytes past the pointer, and there should be an odd heap tag there.
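This trick can be demonstrated in isolation. The sketch below assumes a little-endian machine and the common Itanium C++ ABI layout (vtable pointer first, pointer-aligned), which is exactly where the portability questions come from.

```cpp
#include <cassert>
#include <cstdint>

struct PlainObj {  // no virtual methods: the tag is the first byte
  uint8_t heap_tag_;
};

struct VirtualObj {  // the compiler puts a vtable pointer first
  virtual ~VirtualObj() {}
  uint8_t heap_tag_;
};

// On little-endian, the first byte of a pointer-aligned vtable pointer
// is even, while heap tags are deliberately odd.
uint8_t FindHeapTag(void* obj) {
  uint8_t* p = static_cast<uint8_t*>(obj);
  if (*p % 2 == 0) {      // even: skip the presumed vtable pointer
    p += sizeof(void*);
  }
  return *p;              // an odd heap tag
}
```

On a big-endian machine the first byte of the vtable pointer would be its high byte, so this exact check wouldn't work, which is why the TODO below matters.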
TODO: Can we make this portable C++? What about big-endian machines? One goal of Oil is to be portable like Lua (few #ifdefs), not like Python (many #ifdefs). I need help with these things.
As mentioned, I tend to use a modest subset of C++, similar to old versions of Google's style guide.
But the problem of computing field masks seems like a natural application of constexpr and offsetof() in C++.
Our code works, but I had some problems, and ended up using more macros than expected. I don't quite remember what happened, since it was about a year ago, but it's something to make note of.
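For reference, here's a hypothetical version of that constexpr/offsetof approach. The header size, slot granularity, and field names are all assumptions made for the example, not Oil's scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A standard-layout type: an 8-byte header, then fields in
// pointer-sized slots.
struct Node {
  uint64_t header;  // stand-in for the Obj header
  Node* left;       // pointer field: its slot's bit is set
  long value;       // not a pointer: its bit stays clear
  Node* right;      // pointer field: bit set
};

// Map a field's byte offset to a bit: one bit per pointer-sized slot
// after the header.
constexpr uint16_t MaskBit(size_t offset) {
  return 1 << ((offset - sizeof(uint64_t)) / sizeof(void*));
}

// The mask is computed at compile time from the actual layout, so it
// can't drift out of sync with the struct definition.
constexpr uint16_t kNodeMask =
    MaskBit(offsetof(Node, left)) | MaskBit(offsetof(Node, right));
```

One catch: offsetof() is only guaranteed for standard-layout types, which is part of why inheritance complicates this (see below).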
Our Python classes only use single inheritance. But we also use algebraic data types via Zephyr ASDL, and its code generator also generates C++ classes.
An AST is a big and elaborate data structure, and a problem in ML is that you sometimes have trivial variants that wrap the same type, adding indirection. For example, the leaf Token type can be used as a variant of both word_t and expr_t.
These little wrappers would increase the load on the garbage collector. So I implemented variants as types, which translates to "nominal" multiple inheritance in C++:
- Token IS A word_t
- Token IS A expr_t
That is, there's no field inheritance in the application, but the object header consists of the 4 C++ fields shown above. We wouldn't want those fields to be inherited twice, so that's one reason you see macros instead of inheritance in certain places.
Also, offsetof() apparently isn't well-specified in the presence of inheritance.
femtolisp has heuristics for this that I started to copy, but I ran into problems. This needs more thought.
I want this article to be accurate in the future, so I've moved this section to a wiki page Tasks Under NLnet Grant.
The first pass was Compiler Engineer Job, but now I realize there are many subtasks we could use help on!
To get this done, I see two opposite strategies, with gradations in between:
Why not do this all myself? I touched on this in Oil is Being Implemented "Middle Out". The project is very big: it's more like implementing three languages than one!
So I think this is a fun and small piece of code, and it should lead us to a production quality shell. For some evidence, see Oil's Parser is 160x to 200x Faster Than It Was 2 Years Ago (2020).
Remember that Oil is the most bash-compatible shell by a mile, and has been for several years! It's also a brand-new language that I've gotten great feedback on.
Please share this post with compiler and language runtime engineers.
And please sponsor Oil if you can! We now have a small grant, but it may not cover all of this specialized work.
There was some good discussion on the last post:
I expect these questions to come up, and I've made light comments on them in this series:
The real answer to many of these questions is more like "I haven't had time to look into it". So I wrote this post in hopes of parallelizing the project!
(Although "Shell has different implementation constraints than other languages" is also common!)
I started a wiki page: FAQ: Why Not Write Oil in X?