Why Sponsor Oils? | blog | oilshell.org
A few weeks ago, I published Metrics for Oil 0.8.4. It establishes a rough performance baseline before enabling garbage collection.
This is a small update to that baseline, released with Oil 0.8.5. I noticed some problems with the benchmarks after partially integrating the garbage collector (which now works on small examples!).
This post doesn't invalidate anything I've said in the past. It just adds some detail!
The mycpp-examples benchmark shows how much we can speed up small pieces of code by translating them from statically-typed Python to C++.
Silly bug: I was building the C++ code with ASAN! When ASAN is on, the compiler generates code that uses "shadow memory" to detect memory unsafety at runtime. This increases the size of every allocation, and makes code slower. Examples:
- classes went from taking 728 ms to 1.9 ms
- length went from taking 734 ms to 167 ms
- cartesian went from 1033 MB of heap usage to 611 MB
- parse went from 972 MB of heap usage to 137 MB

So the Python-to-C++ speedups look even more impressive now. (But remember that mycpp is not a general-purpose tool.)
The compute benchmark measures the Oil interpreter vs. bash and Python on small code examples.
I noticed that bash and Python both used a minimum of 6 MB of memory (Max RSS). However, this turned out to be a benchmark bug. We were using benchmarks/time_.py, a tool written in Python, to measure the memory of a bash process! Specifically, we used subprocess.call() and then resource.getrusage().
This doesn't work because Python first forks its larger address space, and then calls exec() to start bash. That is, we measured the memory usage of Python, not bash. Someone else ran into the same issue.
To fix this, I first changed time_.py to shell out to /usr/bin/time, a GNU utility written in C which has a small address space. Two problems:

- What about bash's time keyword (which Oil implements)?
So I wrote my own benchmarks/time-helper.c. It's surprising to find these basic deficiencies in common tools! I guess I need to build something better into Oil, but that's more work on top of a big pile.
The parsing benchmarks compare $sh -n across different shells on 10 files.

The 0.8.5 release is the first one where oil-native is slower than bash!
I believe this is due to the partially-integrated garbage collector. Every C++ function now has a StackRoots invocation to register pointers.

This operation should be very cheap, but I would guess that it also inhibits some compiler optimizations. We're passing pointers to locals to be stored in a global (or thread-local) data structure.
I mentioned this possibility in the Caveats to January's performance post:
I expect performance to go up and down in future releases, but in the long term it should be faster
I probably won't have time to optimize the mycpp translation of the parser for many months, but it should be possible with enough effort. Remember that Oil is "hilariously unoptimized". (As always, I can use help!)
After fixing these benchmarks, I had a nice experience with builds.sr.ht, the Sourcehut build service. I was driven there by the increasing flakiness of Travis CI.
I want to write a blog post about it, but I should really get back to work on the garbage collector.
Here's a brief outline instead:
- services/toil is a shell script and web interface that runs on multiple CI services.
- ssh for auth.

Let me know if you have questions!