
Narrow Waists Can Be Interior or Exterior: PyObject vs. Unix Files

2023-06-17 (Last updated 2023-06-18)

I want to introduce these terms:

This distinction will help us describe YSH. But first, let's refine the idea of a narrow waist: it can be either interior or exterior.

Table of Contents
Canonical Examples
PyObject* - An Interior Waist
Unix Files - An Exterior Waist
More
Next
Please Send Related Writing
Appendix: Blog Backlog
Why is Python the Language of Machine Learning?
Large Language Models Leverage the Narrow Waist of Text

Canonical Examples

Recall that a narrow waist is an interface or concept that produces M × N amounts of functionality, without M × N amounts of code. See The Internet Was Designed With a Narrow Waist, especially the diagrams.

Let's discuss the two examples in the title.

PyObject* - An Interior Waist

PyObject* is what the entire CPython interpreter is based on. I once described Python's implementation as "500,000 lines of code with 20,000 C functions, which each take and return PyObject", and that's not too far from the truth.

~/src/Python-3.11.4$ fgrep 'PyObject *' **/*.{c,h} | wc -l
30087

It's an opaque C pointer to a dynamic data type. It's the mechanism for "duck typing". We can also look at it as an interior narrow waist between:

Why is it interior? Because code that uses PyObject*, including C extensions like NumPy, runs in the same OS process as the Python interpreter. They're either statically or dynamically linked, but either way it's the same process.

One consequence of this is that, in general, you have to compile C extensions for every version of Python, on every platform. (Recent work on a stable ABI reduces this matrix, but doesn't eliminate it.)
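Here is a minimal sketch of the M × N idea as seen from the Python side. The particular containers and builtins are my own picks for illustration, not something from CPython's source. The point is only that each operation is written once against the generic object protocol, and each type implements that protocol once.

```python
# M container types, none of which know about the operations below.
containers = {
    "list": [3, 1, 2],
    "tuple": (3, 1, 2),
    "set": {3, 1, 2},
    "dict": {3: "c", 1: "a", 2: "b"},   # iterating a dict yields its keys
    "range": range(1, 4),
}

# N operations, none of which know about any specific container type.
operations = [len, sum, min, max, sorted]

# M x N results from M + N pieces of code.
results = {
    (name, op.__name__): op(value)
    for name, value in containers.items()
    for op in operations
}

print(len(results))                 # 25
print(results[("set", "sorted")])   # [1, 2, 3]
```

All five builtins receive their argument as a generic object and ask it to iterate or report its length, which is duck typing at work. Adding a sixth container type or a sixth operation means writing one more piece of code, not five.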

Related: Why Did Python Become the Language of Machine Learning? in the appendix.

Unix Files - An Exterior Waist

In Unix, files have a uniform, unstructured representation. Ken Thompson wrote that this design was explicit, based on experience with earlier systems.

In addition, OS "objects" like disk files, pipes, sockets, and terminals are all represented by opaque file descriptors.
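To make the file descriptor point concrete, here is a small sketch that assumes a Unix-like system. The `roundtrip` helper is a hypothetical name of mine. It only ever sees integer descriptors and bytes, so the same function works whether the kernel object behind the descriptor is a pipe, a socket, or a disk file.

```python
import os
import socket
import tempfile

def roundtrip(write_fd: int, read_fd: int, payload: bytes) -> bytes:
    """Write bytes to one descriptor and read them back from another."""
    os.write(write_fd, payload)
    return os.read(read_fd, len(payload))

payload = b"hello\n"
received = {}

# 1. A pipe: two descriptors joined by a kernel buffer.
r, w = os.pipe()
received["pipe"] = roundtrip(w, r, payload)
os.close(r)
os.close(w)

# 2. A connected pair of sockets.
a, b = socket.socketpair()
received["socket"] = roundtrip(a.fileno(), b.fileno(), payload)
a.close()
b.close()

# 3. A disk file, opened twice so reading starts at offset 0.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    wfd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    rfd = os.open(path, os.O_RDONLY)
    received["disk file"] = roundtrip(wfd, rfd, payload)
    os.close(wfd)
    os.close(rfd)

for kind, data in received.items():
    print(kind, data)
```

The payload is tiny, so a single `os.read` is enough here. Real code would loop, because reads on pipes and sockets may return fewer bytes than requested.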

To slightly update the diagram in the narrow waist post, files can be thought of as an exterior bridge between:

Why are unstructured bytes exterior? Because two processes can share a file or communicate over a network.

If the two processes are owned by different people, or deployed separately, then the data that travels over the wire must be stable. Network protocols must also be stable. Otherwise the system won't work.

This is an important difference between interior and exterior waists: the unit of deployment. ELF files or Linux containers are often the unit of deployment in distributed systems.
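As a rough illustration of an exterior waist, the sketch below starts a second OS process and talks to it only through bytes on stdin and stdout. The child program is a made-up one-liner. It happens to be Python so the example is self-contained, but the parent has no way to tell, and doesn't need to.

```python
import subprocess
import sys

# Hypothetical child program. It could be written in any language and
# deployed separately, as long as it reads and writes the same bytes.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"

proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    input=b"narrow waist\n",   # bytes cross the process boundary
    stdout=subprocess.PIPE,
    check=True,
)
output = proc.stdout
print(output)
```

Contrast this with a C extension: there the two sides share one address space and pass pointers, so they have to agree on in-memory layout. Here they only have to agree on the byte stream.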


So you could say that the architecture of both Python and Unix is an instance of M × N Polymorphism, with Python being interior, and Unix exterior.

More

I could have used the user-visible name for PyObject*:

>>> object
<class 'object'>

But viewing CPython from its implementation is arguably more illuminating. If you read C, take a look at Objects/complexobject.c, especially the PyTypeObject at the bottom. It implements complex numbers like 3 + 4j.
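If you'd rather poke at this from the REPL first, here is a small sketch. As far as I understand it, generic builtins like `abs()` don't know about complex numbers; they dispatch through the object's type, which is what the slots in that PyTypeObject are for.

```python
z = 3 + 4j

# The user-visible type hierarchy: complex sits directly under object.
print(type(z).__mro__)   # (<class 'complex'>, <class 'object'>)

# Generic operations dispatch through the type of their argument.
magnitude = abs(z)
shifted = z + 1
mirrored = z.conjugate()

print(magnitude)   # 5.0
print(shifted)     # (4+4j)
print(mirrored)    # (3-4j)
```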


LLVM's name is misleading: it's not a virtual machine like the Java Virtual Machine. Rather, it's a set of C++ libraries for implementing compilers. The JVM is exterior because it has a stable format, whereas LLVM is interior.

LLVM was the "odd one out" in A Sketch of the Biggest Idea in Architecture, which was mostly about exterior waists and dynamic software composition.

Emacs has buffers, windows, and frames of attributed strings. They form a narrow waist between { M data types ... } × { N operations ... }, and the N operations are highly composable and extensible with Emacs Lisp.

This post describes a very specific benefit:

You can apply the spell checking operation (one of N) to a buffer, which can be a directory listing (one of M).

Why is the Emacs model an interior waist? When you build upon the waist — adding either more data types or more operations — your Emacs Lisp code lives in the same process as the rest of Emacs.

Next

The narrow waists of Unix and PyObject* have both lasted and expanded for decades. And each one relates to the design of our shell. The next post will return to YSH, using the new interior-exterior distinction to describe it.

Please Send Related Writing

As always, I welcome references to existing writing on these #software-architecture topics. I received good feedback last year, but in general I feel this important idea is underexplored.

We should have something along the lines of Extracting the Essential Simplicity of the Internet (CACM 2023) for Unix and Python:

The Internet's design is underappreciated because its beauty is buried beneath an avalanche of implementation details.

This article attempts to extract the Internet's essential simplicity by motivating the fundamental Internet design choices from first principles.

Many of their basic design decisions were in direct conflict with the prevailing wisdom ...

You could say the goal of Oils is to preserve and extract the essential simplicity of the Unix shell, and enhance it with what we've learned in the last 30 years, e.g. from Python, Ruby, Go, etc.

Appendix: Blog Backlog

Why is Python the Language of Machine Learning?

Oils has taught me a lot about the history of Python and its implementation. I used some of that knowledge in this answer:

It relates to the PyObject* narrow waist. I'd like to expand it into a longer post, but here are some notes first:

Large Language Models Leverage the Narrow Waist of Text

A Sketch of the Biggest Idea > Bytes And Text Are Essential Narrow Waists emphasized the power of plain text. I responded to a fallacy in A case against text protocols, and quoted Graydon Hoare's Always Bet on Text.

Without adding more hype to large language models, it's pretty clear that

are important operations on text. Their power comes from exactly the same kind of generality we've been talking about with narrow waists: they work on many different human languages, and many different programming languages.

All of those languages are represented as text.


A real-world consequence of this is that manipulating code that's not text becomes less attractive. It doesn't work well with LLMs:

As you might know, in Darklang-classic, you wrote code using a "structured editor".

This is a non-freeform editing experience [that is] no longer [...] important in a world of generated code