
Notes for Houston Functional Programmers Talk

2024-05-14

This post has notes for this event:

I'm publishing it mainly so the audience has something to follow along with.

This material isn't that polished, and I hope that the audience will help me improve it! Ask me questions during the talk, and feel free to send feedback to andy at oilshell.org.

This event won't be recorded, but please invite me to speak to small groups interested in the internals of programming languages and Unix.

Right now my main goal is to attract contributors. You can even be paid to work on Oils!

Table of Contents
Talk Intro
Links
What to take away?
Project Intro
What is Oils? A few slogans
Why work on it?
Project Status
Demo
Audience Survey
OSH - Two complete implementations
YSH - Polished core, more work ahead
The "Middle Out" Style
Questions that motivate the project
A collection of languages to implement
A collection of evolved DSLs, for "compressing" languages
Oils vs. Crafting Interpreters
Regular Languages - with re2c
What is it? / Slogans
Demo: My Favorite Regular Expression
Diversion: In Eggex Syntax
History - Using this abstraction all over
Key idea - Lexer Modes for Interleaved Languages
Algebraic Data Types in Imperative Languages - Zephyr ASDL
What is it? / Slogans
Tidbit: Case Classes / Union Types
Example: Modeling the Structure of Languages
Demo in Python
History: CPython is pretty dynamic, Oils Dynamic → Static
Possible Future Work
Conclusion
A Metaprogrammed Shell
Help Wanted
Links

re2c output for my favorite regex

Diagram: re2c output for recognizing a C-style string with backslash escapes


Note to self: skip some of the sections below! There's a lot of detail, and many opportunities for tangents / diversions.

Talk Intro

This talk has both:

Specifically:

I think these two topics should appeal to functional programmers - reasoning by sets, rather than by states (over time).

I will probably lean toward the latter, since the former is on the blog.

Links

Output from wc -l for first demo:

 54 demo/houston-fp/favorite.re2c.cc

Second demo:

 26 demo/houston-fp/demo.asdl
 26 demo/houston-fp/demo_main.py
 52 total

Automation

197 demo/houston-fp/run.sh

What to take away?

A feeling for what it's like to work on Oils.

Anyone who works on this project will learn some things! I certainly have. (Not in this talk: translating Python to C++, garbage collected runtime, ...)

Tidbits/slogans about regular languages and algebraic data types.

Project Intro

What is Oils? A few slogans

On the home page https://www.oilshell.org/:

Why work on it?

Warning: the project is a weird mix of practical and theoretical.

The project has a lot of metaprogramming for "leverage" -- this has upsides and downsides. Does Oils have the curse of Lisp?, etc.

About me:

Project Status

Briefly:

Demo

Audience Survey

OSH - Two complete implementations

andy@hoover:~/git/oilshell/oil$ time bin/osh ~/git/other/neofetch/neofetch
       _,met$$$$$gg.          andy@hoover
    ,g$$$$$$$$$$$$$$$P.       -----------
  ,g$$P"        """Y$$.".     OS: Debian GNU/Linux 12 (bookworm) x86_64
 ,$$P'              `$$$.     Host: Intel Corporation NUC11PABi5
',$$P       ,ggs.     `$$b:   Kernel: 6.1.0-9-amd64
`d$$'     ,$P"'   .    $$$    Uptime: 58 days, 13 hours, 29 mins
 $$P      d$'     ,    $$P    Packages: 2160 (dpkg)
 $$:      $$.   -    ,d$$'    Shell: bash 5.2.15
 $$;      Y$b._   _,d$P'      Resolution: 3840x2160
 Y$$.    `.`"Y$$$$P"'         DE: GNOME 43.4 (Wayland)
 `$$b      "-.__              Theme: Adwaita [GTK2/3]
  `Y$$                        Icons: Adwaita [GTK2/3]
   `Y$$.                      Terminal: tmux
     `$$b.                    CPU: 11th Gen Intel i5-1135G7 (8) @ 4.200GHz
       `Y$$b.                 GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics]
          `"Y$b._             Memory: 12792MiB / 15639MiB
              `"""


real    0m1.726s
user    0m1.145s
sys     0m0.357s

~/git/languages/mal/impls/bash$ ./stepA_mal.sh  ../../tests/incA.mal
9
~/git/languages/mal/impls/bash$ osh ./stepA_mal.sh  ../../tests/incA.mal
9
~/git/languages/mal/impls/bash$ osh ./stepA_mal.sh  ../../tests/print_argv.mal a 42 'b c d\'
("a" "42" "b c d\\")

~/git/languages/mal/impls/bash$ ./stepA_mal.sh  ../../tests/print_argv.mal a 42 'b c d\'
("a" "42" "b c d\\")

YSH - Polished core, more work ahead

Demos:

Two major pain points gone:

Upgrade path:

shopt --set ysh:upgrade   # breaks surprisingly few things
shopt --set ysh:all       # like bin/ysh, breaks more

shopt --set strict:all    # when you want to run a script against OSH too

The "Middle Out" Style

Questions that motivate the project

(Background for the "middle out" style - Go through this section QUICKLY. Why is Oils big? Why is it taking a long time?)

Remember Oils is a mix of practical and theoretical. Scope has always been a problem. Some open questions:

  1. Can we statically parse shell? Yes.
  2. Will an interpreter written against that code representation be compatible with POSIX and bash?
  3. Can we write the interpreter in a high level language?
  4. Can we upgrade bash to a language familiar to JS and Python users? (including the experience of the last 30 years)

Related - https://www.oilshell.org/blog/2021/12/backlog-project.html#oil-is-a-bunch-of-experiments-that-succeeded

The third question turned out to be harder than the first question:

A collection of languages to implement

Many interleaved / mutually recursive languages, many interleaved parsers / evaluators.

Oils puts them under the same roof. (Paradox of the project: encourage polyglot programming, but also reduce language cacophony from tiny DSLs.)

Analogy: HTML used to contain Flash code and Java applets, now it contains <video> and WebAssembly

A collection of evolved DSLs, for "compressing" languages

Oil Is Being Implemented "Middle Out" (2022)

A collection of DSLs:

  1. Regular Languages
  2. Zephyr ASDL
  3. pgen2 for the YSH grammar, also borrowed from CPython
  4. Typed Python with mycpp

All these little compilers/translators are in our source tree:


How did I arrive at this? Write the simplest possible code that works, then refactor.

I think of it as "compression" or "vertical factoring". To reduce repetition and gain consistency.

Oils (OSH + YSH + ...) is 50K-60K lines of source code, compared to 140K lines of C for bash.

Nearly all language implementations use at least 1 or 2 internal DSLs (CPython, Go, etc.). But most don't have two complete runnable implementations. (Exception: PyPy)


Thought experiment: implement as many parsers and evaluators as you can, and then refactor the code to be smaller. What do you end up with?

Oils vs. Crafting Interpreters

(Short section for BACKGROUND)

Audience questions:

Comparison:

Regular Languages - with re2c

What is it? / Slogans

re2c is a tool that generates C state machines (switch and goto) from regular expressions. I heard about it from Performance of Open Source Applications: Ninja. (Also used in the CommonMark reference implementation, PHP, ...)
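
To make "state machine from a regular expression" concrete, here's a rough hand-written sketch in Python of a DFA for a double-quoted C-style string with backslash escapes (the language in the diagram at the top of this post). It is not re2c output: Python has no switch or goto, so a state variable stands in for the jumps, and the function and state names are made up for illustration.

```python
def match_c_string(s: str) -> bool:
    """Return True if ALL of s is a double-quoted string with backslash escapes.

    Hypothetical helper for illustration; a generated matcher would use
    switch/goto in C rather than a state variable.
    """
    state = "START"
    for ch in s:
        if state == "START":
            if ch != '"':
                return False       # must begin with a double quote
            state = "BODY"
        elif state == "BODY":
            if ch == '"':
                state = "DONE"     # closing quote
            elif ch == "\\":
                state = "ESCAPE"   # next char is taken literally
            # any other character: stay in BODY
        elif state == "ESCAPE":
            if ch == "\n":
                return False       # '.' in the regex doesn't match newline
            state = "BODY"
        else:                      # DONE, but there is trailing input
            return False
    return state == "DONE"
```

The point is that a fixed, small set of states is enough. No stack or counter is needed, which is what makes the language regular.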

Demo: My Favorite Regular Expression

My favorite regex is:

"([^"\\]|\\.)*"

Many years ago, when reading CPython's tokenize.py module, I was surprised to learn that C-style strings with backslash escapes are regular languages.
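
As a quick check of that claim, here's a small sketch that runs the same pattern through Python's re module. The helper name is_c_string is hypothetical, and re.fullmatch is used so that the whole input has to be one string literal.

```python
import re

# The regex from above: a quote, then any number of either
# (a) a char that is not a quote or backslash, or
# (b) a backslash followed by any char, then a closing quote.
STRING_RE = re.compile(r'"([^"\\]|\\.)*"')

def is_c_string(s: str) -> bool:
    """Hypothetical helper: True if s is exactly one C-style string literal."""
    return STRING_RE.fullmatch(s) is not None

print(is_c_string(r'"foo\n"'))        # a string with an escaped n
print(is_c_string(r'"say \"hi\""'))   # escaped quotes inside the string
print(is_c_string(r'"bad\"'))         # the last quote is escaped, so unterminated
print(is_c_string('"'))               # a lone quote
```

The two alternatives inside the group can't match the same character, so the matcher never has to guess which branch it's in.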

(Audience question: Perl-style regexes vs. regular languages?)

Recently: Storing Data in Control Flow (Russ Cox, 2023)

Demos:

What could be improved about the demo:

Diversion: In Eggex Syntax

How about explaining it like this?

osh$ var pat = / DQ ( ![DQ Backslash] | Backslash dot )* DQ /

Aside: in Eggex, one level up, it's:

osh$ var pat = / '"' ( !['"' r'\'] | r'\' dot )* '"' /

osh$ echo $pat
"([^"\\]|\\.)*"

Matching:

osh$ = r'"foo\n"' => leftMatch(pat)
<Match 0x1c23e>

osh$ = r'"' => leftMatch(pat)
(Null)   null

Refactored:

osh$ var Backslash = r'\'

osh$ var pat = / '"' ( !['"' Backslash] | Backslash dot )* '"' /

then

osh$ var DQ = r'"'

osh$ var pat = / DQ ( ![DQ Backslash] | Backslash dot )* DQ /

osh$ echo $pat
"([^"\\]|\\.)*"

History - Using this abstraction all over

And evolving the abstraction.

Aside: Line between lexing and parsing isn't obvious: https://github.com/oilshell/oil/wiki/Why-Lexing-and-Parsing-Should-Be-Separate

Key idea - Lexer Modes for Interleaved Languages
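
A toy sketch of the idea in Python, with many simplifying assumptions: each mode has its own table of regexes, so the same character can mean different things depending on which language you're in. Here `#` starts a comment in the outer mode but is ordinary text inside double quotes. The mode names, token names, and tokenize function are all made up for this example. In Oils the parser chooses the lexer mode; here the loop switches modes itself to keep things short.

```python
import re

# One token table per mode; each entry is (regex, token kind).
# Hypothetical modes and token kinds, much simpler than real shell.
MODES = {
    "OUTER": [
        (re.compile(r"[ \t]+"), "Space"),
        (re.compile(r"#[^\n]*"), "Comment"),
        (re.compile(r'"'), "LeftDQ"),
        (re.compile(r'[^ \t"#]+'), "Word"),
    ],
    "DQ": [
        (re.compile(r'"'), "RightDQ"),
        (re.compile(r"\\."), "Escaped"),
        (re.compile(r'[^"\\]+'), "Chars"),
    ],
}

def tokenize(s):
    """Return a list of (kind, text) pairs, switching tables at each quote."""
    mode = "OUTER"
    pos = 0
    tokens = []
    while pos < len(s):
        for pat, kind in MODES[mode]:
            m = pat.match(s, pos)
            if m:
                break
        else:
            raise ValueError("no token matches at position %d in mode %s" % (pos, mode))
        tokens.append((kind, m.group()))
        pos = m.end()
        # Mode switches: entering and leaving the double-quoted language
        if kind == "LeftDQ":
            mode = "DQ"
        elif kind == "RightDQ":
            mode = "OUTER"
    return tokens

for tok in tokenize('echo "a # b" # c'):
    print(tok)
```

Each table is still just a set of regular expressions. The interleaving of languages is handled by switching tables, not by making any single regex more clever.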

Algebraic Data Types in Imperative Languages - Zephyr ASDL

What is it? / Slogans

Aside: re2c is also from the 90's, which postdates both Unix shell (~1970) and the complaints quoted in 2019 above (1993-1994).

Tidbit: Case Classes / Union Types

Can use audience help in explaining this.


I also "need" this metalanguage feature now, based on experience from implementing many little languages. We want an "executable spec", so it should be short, but:

What's the difference between a textbook implementation and an implementation people use? Contrast with "textbook" Standard ML, e.g. PL Zoo.

Technical descriptions:


Other reference points:

Example: Modeling the Structure of Languages

Audience question: What's decl expr stmt?


Python is roughly expr stmt.

OSH is

YSH is

Examples of free-floating / first class variants:

But not just these dialects. Also error handling, word evaluation, more.

Demo in Python

History: CPython is pretty dynamic, Oils Dynamic → Static

CPython's use of ASDL is pretty dynamic:


Oils was a dynamic program. Again, we started with the simplest possible code.

Now we have "pleasant refactorings" with static types.


But algebraic data types without static typing are still useful! Illegal states are still not representable. (Tangent: Why not OCaml on wiki)
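
A minimal sketch of that point in plain Python, with no type checker involved. It's hand-written, not what the ASDL code generator emits, and the names Const, Add, Neg, and evaluate are hypothetical. The ASDL-style schema in the comment shows the sum type being modeled.

```python
from dataclasses import dataclass
from typing import Union

# Roughly the shape an ASDL schema like this would describe (illustrative only):
#
#   expr = Const(int value)
#        | Add(expr left, expr right)
#        | Neg(expr child)

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Neg:
    child: "Expr"

# The sum type: an Expr is exactly one of these variants
Expr = Union[Const, Add, Neg]

def evaluate(e: Expr) -> int:
    """Dispatch on the variant at runtime; unknown variants are an error."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, Neg):
        return -evaluate(e.child)
    raise TypeError("not an Expr variant: %r" % (e,))

print(evaluate(Add(Const(1), Neg(Const(3)))))
```

Even dynamically, the shape is enforced at construction time: `Const(1, 2)` raises TypeError, and an Add with no right operand can't be built. Field types like `int` are not checked here, though. That part is what static typing adds.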

Why dynamic?

Possible Future Work

Conclusion

A Metaprogrammed Shell

We grew the language and the metalanguage at the same time. It can be thought of as:

The "middle out" style is a bunch of custom and evolved DSLs, for code compression:

Memes to remember:

Help Wanted

We need help from people interested in language implementation and design.

Major things "done":

Still TODO:

(I really want to make a distributed OS / computer with a language-centric interface.)

Links