Why Sponsor Oils?
This post has notes for this event:
I'm publishing it mainly so the audience has something to follow along with.
This material isn't that polished, and I hope that the audience will help me improve it! Ask me questions during the talk, and feel free to send feedback to andy at oilshell.org.
This event won't be recorded, but please invite me to speak to small groups interested in the internals of programming languages and Unix.
Right now my main goal is to attract contributors. You can even be paid to work on Oils!
Diagram: re2c output for recognizing a C-style string with backslash escapes
Note to self: skip some of the sections below! There's a lot of detail, and many opportunities for tangents / diversions.
This talk has both:
Specifically:
I think these two topics should appeal to functional programmers - reasoning by sets, rather than by states (over time).
I will probably lean toward the latter, since the former is on the blog.
Output from wc -l for first demo:
54 demo/houston-fp/favorite.re2c.cc
Second demo:
26 demo/houston-fp/demo.asdl
26 demo/houston-fp/demo_main.py
52 total
Automation
197 demo/houston-fp/run.sh
A feeling for what it's like to work on Oils.
Anyone who works on this project will learn some things! I certainly have. (Not in this talk: translating Python to C++, garbage collected runtime, ...)
Tidbits/slogans about regular languages and algebraic data types.
On the home page https://www.oilshell.org/:
Warning: the project is a weird mix of practical and theoretical.
The project has a lot of metaprogramming for "leverage" -- this has upsides and downsides. Does Oils have the curse of Lisp?, etc.
About me:
Briefly:
osh -n - totally different internals than bash
bin/osh (Python) and osh-native
andy@hoover:~/git/oilshell/oil$ time bin/osh ~/git/other/neofetch/neofetch
_,met$$$$$gg. andy@hoover
,g$$$$$$$$$$$$$$$P. -----------
,g$$P" """Y$$.". OS: Debian GNU/Linux 12 (bookworm) x86_64
,$$P' `$$$. Host: Intel Corporation NUC11PABi5
',$$P ,ggs. `$$b: Kernel: 6.1.0-9-amd64
`d$$' ,$P"' . $$$ Uptime: 58 days, 13 hours, 29 mins
$$P d$' , $$P Packages: 2160 (dpkg)
$$: $$. - ,d$$' Shell: bash 5.2.15
$$; Y$b._ _,d$P' Resolution: 3840x2160
Y$$. `.`"Y$$$$P"' DE: GNOME 43.4 (Wayland)
`$$b "-.__ Theme: Adwaita [GTK2/3]
`Y$$ Icons: Adwaita [GTK2/3]
`Y$$. Terminal: tmux
`$$b. CPU: 11th Gen Intel i5-1135G7 (8) @ 4.200GHz
`Y$$b. GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics]
`"Y$b._ Memory: 12792MiB / 15639MiB
`"""
real 0m1.726s
user 0m1.145s
sys 0m0.357s
~/git/languages/mal/impls/bash$ ./stepA_mal.sh ../../tests/incA.mal
9
~/git/languages/mal/impls/bash$ osh ./stepA_mal.sh ../../tests/incA.mal
9
~/git/languages/mal/impls/bash$ osh ./stepA_mal.sh ../../tests/print_argv.mal a 42 'b c d\'
("a" "42" "b c d\\")
~/git/languages/mal/impls/bash$ ./stepA_mal.sh ../../tests/print_argv.mal a 42 'b c d\'
("a" "42" "b c d\\")
Demos:
Two major pain points gone:
Upgrade path:
shopt --set ysh:upgrade # breaks surprisingly few things
shopt --set ysh:all # like bin/ysh, breaks more
shopt --set strict:all # when you want to run a script against OSH too
(Background for the "middle out" style - Go through this section QUICKLY. Why is Oils big? Why is it taking a long time?)
Remember Oils is a mix of practical and theoretical. Scope has always been a problem. Some open questions:
The third question turned out to be harder than the first question:
Many interleaved / mutually recursive languages, many interleaved parsers / evaluators.
Oils puts them under the same roof. (Paradox of the project: encourage polyglot programming, but also reduce language cacophony from tiny DSLs.)
Analogy: HTML used to contain Flash code and Java applets, now it contains <video> and WebAssembly
Oil Is Being Implemented "Middle Out" (2022)
A collection of DSLs:
All these little compilers/translators are in our source tree:
How did I arrive at this? Write the simplest possible code that works, then refactor.
I think of it as "compression" or "vertical factoring". To reduce repetition and gain consistency.
Oils (OSH + YSH + ...) is 50K-60K lines of source code, compared to 140K lines of C for bash.
Nearly all language implementations use at least 1 or 2 internal DSLs (CPython, Go, etc.) But most don't have two complete runnable implementations. (Exception: PyPy)
Thought experiment: implement as many parsers and evaluators as you can, and then refactor the code to be smaller. What do you end up with?
(Short section for BACKGROUND)
Audience questions:
Comparison:
re2c is a tool that generates C state machines (switch and goto) from regular expressions. I heard about it from Performance of Open Source Applications: Ninja. (Also used in the CommonMark reference implementation, PHP, ...)
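To give a feeling for what such a state machine does, here is a minimal hand-written sketch in Python for C-style double-quoted strings. It is not re2c output (which is C with switch and goto); the state names and the function match_c_string are made up for illustration. One simplification: any character is accepted after a backslash, including a newline.

```python
# Hand-written state machine for C-style double-quoted strings.
# Hypothetical sketch: state names and function name are illustrative only.
START, IN_STRING, AFTER_BACKSLASH = range(3)

def match_c_string(s):
    """Return the length of the string literal at the start of s, or -1."""
    state = START
    for i, ch in enumerate(s):
        if state == START:
            if ch != '"':
                return -1              # must begin with an opening quote
            state = IN_STRING
        elif state == IN_STRING:
            if ch == '"':
                return i + 1           # closing quote consumed: accept
            if ch == '\\':
                state = AFTER_BACKSLASH
            # any other character keeps us in IN_STRING
        else:  # AFTER_BACKSLASH
            state = IN_STRING          # the escaped character is consumed
    return -1                          # input ended before the closing quote
```

The only "memory" is the current state, which is what makes the language regular: no counter or stack is needed to handle the escapes.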
$( vs. $(( etc.
My favorite regex is:
"([^"\]|\\.)*"
Many years ago, when reading CPython's tokenize.py module, I was surprised to learn that C-style strings with backslash escapes are regular languages.
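A quick way to try the regex above is Python's re module. This is a small sketch, assuming Python's syntax, where the backslash inside the bracket expression has to be doubled; the helper name left_match is made up here and only loosely mirrors a left-anchored match.

```python
import re

# Same pattern as above, with the backslash inside [...] doubled,
# as Python's regex syntax requires.
C_STRING = re.compile(r'"([^"\\]|\\.)*"')

def left_match(s):
    """Return the string literal at the start of s, or None (hypothetical helper)."""
    m = C_STRING.match(s)   # re.match anchors at the start of the input
    return m.group(0) if m else None

print(left_match(r'"foo\n"'))        # a complete literal with an escape
print(left_match(r'"a \" b" tail'))  # an escaped quote does not end the literal
print(left_match('"'))               # unterminated: no match
```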
(Audience question: Perl-style regexes vs. regular languages?)
Recently: Storing Data in Control Flow (Russ Cox, 2023)
Demos:
What could be improved about the demo:
How about explaining it like this?
osh$ var pat = / DQ ( ![DQ Backslash] | Backslash dot )* DQ /
Aside: in Eggex — one level up — it's
osh$ var pat = / '"' ( !['"' r'\'] | r'\' dot )* '"' /
osh$ echo $pat
"([^"\\]|\\.)*"
Matching:
osh$ = r'"foo\n"' => leftMatch(pat)
<Match 0x1c23e>
osh$ = r'"' => leftMatch(pat)
(Null) null
Refactored:
osh$ var Backslash = r'\'
osh$ var pat = / '"' ( !['"' Backslash] | Backslash dot )* '"' /
then
osh$ var DQ = r'"'
osh$ var pat = / DQ ( ![DQ Backslash] | Backslash dot )* DQ /
osh$ echo $pat
"([^"\\]|\\.)*"
And evolving the abstraction.
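The same refactoring into named parts can be imitated in Python with plain string composition. This is only an analogy to the Eggex session above: the variable names are copied from it, but Python has no syntax-level equivalent, so the pieces are ordinary strings.

```python
import re

DQ = '"'                       # a double quote is not special in a regex
BACKSLASH = re.escape('\\')    # a literal backslash, escaped for the regex

# [^ DQ BACKSLASH ] | BACKSLASH any-char, repeated, between two DQ
pat = f'{DQ}([^{DQ}{BACKSLASH}]|{BACKSLASH}.)*{DQ}'
print(pat)
```

The printed pattern should be the same text that echo $pat shows above, which is the point of the exercise: the named version and the cryptic version denote the same regular language.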
Aside: Line between lexing and parsing isn't obvious: https://github.com/oilshell/oil/wiki/Why-Lexing-and-Parsing-Should-Be-Separate
Aside: re2c is also from the 90's, which postdates both Unix shell (~1970) and the complaints quoted in 2019 above (1993-1994).
Can use audience help in explaining this.
— ashley williams (@ag_dubs) February 13, 2022
I also "need" this metalanguage feature now, based on experience from implementing many little languages. We want an "executable spec", so it should be short, but:
What's the difference between a textbook implementation and an implementation people use? Contrast with "textbook" Standard ML, e.g. PL Zoo
Technical descriptions:
loc_t is a static type, Token is a "struct" with a static type, loc_e.Token is a dynamic tag
Other reference points:
into() - not the same I think
Audience question: What's decl expr stmt?
Python is roughly expr stmt.
OSH is
command_t (similar to stmt)
word_part_t word_t
arith_expr_t
bool_expr_t
YSH is
command_t word_part_t word_t
expr_t
re_t aka Eggex
Examples of free-floating / first class variants:
But not just these dialects. Also error handling, word evaluation, more.
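The distinction between loc_t (a static type), Token (a struct that is also a variant), and loc_e.Token (a dynamic tag) can be sketched in plain Python. This is a hand-written illustration, not what the ASDL code generator emits: the Missing variant, the field names, and the describe function are assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union

class loc_e(Enum):
    """Dynamic tags, one per variant of loc_t."""
    Missing = auto()
    Token = auto()

@dataclass
class Missing:
    tag = loc_e.Missing      # class attribute, not a dataclass field

@dataclass
class Token:
    """A struct that is usable on its own AND as a variant of loc_t."""
    tag = loc_e.Token
    line_num: int            # hypothetical fields
    col: int
    length: int

# The static type: a location is one of these variants.
loc_t = Union[Missing, Token]

def describe(loc: loc_t) -> str:
    # Dispatch on the dynamic tag, as a C++ switch statement would.
    if loc.tag == loc_e.Token:
        return f'line {loc.line_num}, col {loc.col}'
    return '(no location)'
```

Because Token is a first-class variant, a function that takes a Token and a function that takes any loc_t can share the same object without wrapping it.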
CPython's use of ASDL is pretty dynamic:
Token < loc_t
PyObject*
Oils was a dynamic program. Again, we started with the simplest possible code.
Dict[K, V] type
Now we have "pleasant refactorings" with static types.
But algebraic data types without static typing are still useful! Illegal states are still not representable. (Tangent: Why not OCaml on wiki)
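As a rough illustration of that claim, here is a dynamically typed sum type in Python with no type checker involved. The variant names Const and Binary and the function eval_arith are hypothetical, not taken from the Oils source.

```python
from dataclasses import dataclass

@dataclass
class Const:
    value: int

@dataclass
class Binary:
    op: str
    left: object     # another Const or Binary
    right: object

def eval_arith(node):
    """Evaluate a tiny arithmetic tree by dispatching on the variant."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Binary):
        left, right = eval_arith(node.left), eval_arith(node.right)
        if node.op == '+':
            return left + right
        if node.op == '*':
            return left * right
        raise ValueError(f'unknown operator {node.op!r}')
    raise TypeError(f'not an arithmetic node: {node!r}')

# 1 + 2 * 3
tree = Binary('+', Const(1), Binary('*', Const(2), Const(3)))
print(eval_arith(tree))
```

Even without static checking, each variant carries only its own fields: a Const cannot have a half-filled op, and a Binary cannot be constructed without both children. Compare that with one flat record where every field is optional.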
Why dynamic?
${}
We grew the language and the metalanguage at the same time. It can be thought of as:
The "middle out" style is a bunch of custom and evolved DSLs, for code compression:
Memes to remember:
We need help from people interested in language implementation and design.
Major things "done":
Still TODO:
(I really want to make a distributed OS / computer with a language-centric interface.)
#blog-ideas, #language-design, #performance, etc.