The Core of Erlang

I've spent the last few months implementing a compiler that targets the Erlang Virtual Machine (BEAM). BEAM code itself seemed like a tricky initial target, so I looked into the intermediate forms that Erlang code takes during compilation. Core Erlang seemed like the right fit for my project, and it provided a very pleasant experience. This post will outline why Core Erlang is a neat language and how to sidestep some of the shady parts.

Compilers historically

Not too long ago, it was infeasible for a computer to represent a file of code in memory, so all compilers were what we now call "single-pass" compilers. In the simplest terms, a compiler would read a statement of your source code, then translate that into assembly (or some other target language), then go read another statement in your source code, and translate that, and so on. This strategy limits the sort of optimizations that a compiler can do and the semantics that the source language can support. As memory became more accessible, compiler design could become more sophisticated. The "multi-pass" compiler is one that models your source code as some intermediate form in memory, then performs some number of checks, optimizations, and simplifications before outputting the target language.

Core Erlang is the major intermediate form that powers the Erlang (and Elixir) multi-pass compiler. Sasha Fonseca illustrated this pipeline on Elixir Forum, so I've reproduced that diagram below.

[Diagram: Erlang Intermediate Forms]

Why it's cool

1. Easily externalized

Unlike intermediate forms in some other languages, Core Erlang can be read and written by the compiler without any hackery. erlc +to_core my_module.erl will parse your module, expand macros, then write the Core file as my_module.core. Asking the compiler to finish the job is as simple as compiling a normal Erlang module: erlc my_module.core, which leaves you with a my_module.beam file.
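The same round trip can be driven from Erlang code rather than the command line. This is a minimal sketch, assuming a my_module.erl in the current directory; the wrapper module core_roundtrip and its function names are made up for illustration, while to_core and from_core are options understood by compile:file/2.

```erlang
-module(core_roundtrip).
-export([erl_to_core/1, core_to_beam/1]).

%% Compile Name.erl only as far as Core Erlang; writes Name.core.
%% Name is a file name without its extension, e.g. "my_module".
erl_to_core(Name) ->
    compile:file(Name, [to_core, report]).

%% Pick up at Name.core and finish the job; writes Name.beam.
core_to_beam(Name) ->
    compile:file(Name, [from_core, report]).
```

Calling core_roundtrip:erl_to_core("my_module") followed by core_roundtrip:core_to_beam("my_module") should leave both my_module.core and my_module.beam on disk, with each call returning an ok tuple on success.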

2. Human readable

BEAM files are binary encoded, but Core Erlang files are plain text!

% ERLANG

factorial(0) -> 1;
factorial(N) -> N * factorial(N-1).

% CORE

'factorial'/1 =
	fun (_cor0) ->
			case _cor0 of
					<0> when 'true' -> 1
					<N> when 'true' ->
							let <_cor1> = call 'erlang':'-' (N, 1)
							in
									let <_cor2> = apply 'factorial'/1 (_cor1)
									in
											call 'erlang':'*' (N, _cor2)
			end

If you know how to write Erlang, then you know how to read Core, which leads me to my favorite feature.

3. Simplicity

Core Erlang is a full language, with all the power of Erlang and Elixir, but the language itself has very few elements. This means that Core does not support some of the syntax that you would find in Erlang. The above snippet demonstrates some of those reductions:

Pattern matching is only allowed in case expressions
All function calls, including built-ins, are qualified with their module
Calls to other modules (call) are syntactically different from calls within the same module (apply)
All same-module calls explicitly reference the function's arity
All case clauses include a guard

These restrictions make Core very annoying to write by hand, but the strict syntax makes it very nice to generate programmatically! Since all language features are explicit, I find that Core is simple to spot-check for syntax errors.
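To make "generate programmatically" concrete, here is a small sketch that builds a Core module with the cerl constructors that ship with OTP's compiler application, pretty-prints it with core_pp, and compiles it in memory. The module names core_gen and gen_demo are invented for this example, and the generated module deliberately omits the module_info functions that erlc would normally add.

```erlang
-module(core_gen).
-export([double_module/0, compile_and_run/0]).

%% Build the Core Erlang equivalent of:
%%   -module(gen_demo).
%%   -export([double/1]).
%%   double(X) -> X * 2.
double_module() ->
    X = cerl:c_var('X'),
    %% Calls to other modules are explicit: call 'erlang':'*'(X, 2)
    Body = cerl:c_call(cerl:c_atom(erlang), cerl:c_atom('*'),
                       [X, cerl:c_int(2)]),
    %% Function names always carry their arity: 'double'/1
    Name = cerl:c_fname(double, 1),
    cerl:c_module(cerl:c_atom(gen_demo),
                  [Name],                            % exports
                  [{Name, cerl:c_fun([X], Body)}]).  % definitions

%% Print the generated Core, compile it in memory, load it, and call it.
compile_and_run() ->
    Core = double_module(),
    io:put_chars([core_pp:format(Core), $\n]),
    {ok, Mod, Beam} = compile:forms(Core, [from_core, binary]),
    {module, Mod} = code:load_binary(Mod, "gen_demo.beam", Beam),
    Mod:double(21).
```

Because every construct has exactly one constructor, the generator never has to decide between alternative surface syntaxes; that is the payoff of the restrictions listed above.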

The shady parts

The most recent Core Erlang specification I managed to find dates from 2004. The cerl module seems to be undocumented on erldocs.com after OTP 16, so the most reliable way to explore it is to read its source code.

Additionally, the error messages you get when compiling bad Core files can be less than helpful. One recurring pain point was the subtle departure from Erlang semantics. For example, Erlang allows us to nest expressions arbitrarily, like using arithmetic inside of lists and function arguments. Core is a bit more specific since it internally differentiates between literals (things like variables, numbers, lists, and tuples) and expressions (things like function calls or case statements). Literals compose like you would expect, but some expressions can only contain literals, not other expressions! This is why the snippet above binds _cor1 and _cor2 as temporary variables before using them immediately. If you ignore this distinction, the compiler will bail and print its own stack trace, not that of the Core module you are compiling.
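One way to stay on the safe side is to have the generator introduce those temporaries itself. The sketch below builds a call with cerl and first binds every argument that is not already a variable or a literal to a let. The module name core_flatten and the _tmpN naming scheme are assumptions made for this example; a real generator would need names that are guaranteed fresh across the whole function, and whether a given nested form is actually rejected may depend on the compiler version.

```erlang
-module(core_flatten).
-export([call_with_simple_args/3, example/0]).

%% Build `call M:F(Args)`, binding each non-trivial argument to a
%% temporary variable with `let` before the call itself.
call_with_simple_args(M, F, Args) ->
    {Simple, Bindings} = simplify(Args, 0, [], []),
    Call = cerl:c_call(cerl:c_atom(M), cerl:c_atom(F), Simple),
    %% Bindings is in reverse argument order, so folding from the left
    %% wraps the last argument innermost and the first one outermost.
    lists:foldl(fun({Var, Expr}, Body) -> cerl:c_let([Var], Expr, Body) end,
                Call, Bindings).

simplify([], _N, Simple, Bindings) ->
    {lists:reverse(Simple), Bindings};
simplify([Arg | Rest], N, Simple, Bindings) ->
    case cerl:is_c_var(Arg) orelse cerl:is_literal(Arg) of
        true ->
            %% Variables and literals can be used in place.
            simplify(Rest, N, [Arg | Simple], Bindings);
        false ->
            %% Hypothetical naming scheme: _tmp0, _tmp1, ...
            Var = cerl:c_var(list_to_atom("_tmp" ++ integer_to_list(N))),
            simplify(Rest, N + 1, [Var | Simple], [{Var, Arg} | Bindings])
    end.

%% N * (N - 1), written with the subtraction as a nested argument.
example() ->
    N = cerl:c_var('N'),
    Minus = cerl:c_call(cerl:c_atom(erlang), cerl:c_atom('-'),
                        [N, cerl:c_int(1)]),
    call_with_simple_args(erlang, '*', [N, Minus]).
```

Passing the result of core_flatten:example() to core_pp:format/1 shows the same shape as the factorial snippet: a let that binds the subtraction, followed by a call whose arguments are only variables.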

Recommended resources

Despite these challenges, I have enjoyed working with Core Erlang. Peeking inside the Erlang compiler helped me to understand some of the neat tricks it uses to provide a nice, expressive language. Though textual documentation is hard to come by, the following talks are helpful outlines of the compiler's design:

If this post whetted your appetite for compiler design, Stephen Diehl has a nice blog series describing the intermediate forms used during Haskell compilation (one of which is also called Core).

Behind every erlc and mix compile lie a number of Core modules that power the whole process. Exploring intermediate forms helps us better understand how our tools are helping us. Since Core Erlang is externalizable, human-readable, and simple, it provides a nice on-ramp for doing just that.

Kofi Gumbs