How Effective Abstractions Are Opinionated

Brad Ediger

May 31, 2022

Every software system is built on top of a towering stack of abstractions. It’s an unavoidable arrangement if we ever want to deliver working software. Yet we know abstractions are imperfect representations of reality. How does it all work together? How can we make any progress if we can’t fundamentally trust our tools?

Here we consider the ways that our choices of tools and abstractions can make our job as software developers easier or more difficult. And we look for sound abstractions — those we can trust completely for what we ask of them. Though there aren’t any right answers, there is some good news in the tools we have available.

Some Abstractions Don't Leak

You’ve probably heard the term “leaky abstractions” in software. Popularized by Joel Spolsky 20 years ago in The Law of Leaky Abstractions, the concept has become part of a long conversation about the unintended consequences of abstraction. Spolsky’s law states:

All non-trivial abstractions, to some degree, are leaky. —Joel Spolsky, The Law of Leaky Abstractions (2002)

This framing, and much of the discussion it inspired, focuses on the unforeseen negative consequences of a common tactic: building systems out of abstract components that we mostly treat as sealed boxes. We do this to simplify things: outside the sealed box, we can ignore the details of what’s inside; inside, we ignore what’s outside. But as we build systems this way, the modeling error in each step can accumulate in unpredictable and unintuitive ways. As Spolsky notes — “Abstractions fail. Sometimes a little, sometimes a lot” — the problem isn’t so much that abstraction is bad, it’s that it’s risky: the impact on outcomes can be unpredictable.

The software community has talked about the problems inherent in abstraction for nearly as long as the concept has existed. The pattern was well-established enough in the '70s that Niklaus Wirth, inventor of Pascal, noted the trouble it caused:

In fact, I found a large number of programs perform poorly because of the language’s tendency to hide “what is going on” with the misguided intention of “not bothering the programmer with details”. —Niklaus Wirth, On the Design of Programming Languages (1974)

Wirth and Spolsky don’t have to convince me — this is indeed a problem! But I’d like to call attention to something optimistic that should give us cause for hope:

Some abstractions are completely watertight in the ways that matter.

Depending on your perspective, this may be either intuitively obvious or rather controversial. But I think that the concept of sound abstractions should be familiar, even if only at a primitive level. We generally believe without much hesitation that:

  • The CPU, RAM, and other core components operate with a very low failure rate — low enough to ignore when building on them. We don’t usually write software assuming pervasive faults in the underlying hardware — at least if the hardware is on Earth.
  • In cases where the consequences of a hardware fault would be greater — such as server memory — we still don’t change the software if we can help it. We swap in a “better” hardware implementation (error-correcting, or ECC, RAM) and ignore it in software.
  • When we use synchronization primitives such as locks to control how different concurrent parts of a program access shared data, we believe they themselves are built out of simple and explainable uses of atomic CPU operations like compare-and-swap. While we still have a lot to consider in order to use locks properly, we write code around the lock’s API, trusting that it “works.” (A toy sketch of such a lock follows this list.)
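
To make that last point concrete, here is a toy spinlock in Rust built on nothing but an atomic compare-and-swap. It’s a sketch, not a production lock (real locks add fairness, thread parking, and poisoning), but it shows how little machinery separates a lock’s API from the CPU primitive we take on faith.

use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Atomically flip `locked` from false to true; if another
        // thread beat us to it, spin and retry.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}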

If we worried about all that could go wrong at these low levels, we’d never get anything bigger done. So we often ignore the details.

Effective Abstractions Are Opinionated

Useful abstractions capture concepts that are likely to matter in the future. They provide our vocabulary for discussing problems and solutions. During initial design, this is necessarily an opinionated exercise, and it arguably has no right answers. During iterative development, we strive to hold the design with an open hand, continuously reshaping it toward the current needs.

Good engineering practice tries to identify the properties that are likely to be useful in the future course of a project’s development. I often think of this as a materials-science exercise: we do well to study the properties of the materials we build with, and to internalize an intuition for the solutions best suited to each flavor of problem. Plywood is not a good solution for problems that need aluminum, and vice versa.

Many programming paradigms differentiate themselves specifically by the sorts of abstractions they cater to. This means that the selection of the language or environment itself is often an early choice that already expresses an opinion about the types of problems likely to be solved.

Declarative paradigms express the opinion that the “what” matters more than the “how.” SQL, Prolog, AMPL, and Dhall all make this bet: they blur the “how” in order to focus on the “what,” and the abstractions they provide are aligned with these values.

At a very foundational level, even structured programming itself — the use of procedures, subroutines, iteration, and recursion to structure a program’s control flow — is a level of abstraction that doesn’t exist in the hardware’s execution model.

These language-level abstractions are also usually treated as opaque. When we’re considering whether to extract a function from some code or leave it inline, we may care about the difference for stylistic reasons, and we may think about the performance impact if we’re convinced it’s relevant — but we would almost never worry that there will be an unanticipated semantic difference between the two. If there is, it’s a bug.

How Do Abstractions Leak, and at What Cost?

Here’s the problem: Abstractions lose detail. Deliberately so. They make it easier to understand a complex domain by sanding off its corners, pretending some of the details just don’t exist. There are a few different ways in which that can hurt.

When we’re lucky, those details might be completely ordinary and regular things we’re happy to forget about while solving bigger problems: allocating and deallocating memory appropriately, managing a database connection pool, packetizing data for a TCP stream. In those cases, we concern ourselves mostly with the nonfunctional requirements such as performance, security, and interoperability. These can be very tricky engineering concerns, and especially at scale they are not trivial to solve.

In unluckier cases, the abstractions we use introduce inefficiencies into our functional requirements, or even change them altogether. They may increase complexity by coupling things that wouldn’t otherwise need to know about each other.

And by increasing complexity, abstractions tend to make our systems more difficult to observe and reason about. They introduce new terminology, new components, and indirection. Although we might save cognitive overhead when things are running smoothly, we often find ourselves paying for it when we have to debug “through” the abstraction.

Let’s examine how abstraction leaks hit each of those areas.

Nonfunctional Requirements

Often, when we say an abstraction leaks, it’s because it regresses a nonfunctional requirement such as performance, availability, or security. These nonfunctional requirements are qualities whose importance we recognize, but over which we may not have direct control. In search of simplicity, we can let go of them, whether we realize it or not:

  1. Many early object-relational mappers (ORMs) made it easy to accidentally write N+1 queries: queries that fetch joins between a parent and its N children by first fetching the parent and then individually issuing separate queries for the children, resulting in N+1 database queries when one or two would suffice.

    One reason many ORMs encouraged this behavior was that the natural abstraction on the object-oriented side — parents containing language-native collections of their child objects — didn’t offer sufficient granularity to align its usage to the capabilities of the implementation. There’s no way for a Ruby Array to notice it’s being iterated over with a for loop and eager-load some child records. Thus, the solution was often to create further abstractions that look like a language’s collection types but support access patterns aligned with the implementation (such as eager loading and efficient counting). (A schematic sketch of the N+1 shape follows this list.)

  2. Suppose we have a large engineering organization struggling to support a monolithic application. In pursuit of single responsibility, and in an attempt at an Inverse Conway maneuver, we set out to split our monolith into two applications serving separate business units: Sales and Delivery. Our team agrees this is the most natural and effective division of responsibilities and will lead to more maintainable code.

    (Most monolith decompositions have to spend a lot of time identifying and introducing useful abstractions in order to create seams. Although it’s certainly possible to build a well-abstracted monolith, the well-abstracted ones aren’t ordinarily the ones targeted for breakup.)

    Sadly, the release causes notable friction. The increased surface area of the extraction drives up cloud usage costs, 99th-percentile response time has risen with the added network traffic, and customer-facing failure rates are elevated.

    We think we can remedy the issues, and in the long term we still need the architectural agility this transformation will afford us. So we want to push forward. But how do we measure our success? And how do we defend the process to the executive committee?

  3. Cryptographic implementations that use the wrong set of mathematical abstractions can leak information via side channels (timing, power, emissions, and more). Security is tricky: security properties can be completely undermined by pesky things like CPU cycle timing that may not have even been contemplated when the problem was specified.
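
To make the ORM example concrete, here is a schematic Rust sketch of the N+1 shape. The types and query methods are hypothetical stand-ins for a database layer, not any real ORM’s API.

struct Parent { id: u32 }
struct Child { parent_id: u32 }

struct Db; // stand-in for a database connection

impl Db {
    fn parents(&self) -> Vec<Parent> { unimplemented!() }                          // 1 query
    fn children_of(&self, _parent: u32) -> Vec<Child> { unimplemented!() }         // 1 query per call
    fn children_of_all(&self, _parents: &[u32]) -> Vec<Child> { unimplemented!() } // 1 query total
}

// The naive traversal reads naturally on the object side but issues
// 1 + N queries: one for the parents, then one per parent.
fn load_naive(db: &Db) -> Vec<(Parent, Vec<Child>)> {
    db.parents()
        .into_iter()
        .map(|parent| {
            let children = db.children_of(parent.id);
            (parent, children)
        })
        .collect()
}

// Eager loading issues two queries regardless of N, at the cost of a
// less "natural" shape on the object side.
fn load_eager(db: &Db) -> (Vec<Parent>, Vec<Child>) {
    let parents = db.parents();
    let ids: Vec<u32> = parents.iter().map(|p| p.id).collect();
    let children = db.children_of_all(&ids);
    (parents, children)
}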

When we increase abstraction, we add nonlinearities between cause and effect. The more indirect the connection from things we can change to things we can observe, the more opportunities for error in trying to control one with the other.

Complicating the picture, nonfunctional properties don’t generally compose:

  • A fast application server connected to a fast database is certainly not a guarantee of a fast user experience.

  • Putting high-availability database clusters in New York and London does not guarantee a high-availability database. (One of the many reasons is Segal’s law: “A person with one watch knows what time it is; a person with two watches is never sure.”)

  • Data privacy is a nonfunctional requirement that can be almost anti-compositional: aggregating disparate systems tends to strictly increase their privacy risk. Because information can be disclosed from the combination of two datasets which couldn’t be learned from either alone, the combination of two systems can be a larger privacy risk than either system in isolation.

Most nonfunctional requirements are emergent properties that can only truly be evaluated in the context of the system as built. Thus it can be extremely costly to build a system atop the wrong abstraction. Once bitten a few times by these systemic unknowns, we’re justified in being a bit skittish!

In his 1974 article referenced above, Niklaus Wirth stressed the importance of aligning a language’s abstractions with the underlying environment, using the term “transparence” for the ability of a high-level programmer to reason about the performance impacts of their code:

Transparence is particularly vital with respect to storage allocation and access technique, since storage access is such a frequent operation that any unanticipated, hidden complexity can have disastrous effects on the performance of a whole program. —Niklaus Wirth, On the Design of Programming Languages (1974)

Nearly 50 years on, this still feels relevant, and if anything, our abstractions have enabled us to recognize the same patterns across many new areas of study. What Wirth saw while thinking about storage access patterns in programming languages, we can hear echoes of in issues that arise in distributed systems: thundering herds, cache coherence, deadlock and livelock, priority inversion, Byzantine agreement. All of these patterns involve fractures in the complex interplay across multiple levels of abstraction, in service of presenting a simpler model of a complex system. They are all subject to the same disconnections between causes and effects that Wirth cautions against.

Functional Requirements

Remember: Effective abstractions are opinionated. They take a position on how to carve up the domain, with all the attendant benefits and drawbacks of taking opinionated technical positions.

This might have worked if there were only one value system: one stakeholder, one use for the abstraction, one downstream client. But that’s never the case, and so our abstractions end up having to compromise on their fit to the domain, introducing slippage between the functional requirements and the capabilities of the abstraction. That slippage shows up in different ways.

This slippage may look like we’re using the “wrong tool for the job”: the abstraction was the wrong functional fit for the problem to be solved. Usually we discover this when we try to iterate a working system to do something slightly different, and the model breaks down.

Text encoding is a place where I see this sort of slippage a lot. One of the most common operations in computing — interpreting a series of bytes as characters and vice versa — involves so much nuance that O’Reilly alone has published thousands of pages on the topic. Mistaking one encoding for another, or mistaking an arbitrary byte stream for a valid UTF-8 string, is a common source of errors and even security vulnerabilities.

There are very robust ways to drive toward correctness in encodings using strongly typed languages. When I work in languages that force me to think more explicitly about the encoding of my data, I find it easier to find and fix related bugs (“oh, this interface is providing us Latin-1 data, not UTF-8”). However, not every use case needs the full correctness (with the attendant overhead) that these methods provide. As a practical matter, if data is always stored and interchanged in UTF-8, effort spent outside that design space is wasted: modeling at the wrong level (too much or too little) costs time.
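
Rust is one language that makes this boundary explicit: converting bytes to a string is a fallible parse, not a free cast. A minimal illustration:

fn main() {
    let utf8_bytes: Vec<u8> = vec![0xC3, 0xA9]; // "é" encoded as UTF-8
    let latin1_bytes: Vec<u8> = vec![0xE9];     // "é" encoded as Latin-1

    assert_eq!(String::from_utf8(utf8_bytes).unwrap(), "é");

    // The Latin-1 byte is not valid UTF-8. The type system forces us to
    // confront the mismatch here, not deep inside some rendering code.
    assert!(String::from_utf8(latin1_bytes).is_err());
}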

When an abstraction ends up being both the wrong tool and something we “invented here”, it’s often premature abstraction: we built something that worked well in its context but fails to generalize. It might be possible to incrementally adapt our understanding to fit the new circumstances. But it’s also possible that the new context requires a paradigm shift, and anchoring to the premature abstraction could hold us back from discovering the right solution.

Enterprise “second systems” can result from premature abstraction:

  1. A team realizes they have built a specific instance of an Enterprise Pattern: a rule engine, a workflow system, a state machine, an e-commerce site, a report generator.

  2. The second system attempts to genericize the implementation to the Platonic ideal of the Enterprise Pattern.

  3. In doing so, the team has added a degree of freedom to the problem: rather than create a state machine for a specific set of use cases, they now have the possibility to engineer a state machine for all conceivable use cases.

Whether a team in this situation can successfully restrain themselves depends on the context and the team — but with the additional design space this freedom affords, it’s easy to lose sight of the needs of the business.

Here’s a trickier one. Shared abstractions (shared between systems, between teams, or even across time) are incredibly prone to this sort of functional requirement mismatch. Users of a shared abstraction will have diverse uses for it, driven by different motivations and goals. In the spirit of “a good compromise leaves everyone unhappy,” everyone involved will find some places where the shared model doesn’t quite fit.

The so-called “object-relational impedance mismatch” exemplifies this pattern. The differences between how object-oriented languages and relational algebra choose to model data cause friction. Because those two models of the world are completely different, there’s no single obvious and correct way to map between the two realms; there are many choices to make in modeling, each of which sacrifices something. But despite its limitations, object-relational mapping remains a very popular way to share data between object-oriented and relational systems.

Finally, we sometimes see leaks of implementation details upward into functional requirements. As an example, there are several ways to implement an associative array (“map” / “hash” / “dictionary”). The two mainstream approaches are hash tables, which require hashable data types; and search trees, which require elements to be sortable so the tree can maintain its invariants.

Generally, the high-level API is the same: it’s the abstract interface shared by any associative array. But the two strategies impose different requirements on the lower-level representation of the data. The lower-level implementation details (“can we find the right total order on these IEEE 754 floating-point values?”) can become a problem further up the stack when interwoven with higher-level questions (“are hashed or ordered collections better suited for this problem?”).
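
Rust happens to surface this particular leak in its type system: the two map implementations place different trait bounds on their keys, and IEEE 754 floats satisfy neither. A small sketch:

use std::collections::{BTreeMap, HashMap};

fn main() {
    // HashMap requires keys to be Hash + Eq...
    let mut by_name: HashMap<String, u32> = HashMap::new();
    by_name.insert("bin-42".to_string(), 7);

    // ...while BTreeMap requires keys to be Ord.
    let mut by_id: BTreeMap<u32, String> = BTreeMap::new();
    by_id.insert(7, "widget".to_string());

    // The representation leaks upward: f64 implements neither Hash nor
    // Ord (NaN breaks both hashing and total ordering), so neither map
    // will take raw floats as keys. This does not compile:
    //
    // let mut by_price: BTreeMap<f64, String> = BTreeMap::new();
    // by_price.insert(1.0, "x".to_string()); // error: f64 is not Ord
}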

Observability

There’s a more insidious way abstractions can leak: by obscuring visibility. The abstraction presents the illusion of a simplified domain to its user, but the illusion breaks down when its assumptions are violated.

By hiding detail, abstraction can turn an obvious fault into a less obvious one. This blunts the connection between cause and effect, making debugging harder. If you’ve ever had to troubleshoot any moderately complex caching scheme (say, HTTP), you’ve probably felt this pain.

Most abstractions simplify their domain by throwing away information. Some of the information they discard may be useful for debugging, which almost always depends on exposing some internal state not needed in successful operation. Some abstractions therefore accrue debugging functionality: when configured in “verbose mode,” they change their logging behavior to include information useful for debugging. Others couple directly to a logging system’s interface, reporting everything and letting the logger determine what matters. Either way, to support debugging in situ, the abstraction must be aware of the distinction and participate in preserving information that would otherwise be lost.

Loss of fidelity is one issue, and it’s one that can be mitigated with effort. A bigger problem is that abstractions add complexity to the debugging process: just “seeing through them” can be difficult. If the problem is in a layer below the abstraction, debugging through it is usually harder than debugging without it might have been. If the problem is with the abstraction itself, it could point to a fundamental misunderstanding of the domain or just an implementation error. Determining which regime you’re in can involve significant effort. The abstraction doesn’t always assist in this effort — but it always adds more potential surface area to the investigation.

I thought you mentioned optimism?

I’m getting there. Understanding the risks is important because the rule is actually right: all abstractions have a cost. Abstraction is indirection — even “zero-cost” abstractions have increased complexity, and thus increase cognitive overhead.

There is a crack in everything; that’s how the light gets in. —Leonard Cohen, “Anthem”

So how do we get a foothold? Given the ways abstractions can leak, what should we do?

To start, we can recognize that we often want to prioritize functional requirements over nonfunctional ones. (As the old performance optimization joke goes, “If it doesn’t have to work, I can make it as fast as you want.”) While nonfunctional requirements such as security can be equally important in many domains, I claim there’s still a hierarchy: even security often takes a back burner if a system’s ability to meet its functional requirements is in question.

So when I design abstractions, functional requirements are front and center. Semantics comes first. And we actually have a great number of tools to help drive toward abstractions that are semantically sound.

6 Techniques to Build Sound Abstractions

OK, here’s the optimism. Just as we’ve spent decades learning about the failures of abstraction, we’ve spent decades building tools to get ourselves out of the pit. We have a spectrum of options available for software correctness, at many different levels depending on the need.

Admittedly, “what we have” to use is a moving target, so there are no right answers that will stand the test of time. The techniques of software verification and validation are continuously evolving. The tools that we have to drive correctness using abstraction are far ahead of those available 20 years ago. Don’t be surprised if the methods that feel best-in-class today are superseded in a decade.

That said, the tools we have now offer a tremendous amount of modeling flexibility, which we should use when it suits us. So what do we have?

1. Formal Methods

At the far end of the continuum is “formal methods,” broadly defined: proof-carrying code, model checking, proof assistants, and automated theorem provers. These techniques can produce code that is ✨guaranteed✨ to be in compliance with its specification. (To the extent that anyone can guarantee anything about software, that is.)

Creating verified software involves thinking of and proving properties at every scale: from small localized assertions about a function all the way to global properties that involve the behavior of an entire system. Properties may be things like:

  • This list being passed to binary_search is properly sorted.
  • This memory allocator never issues overlapping memory to different requests at the same time.
  • This program never attempts to free the same pointer twice, no matter what input it is given.
  • Every incoming ping request is acknowledged with a reply within one iteration of the main loop.
  • This compiler optimization pass does not change the observable behavior of the program being optimized, for any input.

Formal methods give categorically greater confidence in properties like these than testing can. But they come at an incredible cost to development velocity and often without a recognizable payoff that connects to the organization’s mission. So it’s not a surprise that they haven’t seen widespread adoption among developers who don’t have specific reasons to engage. They see the most use in life-critical systems: avionics, medical devices, industrial control, and the like — where the cost of verification can be justified against the cost of failure.

CompCert, a verified optimizing C compiler, is one of the largest public examples of formally verified software. CompCert compiles C into assembly, just like any C compiler; it contains optimizing passes, like most C compilers. But it is the first C compiler with provably correct semantics: using independent definitions of “what C means” and “what x86 assembly means,” CompCert translates one into the other with perfect fidelity.

Miscompilation does not exist; the proof has pushed the room for error into the specifications at either end. The “compilation” abstraction does not leak.

However, the effort was immense: the team’s experience report claims over 100,000 lines of code developed over more than six person-years. It produced many academic publications, including several Ph.D. theses.

Despite limited market penetration, formal methods are becoming more accessible every year. They provide a powerful validation that the techniques can work if the economics are aligned. So we can steal things from formal methods and apply them in more limited contexts without going all the way.

I’m a fan of this pragmatic “less-formal methods” approach. Let’s explore some less-formal techniques that can still provide improved safety.

2. Thoughtful API Design

Modern software is so complex that we have little hope of demonstrating correctness from first principles. The obvious retort is that we don’t actually have to show anything “from first principles” — we always work atop some level of abstraction. Even those who would write directly in machine code still depend on CPU microcode and silicon, which are themselves nontrivial abstractions.

Every proof, every bit of verification, every demonstration of correctness exists relative to some model. Every implementation sits atop something, and our ability to build coherent and correct systems depends on our confidence in the behavior of the abstractions we use.

This implies that we should depend on interfaces that have predictable behavior. Our ability to successfully climb a tower of abstraction, while preserving the things we care about, depends on our reasoning about the materials that we build with and the systems atop which they sit.

In some cases, the abstraction can align with the domain’s needs and solve the problem for us. Constant-time cryptographic programming is an example. As mentioned above, a naïve implementation of some cryptographic operations using a CPU’s mathematics primitives will leak secret information to an attacker who is able to accurately measure the time that things take. But there is an abstraction — constant-time programming — that mitigates exactly that weakness by ensuring that the API it exposes is safe (the time each operation takes does not depend in any way on secret data). Building atop a correct implementation of this interface, you don’t have to worry about the timing side-channel safety of these operations and can spend more time focusing on higher-level correctness.
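
In Rust, for example, the subtle crate packages constant-time operations behind such an API. A minimal sketch of comparing authentication tags:

use subtle::ConstantTimeEq;

// Comparing secrets with `==` can return early at the first differing
// byte, leaking the position of the mismatch through timing. ct_eq
// examines every byte regardless, so timing doesn't depend on the data.
fn macs_match(expected: &[u8], provided: &[u8]) -> bool {
    expected.ct_eq(provided).into()
}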

The Resource Acquisition Is Initialization (RAII) pattern found in C++ is a similar example. An object instance corresponds one-to-one with a handle to an external resource such as a file, database connection, or socket. You don’t get the object back from the constructor until the resource is successfully allocated, and when the object is destroyed it releases the resource. Thus, if you’re confident that the abstraction was built correctly, you can code as if the objects were the external resources themselves, without caring too much about the mismatch between the two.
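
Rust adopts the same idiom through its Drop trait. A minimal sketch with a file handle:

use std::fs::File;
use std::io::Write;

// The File value *is* the open handle: construction either yields a
// usable resource or fails outright, and dropping the value closes it.
fn write_greeting() -> std::io::Result<()> {
    let mut file = File::create("greeting.txt")?; // acquire, or fail here
    file.write_all(b"hello")?;
    Ok(())
} // `file` goes out of scope here; the OS handle is released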

3. Algebraic Data Types / Functional Domain Modeling

Sometimes we can encode the correctness we care about in the shape of the data itself. Algebraic data types (ADTs), as seen in Scala, Haskell, Rust, Swift, and other languages, are a particularly useful tool in modeling and encoding understanding.

As a very simple example, consider a function that lets us search for products either by name or by their location in the warehouse. In Rust, such a search might be modeled with an ADT like this enum:

enum ProductSearch {
    /// Identify products by full-text search on the name.
    ByName(String),

    /// Identify a product by its warehouse location.
    ByLocation {
        aisle: u8,
        shelf: u8,
        bin: u8,
    },
}

fn search(req: ProductSearch) -> Vec<SearchResult> {
    /* ... */
}

The key advantage — and where ADTs get their “algebraic” name — is that they may contain both product types (the fields of a ByLocation are colocated like a struct) and sum types (ByName and ByLocation are two different variants, either of which constructs a ProductSearch). It’s hard to overstate how much modeling power you get just from these two concepts in a type system, when you can name and nest them arbitrarily.

The biggest practical advantage of ADTs is their representational fidelity. ADTs let us build types that are precisely the right shape to contain the data being represented. A ProductSearch is either a ByName with a search string, or a ByLocation with the exact warehouse coordinates needed.

Consider how we might have written this if we didn’t use enums with variant data:

struct BinLocation {
    aisle: u8,
    shelf: u8,
    bin: u8,
}

fn search(name: Option<String>, location: Option<BinLocation>) -> Vec<SearchResult> {
    /* ... */
}

Even though we still have some type safety from the Options and from the BinLocation struct, we now have to deal with two possibilities — name and location both None, or both Some — that we don’t really want to deal with. We probably want those to be errors, since neither describes a meaningful search. But in the ProductSearch example, nobody has to think about that case, because it’s impossible to construct.
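
The consuming side benefits too: pattern matching on the enum forces us to handle exactly the legal cases. A sketch:

fn describe(req: &ProductSearch) -> String {
    match req {
        // There is no "both" or "neither" arm to defend against, and
        // forgetting a variant is a compile error.
        ProductSearch::ByName(name) => format!("products matching {name:?}"),
        ProductSearch::ByLocation { aisle, shelf, bin } => {
            format!("the product at aisle {aisle}, shelf {shelf}, bin {bin}")
        }
    }
}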

We can keep narrowing, too. If we get itchy about 255 being a valid aisle number, we can narrow that u8 to a type that has fewer inhabitants. And that would be a good time to have a discussion with the business about what an Aisle really is, and how it relates to the processes at hand.
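
A hypothetical sketch of that narrowing, assuming (for illustration) that the business tells us aisles run 1 through 40:

/// An aisle number validated at construction; there is no other way to
/// make one, so every Aisle in the system is known to be in range.
pub struct Aisle(u8);

impl Aisle {
    pub fn new(n: u8) -> Result<Aisle, String> {
        if (1..=40).contains(&n) {
            Ok(Aisle(n))
        } else {
            Err(format!("no such aisle: {n}"))
        }
    }
}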

The fact that this can prompt so many conversations about “what is this thing, really, to the business?” is a sign that it’s an effective modeling tool. If you’ve heard a functional programmer say “make illegal states unrepresentable,” this is what they mean.

For further reading on this approach, Matt Parsons discusses tradeoffs in API design in Type Safety Back and Forth. Alexis King further expands on the connection between type safety and validity in Parse, don’t validate. Amos’s article Abstracting Away Correctness is a long journey, but I recommend the read if you’re interested in getting a feel for some of these ideas worked in detail. And Scott Wlaschin’s book Domain Modeling Made Functional weaves these themes together with an approach that explicitly incorporates domain-driven design, with realistic F#-based examples in a business-app context.

4. Property-Based Testing

Property-based testing is a useful technique for testing complex systems by coming up with simple properties you believe should always be true of your implementation. It’s implemented in test frameworks like Python’s hypothesis, Haskell’s QuickCheck, and Rust’s proptest.

I respect property testing because it takes a fundamentally lazy approach.

Just as in formal verification, property testing starts out with specification. We’re trying to check exactly the same properties here we would be looking at in a formal proof: “the list is sorted” or “the program never double-frees a pointer” or “every outstanding request is handled before the next frame is rendered.”

But at some point we realize that we’re going to have to prove all this stuff we specified. And surveying the massive amount of work involved in that path can motivate some incredibly creative work-avoidance solutions. Property testing flips the burden back on the computer: rather than prove we’re right, make the machine prove us wrong.

The test runner generates random input data, feeds it through the system under test, and asserts that the property is satisfied. If it ever finds a counterexample — a double-free, an unbalanced red-black tree, any assertion failure — it reports the inputs it provided that triggered the failure.
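
Here’s a minimal sketch of those mechanics using Rust’s proptest, with a deliberately buggy function for the framework to catch:

use proptest::prelude::*;

// Deliberately buggy: the cast truncates instead of saturating, so
// 256 wraps around to 0.
fn clamp_to_byte(n: u32) -> u8 {
    n as u8
}

proptest! {
    // Property: the function behaves like saturation. proptest generates
    // random inputs, checks the assertion, and reports (and shrinks) any
    // counterexample it finds; here, something like n = 256.
    #[test]
    fn clamps_like_saturation(n in any::<u32>()) {
        prop_assert_eq!(u32::from(clamp_to_byte(n)), n.min(255));
    }
}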

Some common categories of useful properties I’ve seen:

  • Representational invariants: In a binary search tree implementation, all operations must leave the tree correctly sorted (every node’s value sorts between those of its left subtree and its right subtree). Property testing can encode this property and try to find operations for which it doesn’t hold.

  • Encoding fidelity: When types are convertible to and from a type like JSON, we often want to assert that randomly generated objects round-trip faithfully through the representation. Property testing can make this basically a one-liner.

    it "round-trips any Person through JSON" .
    		property $ \(person :: Person) ->
    				decodeJSON (encodeJSON person) == Right person
    
    
  • Statistical model checking: Write the “reference model” first: the simple, obviously correct, naïve, slow implementation. Test it thoroughly so you know it’s working.

    Then, as you grow more complex and nuanced implementations of the same interface, use property testing to generate valid calls to your interface. Shove the same sequences of calls into the reference implementation and the smart one, and assert that they result in the same output. (This can be done by trying to check equivalence on the two implementations themselves — but it’s usually easier to just incorporate queries into the sequences of operations being generated in the first place.)

    In this way, you can grow implementations with arbitrary amounts of internal complexity as may be needed to offer performance — making time/space tradeoffs, memoizing, taking advantage of some out-of-band knowledge of the problem structure or distribution of inputs. But at each step, you can work with rough confidence in the behavioral fidelity of the complex implementation. (A minimal sketch follows this list.)

  • Business logic: An advantage of pure functional domain modeling in business logic is that it plays well with property testing of high-level business requirements. Property-based testing has helped me avoid bugs in complex areas of logic:

    • Verifying that emailed reports are only sourced from data that the recipient is authorized to see, even as the access rules evolve over time.
    • Checking that supply-chain algorithms generate results that make physical sense. (You can’t ship four units from a warehouse that only has two; you can’t cross-dock 15 units through a warehouse if they arrive and depart in break packs of eight.)
    • Stress-testing heuristic solutions to optimization problems, to ensure they’re not too sensitive to the real-world conditions we’ve seen so far.

    Due to the complexity of the systems above, we’d never be able to formally prove these properties; it wouldn’t be worth the investment. But through judicious use of property testing, we bought just the confidence we needed.
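
Returning to the reference-model pattern, here is a minimal proptest sketch that pits a “smart” implementation against the naive one:

use proptest::prelude::*;
use std::collections::HashSet;

// Reference model: obviously correct, quadratic, slow.
fn count_distinct_reference(xs: &[i32]) -> usize {
    let mut seen: Vec<i32> = Vec::new();
    for &x in xs {
        if !seen.contains(&x) {
            seen.push(x);
        }
    }
    seen.len()
}

// The "smart" implementation under test.
fn count_distinct_fast(xs: &[i32]) -> usize {
    xs.iter().collect::<HashSet<_>>().len()
}

proptest! {
    // Feed both implementations identical generated inputs and demand
    // identical answers.
    #[test]
    fn fast_matches_reference(xs in proptest::collection::vec(any::<i32>(), 0..200)) {
        prop_assert_eq!(count_distinct_fast(&xs), count_distinct_reference(&xs));
    }
}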

Properties themselves can look a lot like runtime assertions that you might sprinkle in your implementation anyway — they make an assertion about some invariant over the state of the system — but property testing makes it a lot nicer.

One big advantage is the ability to use property testing to drive the exploration process itself. Some things we want from our software are best expressed as properties. Instead of writing unit tests to carefully exercise a specific example, sometimes it can be more helpful to use property testing to validate or refute big hypotheses about how the software currently works. This empirical effort helps support the specification process (what the software should be) with targeted examples and counterexamples of what the software currently does. It doesn’t eliminate the thinking from the problem, but it can often focus our thinking on the right part of the problem.

Another advantage of property testing is detailed control over the generation of input distributions. This can be more important than it seems at first blush: taking uncorrelated uniform or normal samples as your input data isn’t likely to exercise every path you care about. So advanced property testing libraries have combinators to describe fairness conditions like “Between 30 and 70 percent of the test orders generated should be routed down the quick-fulfillment pathway.” This won’t tell you how to accomplish that by tweaking the inputs — but it will warn you when the test data you’re generating isn’t sufficiently diverse to test everything you care about.

One of the most profound advantages of property testing is tooling that supports shrinking.¹ Without shrinking, the results of property testing would look awful: randomly generated data is not the most intuitive place from which to debug. So shrinking does something that’s either genius or obvious in retrospect, maybe both: it tries to reproduce the same failure with slightly smaller (shrunk) input data. It might remove an element from a list, shrink a number toward zero, or make other similar small changes, trying to keep the test failing in the same way. When it can’t make further progress, it reports the “minimized” failure. With the right shrinking strategy, this can be an effective way to find small counterexamples.

The final thing I’ll say about property testing is that it’s a tool to clarify thinking. I’m not sure I mean this as a compliment. It’s a fiercely adversarial partner who will say “that name field takes a byte string, oh? How about "\"\\\000"?" I’ve heard it said that property testers spend half their debugging time thinking about the correctness of the solution (“am I actually 8-bit clean here?”) and the other half debugging the specification (“oops, -3 is an integer. Is it a valid user ID? If not, should this function signature change?”). That 50-50 breakdown feels frustratingly correct, but it’s a benign sort of frustration — in either case, the outcome moves the specification closer to the implementation’s behavior.

5. Starting With Observability

During agile development, as developers’ understanding of requirements co-evolves with the business, most systems go through stages where their abstractions are ill-fitting, mismatched to the current needs, buggy, premature, or just incorrect. We often want to be able to try ideas without committing to them. This suggests that our abstractions should be as observable as possible while code around them is still in flux: reconstructing their role in a process after the fact should be straightforward and transparent.

Debugging is more difficult through an abstraction — when things are going wrong, you want to minimize the layers between you and the problem. Observability is easiest to build in alongside the abstractions you’re building, while the context is fresh.

It can pay to go a bit overboard with observability from the start, because you don’t necessarily know what’s going to be important ahead of time. In the same way that I set up assertions as guardrails around my assumptions about a problem or a solution, increasing observability can increase our confidence in the correctness and performance of a system even when it doesn’t change the system’s behavior.

6. “Zero-Cost” Abstraction

Rust programmers, myself included, are fond of speaking about zero-cost abstractions. In a sense these are the best abstractions we could hope for: they don’t cost anything when not in use, and there’s no marginal cost to using them — you couldn’t make it run any faster by avoiding the abstraction.
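
Iterators are a commonly cited small-scale example. Under an optimizing build, the two functions below typically compile to equivalent machine code, so choosing the higher-level form costs nothing at runtime:

// High-level: iterator adapters.
fn sum_of_squares(xs: &[i64]) -> i64 {
    xs.iter().map(|x| x * x).sum()
}

// Low-level: the loop we might have written by hand.
fn sum_of_squares_by_hand(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        total += x * x;
    }
    total
}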

The idea isn’t original to Rust, but it has shaped Rust’s design in profound ways. The prototypical example is Rust’s memory model: Rust generates code that acquires and releases memory at just the right times, not by tracking liveness dynamically at runtime, but by understanding the code’s memory needs statically and compiling the bookkeeping in at the right points. This required not just building a very smart compiler but also developing, implementing, teaching, and struggling with key new language abstractions (“borrowing,” “lifetimes”).

And thus we see that even zero-cost abstractions can have big consequences. The Rust compiler is more complex than it would need to be if it didn’t do such memory bookkeeping for the developer. Its error messages can be confusing, as they require understanding concepts that don’t exist in many similar languages.

And perhaps I buried the lede a bit here. The most fundamental problem is that zero-cost abstractions are hard to design. We can’t find them for every situation that wants them. And they may not be as ergonomic to use as costly abstractions. In his article on zero-cost abstractions, boats specifically nods to Rust’s preference for them as a potential factor holding back adoption of the language. So we should celebrate things like this when we find them, but perhaps they shouldn’t be the only arrow in the quiver.

It's Good to Have Opinions — But Test Them

There’s this no-go theorem in optimization called the no free lunch theorem. The basic idea is that you can’t abstractly solve search or optimization problems without the cost function — what it is that you’re searching or optimizing for. In other words: An optimization that benefits a subset of possible worlds is also a pessimization for a different subset of possible worlds. Making a bet that the world works in the way you think is also taking a risk that it doesn’t.

No-free-lunch implies that abstractions can only achieve zero cost with reference to some set of assumptions. If those assumptions aren’t perfectly aligned with the environment we’re working in, the abstraction is costly. Abstractions are opinionated about the world they live in.

More generally: Everything has a cost. We’ll always experience tension between the things we value and what it takes to get there. Some costs are only obvious in hindsight, after systems come together. But every action we take has consequences, and “all abstractions are leaky” reminds us to plan for those consequences. Abstraction adds complexity, and complexity is costly — so prefer abstractions that ultimately help drive better design, correctness, performance, or whatever it is you seek at a product level.

One way we plan for the cost of abstraction is to practice watching it early, by starting with observability. Practice building architectures where you have transparency into the state and history of the metrics you care about, and have advance warning when they’re getting out of hand.

Finally, as Sandi Metz says, “duplication is far cheaper than the wrong abstraction.” As she further explains, though we often think of this prospectively (“should I write this new abstraction?”), we should think of it in hindsight as well (“have the abstractions we use outlived their purpose?”) to avoid letting the sunk-cost fallacy pin us to poor decisions. Re-evaluate continuously to ensure the systems you’ve built are serving your needs.


  1. Shoutout to folks using exhaustiveness-based checkers too, we see you! You can skip this paragraph while laughing at us about our need to shrink our test cases.