Designing for Pipeline Transformations

Damon Kelley

December 07, 2016

Pipeline transformations can be a wonderful mechanism for creating self-describing code. Many languages offer a built-in pipe operator. The list includes Elixir, F#, Clojure [1], Hack, and shell languages.

The pipe operator sends the output of its left side in as the input to its right side. This lets an expression read in the same order in which the data flows.

[Figure: Left-to-right]

Below is the inverse flow of data. This is the direction data flows in an equivalent expression without the pipe operator.

[Figure: Right-to-left]

This slugify function will convert a raw string into a series of sanitized keywords delimited by dashes. This version, written in Python, is an example of the right-to-left data flow.


def slugify(text):
    return lower_case(delimit_with_dashes(strip_punctuation(text)))

If we pass in Hello World! as input to this function, the flow of data would look like this.

[Figure: data flowing through slugify, right-to-left]

Compare this to the version using a pipe operator.

(defn slugify [text]
		(-> text strip-punctuation delimit-with-dashes lower-case))

And with Hello World! passed into the piped version of slugify.

[Figure: data flowing through slugify, left-to-right]

We can also reorient the function body vertically.

(defn slugify [text]
		(-> text
						strip-punctuation
						delimit-with-dashes
						lower-case))

You will likely find the piped version more expressive and easier to read. The example above is written in Clojure, but as mentioned earlier, other languages also provide a built-in pipe operator. Below are equivalent expressions in Elixir and Bash.

Elixir

def slugify(text) do
		text
		|> strip_punctuation
		|> delimit_with_dashes
		|> lower_case
end

Bash

function slugify() {
    # Quote the argument so whitespace and glob characters survive intact.
    echo "$1" \
    | strip_punctuation \
    | delimit_with_dashes \
    | lower_case
}

Pipelines also remove the need to create temporary variables for intermediate transformations. Below is another way we could have implemented the original slugify function using temporary variables.

def slugify(text):
    text_without_punctuation = strip_punctuation(text)
    text_with_dashes = delimit_with_dashes(text_without_punctuation)
    return lower_case(text_with_dashes)

Not only does this version require us to think up a meaningful variable name for each step of the transformation, but it also is just as difficult to read as the original version.

The pipeline operator enables us to write code that is self-descriptive. We can simply let the function names describe the transformations without the need for naming the intermediate representations.
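Python, used for the right-to-left example above, has no built-in pipe operator. As a rough sketch of the same idea, a small helper can thread a value through a list of functions from left to right. The `pipe` helper and the bodies of the three string functions below are assumptions made for illustration; they do not come from any library.

```python
import re
from functools import reduce


def pipe(value, *functions):
    """Thread value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


# Hypothetical implementations of the three steps, for illustration only.
def strip_punctuation(text):
    return re.sub(r"[^\w\s]", "", text)


def delimit_with_dashes(text):
    return "-".join(text.split())


def lower_case(text):
    return text.lower()


def slugify(text):
    # Reads in the same order the data flows, with no temporary variables.
    return pipe(text, strip_punctuation, delimit_with_dashes, lower_case)
```

With these assumed implementations, `slugify("Hello World!")` yields `"hello-world"`.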

Okay, great! I've sold you on the pipe operator. Now how do we design our APIs in a way that we can use the pipe operator to build out these pipeline transformations?

Below are a handful of things to consider when designing for pipeline transformations.

Transformations are Made of Transformations

First, keep in mind that each function in a pipeline is itself a transformation. Any one of these functions may contain its own pipeline transformation. This one is pretty simple, but it seems important enough to state.
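As a small sketch of this nesting, the Python below builds one step of a pipeline out of a pipeline of its own. The `pipe` helper and every function name here are hypothetical, chosen only to illustrate the point.

```python
from functools import reduce


def pipe(value, *functions):
    """Thread value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def trim(text):
    return text.strip()


def collapse_whitespace(text):
    return " ".join(text.split())


def normalize_whitespace(text):
    # This step is itself a small pipeline.
    return pipe(text, trim, collapse_whitespace)


def lower_case(text):
    return text.lower()


def clean(text):
    # The outer pipeline treats normalize_whitespace as a single step.
    return pipe(text, normalize_whitespace, lower_case)
```

From the point of view of `clean`, `normalize_whitespace` is just another transformation; its internal pipeline is hidden behind its name.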

What Kind of Transformation is Needed?

Essentially, there are two types of transformations.

A → A'

This kind of transformation occurs when the incoming data closely resembles the outgoing data. The slugify example is an instance of an A → A' transformation. We take some text and make a series of alterations to it. On the other side, we still have a string, just a slightly different string.

(defn slugify [text]
		(-> text ;; A
						strip-punctuation ;; A'
						delimit-with-dashes ;; A'
						lower-case)) ;; A'

A → B

This transformation occurs when the incoming data does not resemble the outgoing data. Let's look at an example.

(defn side-by-side [new-text original-text]
  {:original original-text
   :new new-text})

(defn slugify [text]
  (-> text                  ;; A
      strip-punctuation     ;; A'
      delimit-with-dashes   ;; A'
      lower-case            ;; A'
      (side-by-side text))) ;; B

A string is what went in, but a map was returned. Now that we have a map, any subsequent transformation needs to use a different class of function to operate on this new data structure.
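The same shift from A to B can be sketched in Python. The `pipe` helper and the string functions are illustrative assumptions; the point is only that the last step changes the type flowing through the pipeline from a string to a dictionary.

```python
import re
from functools import reduce


def pipe(value, *functions):
    """Thread value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


# Hypothetical string steps: each takes a str and returns a str (A -> A').
def strip_punctuation(text):
    return re.sub(r"[^\w\s]", "", text)


def delimit_with_dashes(text):
    return "-".join(text.split())


def lower_case(text):
    return text.lower()


def side_by_side(new_text, original_text):
    # Takes strings but returns a dict (A -> B).
    return {"original": original_text, "new": new_text}


def slugify_with_original(text):
    return pipe(
        text,                                    # A  (str)
        strip_punctuation,                       # A' (str)
        delimit_with_dashes,                     # A' (str)
        lower_case,                              # A' (str)
        lambda slug: side_by_side(slug, text),   # B  (dict)
    )
```

Any step added after `side_by_side` would have to accept a dictionary, so string functions such as `lower_case` could no longer be appended directly.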

Identify the Data under Transformation

We need to know what data we are operating on. If we are in a controller of a web app, this might be some sort of a request struct. If we are inside the business logic of that web app, then it might be a domain model. Perhaps we are processing user input, and we are transforming a string.

As simple as it might seem, it is important to make this identification. This will inform our decision about the signature of the functions that will go in our pipeline.

Stick to a Single Argument

A single argument makes it easier to pipe data through functions. If our functions require multiple arguments, it takes more work to wire them up. Furthermore, extra arguments can take away from the expressiveness of the pipeline. It adds extra information for the reader to process. These extra arguments also have a tendency to creep into collaborating functions.

Yet, sometimes we need more than one piece of data in our transformation. In this scenario, we should attempt to unify our arguments.

Unify Common Arguments

There are times when our pipelines will need more than one argument. In these cases, consider collecting them into a map or struct.

Take a look at the pipeline in the app function. The collaborating function present needs foo and bar. Consequently, we have to pass both of these into app, even though only one of the functions uses bar.

(defn present [foo bar]
  (println (str foo " " bar)))

(defn app [foo bar]
  (-> foo
      important-operation
      (present bar)))

(app foo bar)

We can deal with this by unifying all the arguments inside a map. In turn, each transformation now operates on a single data structure.

(defn present [config]
  (let [foo (:foo config)
        bar (:bar config)]
    (println (str foo " " bar))))

(defn app [config]
  (-> config
      important-operation
      present))

(app {:foo foo :bar bar})

Take notice of the reduced noise inside the pipeline. This is a simple example, but imagine how complicated things could get if we needed to pass in many arguments.

This is not a silver bullet, though. We did not eliminate any complexity; we only moved it. To simplify app, we had to push some complexity down into present. This is a potential trade-off.
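A rough Python sketch of the unified version makes the trade-off visible. The `pipe` helper and the behavior of `important_operation` (upper-casing `foo`) are assumptions invented for this example.

```python
from functools import reduce


def pipe(value, *functions):
    """Thread value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def important_operation(config):
    # Hypothetical behavior: transform foo and leave every other key alone.
    return {**config, "foo": config["foo"].upper()}


def present(config):
    # present now has to know the shape of the shared data structure.
    return f'{config["foo"]} {config["bar"]}'


def app(config):
    # Every step takes and returns the same single argument.
    return pipe(config, important_operation, present)
```

The pipeline in `app` is free of extra arguments, but both `important_operation` and `present` now depend on the keys of the dictionary, which is where the complexity went.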

Facade Functions

Collecting all the arguments in a map is not always the right solution. Sometimes a transformation function will need a one-off argument. For example, perhaps a string operation needs a regular expression.

(defn slugify [text]
		(-> text
						(strip-punctuation #"!")
						delimit-with-dashes
						lower-case))

Here, we can create a facade function instead.

A facade function is just a wrapper around another function. It encapsulates how the function is invoked, and with what arguments it is invoked.

(defn strip-all-punctuation-but-! [text]
  (strip-punctuation text #"!"))

(defn slugify [text]
		(-> text
						strip-all-punctuation-but-!
						delimit-with-dashes
						lower-case))

Facade functions provide the opportunity to give the operation a more meaningful name.

Of course, scenarios like this are a judgement call. Sometimes I find that a facade function does not add enough benefit to warrant a new function.
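For comparison, here is a minimal Python sketch of a facade function. The `keep` parameter on `strip_punctuation` is a hypothetical signature invented for this example, as are the `pipe` helper and the other string functions.

```python
import re
from functools import reduce


def pipe(value, *functions):
    """Thread value through each function in order, left to right."""
    return reduce(lambda acc, fn: fn(acc), functions, value)


def strip_punctuation(text, keep=""):
    # Hypothetical signature: remove punctuation except characters in keep.
    pattern = r"[^\w\s" + re.escape(keep) + r"]"
    return re.sub(pattern, "", text)


def strip_all_punctuation_but_bang(text):
    # Facade: hides the extra argument and names the operation.
    return strip_punctuation(text, keep="!")


def delimit_with_dashes(text):
    return "-".join(text.split())


def lower_case(text):
    return text.lower()


def slugify(text):
    return pipe(
        text,
        strip_all_punctuation_but_bang,
        delimit_with_dashes,
        lower_case,
    )
```

The pipeline in `slugify` stays a flat list of single-argument steps, and the one-off argument lives in exactly one place.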

Argument Placement, or Avoiding Signature Mismatch

When we do need to pass in multiple arguments to one of our transformations, we should be deliberate in choosing the argument position and order.

If we are careless about the order of our parameters, a function becomes difficult to fit into a pipeline.

(defn append-username [username slug]
		(str slug " " username))

(defn slugify [text username]
		(-> text
						strip-punctuation
						(#(append-username username %))
						delimit-with-dashes
						lower-case))

When we avoid this signature mismatch, we have to do less work to fit the transformation in the pipeline.

(defn append-username [slug username]
		(str slug " " username))

(defn slugify [text username]
		(-> text
						strip-punctuation
						(append-username username)
						delimit-with-dashes
						lower-case))

Clojurists will point out that the ->> and as-> macros exist exactly for this design shortcoming. These macros allow you to pipe the data through to the last argument, or to whichever argument is specified.

Most languages do not offer this flexibility. They either pipe data through to the first argument or the last argument.
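To see the cost of a mismatch outside Clojure, here is a Python sketch of a first-argument pipeline. The `thread_first` helper is an assumption written for this example (it loosely imitates piping into the first argument), and the string functions are hypothetical.

```python
import re


def thread_first(value, *steps):
    """Pass value through each step as that step's first argument.

    A step is either a plain function or a tuple of (function, extra args...).
    """
    for step in steps:
        if callable(step):
            value = step(value)
        else:
            function, *extra_args = step
            value = function(value, *extra_args)
    return value


def strip_punctuation(text):
    return re.sub(r"[^\w\s]", "", text)


def delimit_with_dashes(text):
    return "-".join(text.split())


def lower_case(text):
    return text.lower()


def append_username(slug, username):
    # Data comes first, so the function drops straight into the pipeline.
    return f"{slug} {username}"


def append_username_mismatched(username, slug):
    # Data comes last, so the caller has to adapt it by hand.
    return f"{slug} {username}"


def slugify(text, username):
    return thread_first(
        text,
        strip_punctuation,
        (append_username, username),
        delimit_with_dashes,
        lower_case,
    )


def slugify_mismatched(text, username):
    return thread_first(
        text,
        strip_punctuation,
        lambda slug: append_username_mismatched(username, slug),
        delimit_with_dashes,
        lower_case,
    )
```

Both versions produce the same slug, but the mismatched signature forces a lambda into the middle of the pipeline, which is the extra wiring that a deliberate argument order avoids.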

Conclusion

Once we have decided that a pipeline transformation is the right tool for the scenario, we need to take care to design our functions so that they can work together. Identifying what data needs to be transformed and the nature of the transformation is an important first step. This will inform decisions later on about the signatures of the collaborating functions. If possible, limit each function to a single argument, which might mean wrapping multiple arguments up in a data structure.

One might say that the pipe operator is really just a mechanism for presenting code. Even so, changing the direction in which the data flows can greatly improve the readability of a piece of code and help it describe itself.

However, the pipe operator also encourages a design that promotes a single data structure. This lines up well with the advice of Alan Perlis.

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

Keep these considerations in mind the next time you are designing a pipeline transformation with the pipe operator.


1. In Clojure, this operator is referred to as a "threading macro", because it "threads" the argument through the list of functions. The naming is unfortunate: it has nothing to do with threads of execution.