Colin Jones

March 27, 2011

You are a mad scientist, and you’d like to perform an experiment on your mysterious test subjects. You want half of the subjects to be assigned to one of three experimental groups, and the other half of the subjects will serve as the control group.

The details of the experiments are not important for now, but suffice to say they will not be condoned by the American Psychological Association.

Got a solution already? Okay, okay, we’ll put this in more concrete, mathy terms, but everything after this sentence is really part of a solution, so don’t think you’re getting away with anything here, Doctor Scaryhausen.

Given a collection of length `n`, shuffle the elements randomly, grab the ones at odd positions, and map those to values cycling among `:x`, `:y`, and `:z`. Don’t forget to include the ones at even positions in your result map, mapped to `:c` (for the “control” group).

As a diligent student of Clojure’s sequence library, you might come up with a solution like this one:
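Here is a minimal sketch of that kind of first attempt, built around the duplicated fragment discussed just below. The function name `assign-subjects` and the exact layout are assumptions for illustration, not a definitive listing.

```clojure
;; Sketch of a first attempt; `assign-subjects` is an assumed name.
(defn assign-subjects [subjects]
  (let [indexed (map-indexed vector (shuffle subjects))]
    (merge
      ;; subjects at odd positions cycle through the experimental groups
      (apply hash-map
             (interleave (map last (filter (fn [[i subject]] (odd? i)) indexed))
                         (cycle [:x :y :z])))
      ;; subjects at even positions all land in the control group
      (apply hash-map
             (interleave (map last (filter (fn [[i subject]] (even? i)) indexed))
                         (repeat :c))))))
```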

This works just fine, and as expected, you’ll get a different result every time because of the `shuffle`:
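The variation comes entirely from `shuffle`. A tiny sketch, with hypothetical subject names chosen only for illustration:

```clojure
;; Hypothetical subject names, purely for illustration.
(def subjects [:igor :wolfman :dracula :mummy])

(shuffle subjects) ; some random ordering of the four subjects
(shuffle subjects) ; very likely a different ordering this time
```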

However, it’s pretty hard to tell what’s going on here. The duplication is pretty gross: `(interleave (map last (filter (fn [[i subject]]`, and that contributes to a feeling of just too many responsibilities for one function. It’s hard to read and hard to test, and that kind of thing makes the Wolfman, for one, pretty angry. Let’s clean that up:
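One way the cleaned-up version might look, as a sketch: the names `subject-groups`, `assign-indexed-subjects`, `pred`, `subjects`, and `groups` come from the discussion in this post, while `assign-subjects` and the precise argument order are assumptions.

```clojure
;; Pure: picks subjects by index using `pred`, then cycles them through `groups`.
(defn subject-groups [pred subjects groups]
  (apply hash-map
         (interleave
           (map last (filter (fn [[i subject]] (pred i)) subjects))
           (cycle groups))))

;; Pure: odd positions get the experimental groups, even positions the control.
(defn assign-indexed-subjects [indexed-subjects]
  (merge (subject-groups odd? indexed-subjects [:x :y :z])
         (subject-groups even? indexed-subjects [:c])))

;; The only impure piece: the shuffle lives here (assumed name).
(defn assign-subjects [subjects]
  (assign-indexed-subjects (map-indexed vector (shuffle subjects))))

(assign-indexed-subjects [[0 :igor] [1 :wolfman] [2 :dracula] [3 :mummy]])
;; => {:igor :c, :wolfman :x, :dracula :c, :mummy :y} (key order may differ)
```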

Much better. In the process of removing the duplication, we’ve refactored out a function `subject-groups` that has no knowledge of the algorithm we’re using to split up our subjects. That strategy gets injected as the function argument `pred`, for “predicate”. Why not spell the whole word out?

In short, because it’s a convention. In slightly less short, because if you’ve watched Uncle Bob Martin’s recent video on naming, you know that the convention itself is okay because the scope of that binding is pretty small.

The other good thing about splitting these responsibilities out is that `subject-groups` and `assign-indexed-subjects`, the most complex pieces of this solution, are now referentially transparent.

They are deterministic, so they can now be more easily understood, tested, and even memoized! It’s generally a good idea to partition side effects into small parts of a system in order to get these benefits.
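To make that concrete, here is a small sketch, assuming a `subject-groups` shaped like the one discussed in this post and some hypothetical subject names. Because the function is pure, the same input always yields the same output, and wrapping it with `memoize` is safe.

```clojure
;; Assumed shape of subject-groups; repeated so this snippet runs on its own.
(defn subject-groups [pred subjects groups]
  (apply hash-map
         (interleave
           (map last (filter (fn [[i subject]] (pred i)) subjects))
           (cycle groups))))

;; Hypothetical, already-indexed subjects (no shuffle involved).
(def indexed [[0 :igor] [1 :wolfman] [2 :dracula] [3 :mummy]])

;; Same input, same output, every time.
(= (subject-groups odd? indexed [:x :y :z])
   (subject-groups odd? indexed [:x :y :z])) ; => true

;; Caching results is safe because nothing here depends on hidden state.
(def fast-subject-groups (memoize subject-groups))
(fast-subject-groups odd? indexed [:x :y :z]) ; => {:wolfman :x, :mummy :y}
```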

Now, because we have some time left before the sun sets and we’re able to actually perform our evil experiments, let’s look more critically at `subject-groups`.

For an experienced Lisper, this is relatively clear. Just four lines of implementation, not too much cleverness, standard Clojure sequence functions. But what if we represented this function in terms of the way the input `subjects` is transformed into a map of subjects pointing to groups?

If you’ve never seen Clojure’s threading macros (`->` and `->>`), then today’s your lucky day! The idea is that you have a series of transformations, represented by Clojure forms, inside the macro.

And that macro takes the result of one form and inserts it (in some way, depending on which macro) into the next form. As an example, the `->` macro inserts into the second position (the first argument):
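A small arithmetic sketch (the numbers are arbitrary) shows where `->` puts the threaded value:

```clojure
(-> 10
    (- 3)   ; becomes (- 10 3) => 7
    (/ 7))  ; becomes (/ 7 7)  => 1
```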

Using the `->>` macro inserts into the last position:
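The same arbitrary numbers give a different answer under `->>`, and sequence functions, which take their collection last, fit it naturally:

```clojure
(->> 10
     (- 3)   ; becomes (- 3 10) => -7
     (/ 7))  ; becomes (/ 7 -7) => -1

(->> (range 10)
     (filter odd?)  ; (1 3 5 7 9)
     (map inc))     ; => (2 4 6 8 10)
```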

Pretty simple, right? One edge case is that a bare symbol, like `count`, will be translated into `(count)` so that inserting into the form is meaningful, but we won’t need that detail here.
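For the curious, a quick sketch of that edge case, using hypothetical subject names:

```clojure
(-> [:igor :wolfman :dracula]
    count   ; treated as (count), so this step becomes (count [...]) => 3
    inc)    ; => 4
```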

So let’s give this a shot with `subject-groups`. First, recall our current implementation:
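As a sketch (the argument names and order are assumptions), the nested version looks roughly like this. The numbered comments show the order in which the data actually flows, which is inside-out relative to how the code reads:

```clojure
(defn subject-groups [pred subjects groups]
  (apply hash-map                                    ; 4. build the map
         (interleave                                 ; 3. pair each subject with a group
           (map last                                 ; 2. drop the index
                (filter (fn [[i subject]] (pred i))  ; 1. pick subjects by index
                        subjects))
           (cycle groups))))
```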

Now let’s use the `->>` macro:
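A sketch of how far that gets us, with the argument names assumed as before. The first two steps thread nicely, and then we have to decide what comes next:

```clojure
;; Unfinished on purpose: this only returns the selected subjects so far.
(defn subject-groups [pred subjects groups]
  (->> subjects
       (filter (fn [[i subject]] (pred i)))
       (map last)
       ;; ...and now the interleave step?
       ))

(subject-groups odd? [[0 :igor] [1 :wolfman] [2 :dracula] [3 :mummy]] [:x :y :z])
;; => (:wolfman :mummy)
```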

Well crud. Do you see the problem? We’d like `(interleave (cycle groups))` to be the next form (a transformer form?) to thread our data through, but that’ll give us a map keyed by group!
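To see the backwards result for yourself, here is a sketch of the naive version. The name `subject-groups-backwards` and the subject names are hypothetical:

```clojure
(defn subject-groups-backwards [pred subjects groups]
  (->> subjects
       (filter (fn [[i subject]] (pred i)))
       (map last)
       (interleave (cycle groups)) ; expands to (interleave (cycle groups) <data>)
       (apply hash-map)))

(subject-groups-backwards odd?
                          [[0 :igor] [1 :wolfman] [2 :dracula] [3 :mummy]]
                          [:x :y :z])
;; => {:x :wolfman, :y :mummy}  groups as keys, not what we wanted
```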

This is because the previous form always gets stuck in at the end of the next form with the `->>` macro. At this point, I’ve often used a cop-out, bailing out of one threading form and moving on with a different one:
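A sketch of that cop-out, wrapping the `->>` pipeline in a `->` so that `interleave` receives the data first. The name `interleaved-subjects` is hypothetical, and this version stops short of building the map:

```clojure
(defn interleaved-subjects [pred subjects groups]
  (-> (->> subjects
           (filter (fn [[i subject]] (pred i)))
           (map last))
      (interleave (cycle groups)))) ; becomes (interleave <data> (cycle groups))

(interleaved-subjects odd?
                      [[0 :igor] [1 :wolfman] [2 :dracula] [3 :mummy]]
                      [:x :y :z])
;; => (:wolfman :x :mummy :y)
```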

Gah, it’s starting to feel like Clojure is doing evil experiments on us! So now what? Well, it seems we need to switch back to `->>`:
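Roughly, and with the same assumed argument names, the triple-decker ends up looking like this sketch:

```clojure
(defn subject-groups [pred subjects groups]
  (->> (-> (->> subjects
                (filter (fn [[i subject]] (pred i)))
                (map last))
           (interleave (cycle groups))) ; data first, thanks to ->
       (apply hash-map)))               ; data last again, thanks to ->>
```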

OK, this gets the job done…

But are we really being clearer here than we were with our first version of `subject-groups`, without any threading macros? I’d argue that we’re not, and I think it’d be tough to convince me otherwise.

Thankfully, there does exist a way to hack in the ordering we really want and use one threading macro the whole way down:
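Here is a sketch of that hack, again assuming the argument names used throughout. Note the extra pair of parentheses around the `fn`: they make the step a call form, so `->>` appends the data as the function’s only argument.

```clojure
(defn subject-groups [pred subjects groups]
  (->> subjects
       (filter (fn [[i subject]] (pred i)))
       (map last)
       ;; an interleave look-alike that takes the threaded data first
       ((fn [selected] (interleave selected (cycle groups))))
       (apply hash-map)))

(subject-groups odd? [[0 :igor] [1 :wolfman] [2 :dracula] [3 :mummy]] [:x :y :z])
;; => {:wolfman :x, :mummy :y}
```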

This works as perfectly as Frankenstein’s neck bolts! Since we want our transforming data to be the first argument to `interleave`, but the last argument for the remainder of the transforming forms, we’ve just created an anonymous function that’s similar to `interleave`, but swaps the argument order.

Recall here that because we’re using a macro (`->>`), at runtime the code is nearly identical to our initial `subject-groups` solution: the only difference is this new swapped-argument `interleave`-like function that we’ve created. So we don’t pay a performance cost at runtime for using this version to express the data transformation.
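You can check this at the REPL. The sketch below uses `clojure.walk/macroexpand-all` on a shortened, quoted pipeline (the symbol `subjects` is just a placeholder) to show that the threaded form turns back into ordinary nested calls:

```clojure
(require '[clojure.walk :as walk])

;; The form is quoted, so nothing is evaluated; we only look at the expansion.
(walk/macroexpand-all
  '(->> subjects
        (map last)
        (apply hash-map)))
;; => (apply hash-map (map last subjects))
```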

Looking back at our initial `subject-groups` implementation, I admit I’m not convinced that either one is definitively better than the other.

However, I do find the data transformation pattern encouraged by the threading macros to be a compelling way to visualize this process, and in many cases you don’t even need the argument-swapping hack to make it happen.

Pause…

Okay, you’re right, you’re not a mad scientist and therefore the example doesn’t apply to you directly, but the bind it puts you into from a threading macro perspective is very real world. The fact is that when data is being threaded through your forms, the macro depends on the position of the data being consistent throughout the flow.

This problem will also come up (for instance) anytime you’re using `->>` to thread sequential data, and you need to `conj` something onto it, since `conj` takes the collection as its first argument.
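A minimal sketch of the `conj` case, using arbitrary numbers. Writing `(conj 6)` as a step would expand to `(conj 6 <data>)`, which fails because `6` is not a collection, so the same wrapping trick applies:

```clojure
(->> [1 2 3 4]
     (filter even?)      ; (2 4)
     (into [])           ; [2 4], so conj appends at the end
     (#(conj % 6)))      ; becomes (#(conj % 6) [2 4]) => [2 4 6]
```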

I’m interested to hear how you feel about these two versions of `subject-groups`, and what you might do differently.