A Functional Refactoring in Scala

Colin Jones

June 16, 2009

Many of us have heard a lot of talk about functional programming and its benefits, especially when it comes to highly concurrent applications, where thread locking and synchronization would otherwise be necessary to avoid mangling shared state.

Languages like Scala, Erlang, and Clojure are increasingly coming to mind as people look for ways to take advantage of multiple-core processors without the headaches threading can create. While I'm not quite ready to take the plunge and completely eliminate state from my programming, I'm looking for more and more ways to get rid of it when possible.

In doing so, I've come across several situations where a simple refactoring to a more functional style can yield rewards in readability and testability, and I'd like to talk about one of them here.

Simple while loops are one of the imperative programmer's most basic tools, and it might seem difficult to eliminate mutable state. However, a little persistence can pay off. Let's start with the following Scala code:

def printUpTo(limit: Int): Unit = {
  var i = 0
  while (i <= limit) {
    println("i = " + i)
    i += 1
  }
}

It's clear that state is changing inside this function, highlighted by the use of Scala's var keyword that specifies a variable that is allowed to change (val variables are unmodifiable).

While this state change isn't visible from the outside (a new copy of the variable is created each time the function is entered), var often means there's unnecessary clutter in the code and that it can be simplified. And indeed it can.

To begin with, any programmer worth her salt would change this to a for loop. A Java implementation might look like this:

void printUpTo(int limit) {
  for (int i = 0; i <= limit; i++) {
    System.out.println("i = " + i);
  }
}

Scala's for loops look more like Java's foreach syntax:

def printUpTo(limit: Int): Unit = {
  for (i <- (0 to limit)) {
    println("i = " + i)
  }
}

The iterating structure is Java-like, with a left arrow in place of Java's colon. The range specification (0 to limit) is a Scala built-in, but ranges are also available, natively or through libraries, in Java, Ruby, and plenty of other languages.
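As a small illustration of the range syntax, here is a sketch of a few standard-library range expressions. The object name RangeSketch and the value names are only for illustration and are not part of the original example.

```scala
object RangeSketch {
  // `to` includes the upper bound: 0, 1, 2, 3
  val inclusive: List[Int] = (0 to 3).toList

  // `until` excludes the upper bound: 0, 1, 2
  val exclusive: List[Int] = (0 until 3).toList

  // `by` sets the step: 0, 2, 4, 6
  val stepped: List[Int] = (0 to 6 by 2).toList
}
```

printUpTo uses the inclusive form, which is why the loop also prints the line for limit itself.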

The code in the last example seems to encapsulate the structure of the function more clearly. The first and last lines inside the while-based version of printUpTo (the initialization and the increment) are just boilerplate code, and while almost any programmer can easily follow the structure of the code, we should always be striving for improvement.

In general, I find that I'm less likely to have bugs with a smaller codebase. Certainly there are exceptions (regular expressions come to mind), but assuming that two pieces of code both read well, I'd rather have less to read.

In the preceding Scala for loop syntax, i is implicitly a val within the context of the loop, which means it can't be reassigned to a new value. This is a great step forward, as we could initially have created an infinite loop by manipulating the value of i inside the while loop. We have now insulated ourselves against this type of change.

Another functional way to write essentially the same loop would be to use Scala's foreach, a method available on Lists and Ranges that takes a function as a parameter.

def printUpTo(limit: Int): Unit = {
  (0 to limit).foreach {
    i => println("i = " + i)
  }
}

This seems more expressive to me, as it emphasizes the importance of the range object in the loop. It would also have given us the ability to use a shorthand if we'd not needed the "i =" text:

(0 to limit).foreach(println _)

or even…

(0 to limit).foreach(println)

But there is also something to be said for the familiarity of the more traditional for construct.

The code above is still not completely functional, however, because the println statement depends upon the definition of standard output at the time (imagine redefining it to a java.io.ByteArrayOutputStream, for example). This also makes the function hard to test.

Now, if we're going to build an application to work at the console, at some point we're going to have to actually use println if we want the user to know that the application is working. Does this mean that the functional approach completely eschews input and output streams? Certainly not, but we have two problems in the example at hand:

  1. printUpTo is responsible for both the construction of each member of a group of output strings, and for the actual printing of those strings. Therefore, the function has two reasons to change. For instance, we might want to change the information that gets printed on each line or its formatting. We might also want to change how the output gets to the user.

  2. printUpTo is awkward to test. In order to verify that the correct output goes to the user, we'll need to either find a mocking library to change the way the Console object behaves, or redirect standard output to another stream that we can read to verify its contents afterwards (maybe a ByteArrayOutputStream).
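To give a feel for the redirection approach mentioned in the second point, here is a minimal sketch. Console.withOut is part of Scala's standard library; the object name CaptureSketch and the helper captureOutput are hypothetical names introduced just for this illustration.

```scala
import java.io.ByteArrayOutputStream

object CaptureSketch {
  // Same behavior as the foreach-based printUpTo shown earlier.
  def printUpTo(limit: Int): Unit =
    (0 to limit).foreach(i => println("i = " + i))

  // Hypothetical helper: runs `body` with Console output redirected
  // to an in-memory buffer and returns everything that was printed.
  def captureOutput(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer) { body }
    buffer.toString
  }
}
```

A test could then call CaptureSketch.captureOutput(CaptureSketch.printUpTo(2)) and compare the captured lines with the expected ones. It works, but every test of the string format has to go through this plumbing, which is the awkwardness described above.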

I'd prefer a further refactoring in order to alleviate these issues. The problem of verifying console output in our test code won't be completely eliminated, but we'll at least isolate it to its own test. Here, we're extracting println from the string-building part of the code:

def printUpTo(limit: Int): Unit = {
  applyUpTo(limit, println)
}

def applyUpTo(limit: Int, action: String => Unit): Unit = {
  (0 to limit).foreach {
    i => action("i = " + i)
  }
}

Now we can test applyUpTo by passing in our own custom action, and the tests for println-based behavior, along with the mocking and redirection they require, can be kept separate.
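One way such a test might look is sketched below: the custom action simply records each string it receives. The object name ApplySketch and the helper collected are hypothetical, and the recording buffer is itself mutable state, confined here to the test helper.

```scala
import scala.collection.mutable.ListBuffer

object ApplySketch {
  // Same logic as applyUpTo in the refactoring above.
  def applyUpTo(limit: Int, action: String => Unit): Unit =
    (0 to limit).foreach(i => action("i = " + i))

  // Hypothetical test helper: records every string handed to the action
  // instead of printing it, then returns the strings in order.
  def collected(limit: Int): List[String] = {
    val seen = ListBuffer.empty[String]
    applyUpTo(limit, s => { seen += s; () })
    seen.toList
  }
}
```

No mocking library or stream redirection is involved; the check reduces to comparing ApplySketch.collected(2) against a literal list.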

As a word of explanation, the action passed in needs to take a String as an argument and return the Unit value: (). Now we can also move printUpTo to a different class if we so desire (assuming this function is part of a larger application), eliminating our need to recompile the code in printUpTo each time the string format changes.

I'd most likely put the applyUpTo call on its own in a main() method if there's really nothing more to the application.

There is another problem with the code above, however: both of the functions are returning Unit values (the equivalent of Java's void). This means that both of these functions rely on side effects to get their jobs done.

If we were to pass a side effect-free function into applyUpTo, we wouldn't actually have any output from the applyUpTo to tell us that we did something. A side effect is any effect of a function beyond its return value, so we can say that if a Scala function has a return type of Unit, there are either side effects, or it is a trivial function that we can safely eliminate.

We can make the helper function side effect-free like so:

def printUpTo(limit: Int): Unit = {
  buildUpTo(limit).foreach(println)
}

def buildUpTo(limit: Int): Iterable[String] = {
  (0 to limit).map("i = " + _)
}

In this case, we're relying on a bit of syntactic sugar for transferring the iterating value of foreach to println, and a placeholder (_) for map's iterating value.
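To make the sugar concrete, the sketch below writes the map call both ways. The object and method names are hypothetical and exist only for the comparison.

```scala
object SugarSketch {
  // Placeholder form, as used in buildUpTo.
  def withPlaceholder(limit: Int): Seq[String] =
    (0 to limit).map("i = " + _)

  // The same function written as an explicit lambda.
  def withExplicitLambda(limit: Int): Seq[String] =
    (0 to limit).map(i => "i = " + i)

  // Likewise, foreach(println) stands in for foreach(s => println(s)).
}
```

Both methods produce the same sequence, so the choice between them is purely one of readability.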

The buildUpTo function is now more functional and more easily testable (we can do a fairly simple comparison with the returned Iterable), but it's important to note that memory usage might be a concern, as the mapped collection holding all of the strings could become pretty large.
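The simple comparison might look something like the following sketch. It uses a plain assert rather than any particular test framework, and the names BuildSketch and checkBuildUpTo are hypothetical.

```scala
object BuildSketch {
  // Same logic as buildUpTo above.
  def buildUpTo(limit: Int): Iterable[String] =
    (0 to limit).map("i = " + _)

  // Hypothetical check: compare the result with a literal expected list.
  // Throws an AssertionError if the output ever drifts from this format.
  def checkBuildUpTo(): Unit =
    assert(buildUpTo(2).toList == List("i = 0", "i = 1", "i = 2"))
}
```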

Another issue with this is speed: in the more functional version, we're iterating through the collection twice rather than just once. We don't often get things for free, and indeed there is a trade-off here between memory/speed and functionality.
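If those costs matter, one possible middle ground (not part of the original refactoring) is to return a lazy Iterator, so that each string is built only when the consumer asks for it and no intermediate collection is held in memory. The names LazySketch and buildUpToLazily are hypothetical.

```scala
object LazySketch {
  // Hypothetical variant of buildUpTo: Iterator.map is lazy, so nothing
  // is computed until the returned iterator is actually consumed.
  def buildUpToLazily(limit: Int): Iterator[String] =
    (0 to limit).iterator.map("i = " + _)

  // Printing consumes the iterator in a single pass.
  def printUpTo(limit: Int): Unit =
    buildUpToLazily(limit).foreach(println)
}
```

This has its own cost: an Iterator can be traversed only once and carries internal state, so it is a less purely functional return value than an immutable collection.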

The example here has been on a very small scale, and perhaps a bit contrived, but we can extrapolate the ideas to a wide range of refactorings. We want to remove as much code with side effects and mutable state as we can from the rest of our application.

This way we can divide up the functional parts of our application among processors in order to take advantage of the multiple cores that we all have in our machines these days. There are numerous other situations in which functional programming can lead to cleaner code, and there are also places where it might be inappropriate or impossible.
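As a rough sketch of what dividing the work might look like, the example below runs several buildUpTo calls concurrently using the standard library's Future. The object name ConcurrentSketch, the helper buildAll, the use of the global execution context, and the ten-second timeout are all assumptions made for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrentSketch {
  // Same side effect-free function as above.
  def buildUpTo(limit: Int): Iterable[String] =
    (0 to limit).map("i = " + _)

  // Hypothetical helper: builds the output for several limits concurrently.
  // Because buildUpTo shares no mutable state, no locking is needed.
  def buildAll(limits: List[Int]): List[Iterable[String]] = {
    val futures = limits.map(limit => Future(buildUpTo(limit)))
    // Future.sequence keeps the results in the same order as `limits`.
    Await.result(Future.sequence(futures), 10.seconds)
  }
}
```

For work this small, the scheduling overhead would almost certainly outweigh any gain; the point is only that a function without side effects can be handed to other threads without further thought.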

It's important to remember the most common reason we might want to write functional code: concurrency. If I'm only going to be running this small application on my local machine, from the command line, and printing is going to be the limiting factor speed-wise (and it probably will in this case), I would most likely stop at the simplest foreach version above and be done with it.

It will be efficient with memory and fast. But when the function buildUpTo becomes more complex, or we need to run numerous instances of the application, that's where concurrency, and functional programming, are best applied. Our goal should be to find, and use, the right tool for the job.