The process of learning Ruby has brought several surprises, some more pleasant than others. The latest I have encountered is that Array#sort is not stable. For readers unfamiliar with the term, this post starts with a description of what stable sorting is. It then shows why sorting in Ruby is not stable. Finally, I present a solution to this problem and explain why it works.
What is a stable sort?
Given a list of compound elements, i.e. elements that themselves contain elements, we often need to sort on different fields at different times. We may even need to sort a list without knowing what other sorts were applied before. A stable sort keeps elements that compare as equal in their existing relative order, so the order established by a previous sort survives among ties.
Let’s consider an example. Say we have a list of compound elements, each of which contains the elements “First Name” and “Last Name”:
First Name | Last Name |
---|---|
Alex | Jones |
Pat | Smith |
Alex | Smith |
Shawn | Jones |
Dana | Smith |
Terry | Jones |
Suppose we wanted to sort this list using the last name as a primary key and the first name as a secondary key. In other words, all the entries with one last name would come before those with another last name, and for those last names which are the same, the entries would be sorted by first name.
To accomplish this, we would first sort by the secondary key:
First Name | Last Name |
---|---|
Alex | Jones |
Alex | Smith |
Dana | Smith |
Pat | Smith |
Shawn | Jones |
Terry | Jones |
Then we would sort by the primary key. If the second sort is stable, we get the following result:
First Name | Last Name |
---|---|
Alex | Jones |
Shawn | Jones |
Terry | Jones |
Alex | Smith |
Dana | Smith |
Pat | Smith |
However, if the second sort were not stable, the order of the first names would be disturbed, and we might get a result more like this:
First Name | Last Name |
---|---|
Shawn | Jones |
Terry | Jones |
Alex | Jones |
Alex | Smith |
Dana | Smith |
Pat | Smith |
Sorting in Ruby
The Array#sort method in Ruby uses the venerable Quicksort algorithm. In its best and average cases, Quicksort has time complexity O(n log n), but with a naive pivot choice, data that is already ordered can push the complexity to O(n²). What we gain in efficiency by using Quicksort, we lose in stability. To make matters worse, some implementations randomize the data before beginning the sort to avoid this worst case for time complexity.
Other sorting algorithms, such as merge sort and bubble sort, are stable. However, they have costs of their own: merge sort typically needs additional memory, and bubble sort takes O(n²) time. Wikipedia has a good comparison of sorting algorithms.
To see an example of how Array#sort is unstable, let’s fire up the Ruby REPL and create some data:
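A minimal sketch of such data, assuming the six names from the tables above stored as [last name, first name] pairs. The variable name `data` and the fixed starting order (rather than a freshly shuffled one) are choices made here so the example is reproducible:

```ruby
# Each item is a two-element array: [last_name, first_name].
# The order matches the first, unsorted table above.
data = [
  ["Jones", "Alex"],
  ["Smith", "Pat"],
  ["Smith", "Alex"],
  ["Jones", "Shawn"],
  ["Smith", "Dana"],
  ["Jones", "Terry"]
]
```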
We now have a randomized list of six items, each of which contains two elements. If we treat the second element as a secondary key and sort on it, we get:
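As a sketch of that first pass, assuming a hypothetical `data` array of [last name, first name] pairs:

```ruby
data = [%w[Jones Alex], %w[Smith Pat], %w[Smith Alex],
        %w[Jones Shawn], %w[Smith Dana], %w[Jones Terry]]

# First pass: order by the secondary key (index 1, the first name).
by_first_name = data.sort { |a, b| a[1] <=> b[1] }

# The first names are now in order; which "Alex" comes first is not guaranteed.
first_names = by_first_name.map { |item| item[1] }
```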
But then if we take the result of this sort, and sort again on the first element as a primary key, the results may not be what we want:
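A sketch of the second pass, again assuming a hypothetical `data` array of [last name, first name] pairs. Because Array#sort makes no stability guarantee, the order within each last name can depend on the data and on the Ruby implementation, so the block reports whether the first-name order survived instead of assuming an outcome:

```ruby
data = [%w[Jones Alex], %w[Smith Pat], %w[Smith Alex],
        %w[Jones Shawn], %w[Smith Dana], %w[Jones Terry]]

by_first_name = data.sort { |a, b| a[1] <=> b[1] }

# Second pass: order by the primary key (index 0, the last name).
by_last_name = by_first_name.sort { |a, b| a[0] <=> b[0] }

# The order we hoped for: last name first, then first name.
# Array#sort without a block compares the pairs element by element.
wanted = data.sort

puts "first-name order preserved: #{by_last_name == wanted}"
```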
Stabilizing Ruby Sorts
After a few minutes of Google searching, I ran across a blog post with an extension to the Ruby Array class:
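A sketch of the kind of extension described in the rest of this post, built on `sort_by` and a counter. The method name `stable_sort_by` and its block-taking signature are assumptions made for illustration, not the cited post's code:

```ruby
class Array
  # Hypothetical helper: sorts by the block's key, keeping the original
  # relative order of elements whose keys are equal.
  def stable_sort_by
    n = 0
    sort_by do |element|
      n += 1               # monotonically increasing tie-breaker
      [yield(element), n]  # compound key: [real key, original position]
    end
  end
end

names = [%w[Jones Terry], %w[Smith Alex], %w[Jones Alex]]
sorted = names.stable_sort_by { |item| item[0] }
# => [["Jones", "Terry"], ["Jones", "Alex"], ["Smith", "Alex"]]
```

Reopening a core class affects every Array in the program; a refinement or a standalone method would limit the reach, at the cost of a slightly longer call.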
If, like me, you are not content with accepting this as more Ruby magic, allow me to explain my understanding of how it works. This code is based on the Enumerable#sort_by method and the default comparison operator for Array.
Enumerable#sort_by starts by creating a new array. Each element of this new array is itself a two-element array: the first element is the sort key computed for an element of the original array, and the second is that original element. Returning to our earlier array of data, if we wanted to sort by the second element, sort_by starts by creating an intermediate array:
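Conceptually, and assuming a hypothetical `data` array of [last name, first name] pairs, that intermediate array could be built by hand like this (a model of the idea, not Ruby's internal implementation):

```ruby
data = [%w[Jones Alex], %w[Smith Pat], %w[Smith Alex],
        %w[Jones Shawn], %w[Smith Dana], %w[Jones Terry]]

# Pair each element with its sort key: [key, original_element].
# Here the key is the second element (the first name).
keyed = data.map { |item| [item[1], item] }

keyed.first # => ["Alex", ["Jones", "Alex"]]
```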
It then sorts this intermediate array on the first element of each of the new elements:
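As a hand-built model of this step (not Ruby's internal implementation), assuming a hypothetical `data` array of [last name, first name] pairs keyed by first name:

```ruby
data = [%w[Jones Alex], %w[Smith Pat], %w[Smith Alex],
        %w[Jones Shawn], %w[Smith Dana], %w[Jones Terry]]

# Intermediate array of [key, original_element] pairs.
keyed = data.map { |item| [item[1], item] }

# Sort the pairs by key only (index 0 of each pair).
sorted_pairs = keyed.sort { |a, b| a[0] <=> b[0] }

keys_in_order = sorted_pairs.map { |pair| pair[0] }
```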
Finally, it extracts the second element of each of the new elements to get the result:
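Putting the hand-built model together, a sketch of the extraction step under the same assumptions (a hypothetical `data` array of [last name, first name] pairs, keyed by first name):

```ruby
data = [%w[Jones Alex], %w[Smith Pat], %w[Smith Alex],
        %w[Jones Shawn], %w[Smith Dana], %w[Jones Terry]]

# Attach keys, then sort by key.
keyed = data.map { |item| [item[1], item] }
sorted_pairs = keyed.sort { |a, b| a[0] <=> b[0] }

# Strip the keys again, keeping only the original elements.
result = sorted_pairs.map { |pair| pair[1] }

first_names = result.map { |item| item[1] }
```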
To see how the stable_sort method works, let’s walk through how it would operate on our array. First, we sort on the second element:
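For the walk-through, a sketch of the array after this first pass, assuming [last name, first name] pairs in the order of the second table above (the variable name `by_first_name` is illustrative):

```ruby
# State after sorting on the second element (the first name),
# written out literally so the later steps are reproducible.
by_first_name = [
  ["Jones", "Alex"],
  ["Smith", "Alex"],
  ["Smith", "Dana"],
  ["Smith", "Pat"],
  ["Jones", "Shawn"],
  ["Jones", "Terry"]
]
```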
Next we create a new array using the first element of the partially sorted array and our monotonically increasing variable:
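A sketch of that keyed array, assuming a hypothetical `by_first_name` array of [last name, first name] pairs already ordered by first name, and a counter named `n`:

```ruby
by_first_name = [%w[Jones Alex], %w[Smith Alex], %w[Smith Dana],
                 %w[Smith Pat], %w[Jones Shawn], %w[Jones Terry]]

n = 0
keyed = by_first_name.map do |item|
  n += 1
  [[item[0], n], item] # [[last_name, counter], original_element]
end

keys = keyed.map { |pair| pair[0] }
# => [["Jones", 1], ["Smith", 2], ["Smith", 3], ["Smith", 4], ["Jones", 5], ["Jones", 6]]
```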
Then we sort on the first element of our new array:
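A sketch of this sort under the same assumptions (a hypothetical `by_first_name` array and counter `n`). Every key is now unique, so the result no longer depends on whether the underlying sort is stable:

```ruby
by_first_name = [%w[Jones Alex], %w[Smith Alex], %w[Smith Dana],
                 %w[Smith Pat], %w[Jones Shawn], %w[Jones Terry]]

n = 0
keyed = by_first_name.map do |item|
  n += 1
  [[item[0], n], item]
end

# Compare only the compound keys: [last_name, counter].
sorted_pairs = keyed.sort { |a, b| a[0] <=> b[0] }

sorted_keys = sorted_pairs.map { |pair| pair[0] }
# => [["Jones", 1], ["Jones", 5], ["Jones", 6], ["Smith", 2], ["Smith", 3], ["Smith", 4]]
```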
This works because the default comparison operator for Array uses the first element of the array as the primary key, the second as the secondary key, and so on. Because we built each sort key with a monotonically increasing variable as its second element, elements of the original array whose first elements are equal keep their original order. In other words, we made the sort stable.
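The element-by-element comparison can be checked directly with `Array#<=>`; the sample keys and variable names below are illustrative:

```ruby
# Different first elements: the first element decides.
first_decides = ["Jones", 6] <=> ["Smith", 2] # => -1

# Equal first elements: the second element breaks the tie.
tie_broken = ["Jones", 5] <=> ["Jones", 6]    # => -1

# Equal in every position: the arrays compare as equal.
all_equal = ["Smith", 2] <=> ["Smith", 2]     # => 0
```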
The final step is of course to extract the elements in which we are interested:
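A sketch of the whole walk-through ending with the extraction, assuming the same hypothetical `by_first_name` array and counter `n`:

```ruby
by_first_name = [%w[Jones Alex], %w[Smith Alex], %w[Smith Dana],
                 %w[Smith Pat], %w[Jones Shawn], %w[Jones Terry]]

n = 0
keyed = by_first_name.map do |item|
  n += 1
  [[item[0], n], item]
end
sorted_pairs = keyed.sort { |a, b| a[0] <=> b[0] }

# Drop the keys, keeping only the original elements.
result = sorted_pairs.map { |pair| pair[1] }
# => [["Jones", "Alex"], ["Jones", "Shawn"], ["Jones", "Terry"],
#     ["Smith", "Alex"], ["Smith", "Dana"], ["Smith", "Pat"]]
```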
This leaves us with the order from the first sort preserved. By understanding that Array#sort may not give the results we expect, how Enumerable#sort_by works, and how the default Array comparison operator works, we can get stable sorts out of Ruby.