As a developer there are an infinite number of things to remember: everything from intricacies of the language you’re using to the domain knowledge of your current project. Just about everything can be readily looked up, but if you look up every little thing you’ll never get anything done (and likely annoy your pair). A good working knowledge of common functions in your language’s standard library is usually expected. But is everything in even the “standard” library easy to remember?
Have the creators and maintainers of that library made choices that make it easy for you, the developer using it, to avoid bugs? I want to take a look at the choices that were made in designing a simple, common standard library function: finding the index of a substring in a string. What is a developer expected to remember in order to properly use this function? And how easy is it for them to forget and introduce a bug?
Let’s start with Ruby. In Ruby the
String#index method returns the 0-based
index of the substring if it is found, or
nil if it is not found. Returning
nil for the negative case is common
in Ruby and makes sense in some contexts (for example it is “falsy” in boolean
checks). In many contexts though it is just a special value that means “not
found.” The function could just as easily return a scream-cat-emoji with no loss
of meaning.
Having this special value puts the responsibility on the developer to remember
that the method could return
nil and that case likely has to be handled
differently than when the substring is found. A search for the index that isn’t
paired with a
nil check is likely a bug that is waiting to happen, but nothing
other than you remembering that will keep you from doing it. Sure, you should
probably have a test for that case, but that just means you have to remember to
write that test case, so we’re back where we started.
There are some who would argue that the problem with the behavior above is
caused by Ruby’s dynamic type system. That
nil is a different type than an
integer and thus has to be handled differently. In this mindset it makes no
sense to return these different things from a single function, and static typing
would have helped the developer catch such a bug. So let’s take a look at the
same function in a popular statically typed language, Java.
In Java, the
String#indexOf function returns the 0-based index of the
substring if it is found (as in Ruby), but returns -1 if the substring is not
found. Here the type of the return value is the same either way, but -1 is still
just a special value meaning “not found.” The only thing special about -1 is
that it is less than 0 (so is -2 but I guess -1 is easier to remember). We’ve
traded an
index.nil? check for an
index == -1 check, but the type system is
doing little to help us avoid such bugs.
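To make that concrete, here is a minimal sketch of how a forgotten -1 check can slip through the compiler. The class and method names (IndexOfPitfall, afterMarkerBuggy, afterMarkerChecked) are hypothetical, made up for illustration; only String#indexOf and String#substring come from Java itself. Returning an empty string in the checked version is just one arbitrary way to handle the "not found" case.

```java
class IndexOfPitfall {
    // Hypothetical helper: returns the text after the first occurrence of marker.
    // It forgets that indexOf may return -1.
    static String afterMarkerBuggy(String text, String marker) {
        int index = text.indexOf(marker);
        // When marker is absent, index is -1. With a one-character marker the
        // sum below is 0, so the whole string comes back and nothing fails.
        return text.substring(index + marker.length());
    }

    // The same helper with the special value handled explicitly.
    static String afterMarkerChecked(String text, String marker) {
        int index = text.indexOf(marker);
        if (index == -1) {
            return ""; // assumption: "not found" maps to an empty result
        }
        return text.substring(index + marker.length());
    }

    public static void main(String[] args) {
        System.out.println(afterMarkerBuggy("key=value", "="));   // prints "value"
        System.out.println(afterMarkerBuggy("keyvalue", "="));    // prints "keyvalue"
        System.out.println(afterMarkerChecked("keyvalue", "="));  // prints an empty line
    }
}
```

Both versions compile without a warning. The return type is int either way, so the compiler has no way to tell the two apart.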
Really what I’m talking about here is being able to look at my code and easily
know what it is doing and that it is doing it correctly, without bugs. Some
would call this “reasoning about” their code, and assert that functional
languages give them greater power to perform this type of analysis. Let’s take a
look at Clojure, which is a functional language with a dynamic type system. The
traditional way to call the
String#indexOf function in Clojure is just
(.indexOf "string" "substring"), which is actually just calling the same Java
string method discussed above (which should make Java developers more
comfortable). However, with the release of Clojure 1.8 the clojure.string namespace now has
an index-of function that behaves the same as the index method in Ruby, returning nil when
the substring is not found (which is sure to trip up Java and Clojure developers alike).
Neither one of these helps me reason about my code or does anything to help me avoid writing
bugs.
Trying again, let’s take a look at PureScript, which is a statically typed
functional language similar to Haskell.
PureScript does have an
indexOf function, and it has a return type of
Maybe Int. This encodes in the type signature that the substring may not be found.
The code will not even compile if the return value is used in a way that does
not clearly handle the cases of
Just <some int> when the substring is found and
Nothing when it is not. The developer is of course free to take those
values and use them or ignore them, but they could not forget to handle the
Nothing case in some way.
There is nothing keeping the implementation of this function in all other statically typed languages from behaving the same way. Making this function return an integer rather than some optional type is a decision made by the creators of that standard library. In doing so, they’ve added one more thing to the ever-growing list of things someone using their language must remember, and one more place to introduce a bug if they forget.
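As a rough illustration of that point, here is what such a function might look like in Java. This is a sketch under assumptions, not anything from the standard library: the SafeIndexOf class and its indexOf wrapper are hypothetical names. OptionalInt is a real class in java.util, and its ifPresentOrElse method requires Java 9 or later.

```java
import java.util.OptionalInt;

class SafeIndexOf {
    // Hypothetical wrapper: turns the -1 sentinel into an empty OptionalInt.
    static OptionalInt indexOf(String text, String substring) {
        int index = text.indexOf(substring);
        return index == -1 ? OptionalInt.empty() : OptionalInt.of(index);
    }

    public static void main(String[] args) {
        OptionalInt found = indexOf("hello world", "world");

        // int i = found;  // would not compile: OptionalInt is not an int

        // Both cases have to be spelled out to get at the value.
        found.ifPresentOrElse(
            i -> System.out.println("found at " + i),   // prints "found at 6"
            () -> System.out.println("not found"));
    }
}
```

Java’s OptionalInt is weaker than PureScript’s Maybe Int: a caller can still reach for getAsInt() without checking, and that throws at runtime when the value is empty. Even so, the return type alone tells the caller that “not found” is a possibility, which a bare int does not.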
Try to keep this in mind when designing a library. How much is someone using your library expected to remember? If they forget, how easy is it for them to catch their mistake?
 That is, of course, if nobody has hacked open the String class and redefined it. This may sound like a joke to some of you, but ask around to any Ruby developers you know who have been on legacy projects. When someone gets the thousand-yard stare instead of laughing, you’ll know what I’m talking about.
 I originally wanted to use Haskell here, but Googling for “Haskell string indexOf” doesn’t turn up much. The most promising link is a post on Quora asking “How do I find the index of a substring in Haskell?” The first answer just asks the original poster if perhaps they are looking for the Knuth-Morris-Pratt algorithm, which finds the answer in O(n+m) rather than the naive O(n^2). The second answer suggests that the original poster probably doesn’t need this function at all and should start thinking in terms of higher level functions. It turns out strings in Haskell are really just lists of characters. Why they stopped there when characters are just numbers and numbers are just 1s and 0s is left unanswered. The point relevant to this post is no such function exists in the standard library and the user is left to write their own. I’d suggest taking a look at the Knuth-Morris-Pratt algorithm, which finds the answer in O(n+m) rather than the naive O(n^2).