Bugs, defects, issues, glitches, hidden features—call them what you want, but inevitably we discover that something in our software system is not working as expected. As a responsible Software Crafter you do not tolerate known defects, so you need to find and fix that bug, and fast.
When an issue is affecting customers in production, the pressure is really on. Under that pressure, it’s easy to start flailing around, aiming blindly at the system and hoping to hit the problem with one quick shot after another. A couple of hours later, you are right back where you started, no closer to a solution.
I want to share with you my methodical approach to finding and squashing bugs in a production system. Follow and practice these steps, and you will be smashing those bugs like an Orkin exterminator with a truck full of roach poison.
The first step to fixing any problem is being able to reproduce it. Bug reports often arrive with sketchy details and partial information. To know that you have fixed the problem, you first need to see it for yourself.
If you have trouble reproducing the issue, consider going back to the reporter. Ask them more detailed questions about what they were doing when they observed the problem. Ask them to show you. Sometimes bugs are time-sensitive or appear only sporadically. These are especially hard to reproduce, but you must keep at it: grab data from production, alter timestamps, change your system clock.
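For time-sensitive bugs, one alternative to changing the machine’s clock is to let the code ask an injectable “clock” for the current time, so a test can replay the exact moment from the bug report. The sketch below is illustrative only: `is_overdue`, `fixed_clock`, and the dates are hypothetical, not taken from any real system.

```python
from datetime import datetime, timezone
from typing import Callable

Clock = Callable[[], datetime]

def utc_now() -> datetime:
    """Production clock: the real current time in UTC."""
    return datetime.now(timezone.utc)

# Hypothetical function under suspicion. Instead of calling datetime.now()
# directly, it accepts a clock, so a test can pin "now" to any instant.
def is_overdue(due: datetime, clock: Clock = utc_now) -> bool:
    return clock() > due

def fixed_clock(moment: datetime) -> Clock:
    """Return a clock that always reports the same instant."""
    return lambda: moment

due = datetime(2024, 3, 31, 23, 59, tzinfo=timezone.utc)
# Replay the timestamp from the (hypothetical) production report.
reported_at = datetime(2024, 4, 1, 0, 30, tzinfo=timezone.utc)
print(is_overdue(due, clock=fixed_clock(reported_at)))  # True
```

Because the clock is a parameter with a real-time default, production callers don’t change, while a reproduction can step through midnight, month ends, or daylight-saving boundaries deterministically.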
Once you have reproduced the bug, start eliminating variables until you get down to the simplest use case that still exhibits the undesired behavior. The scenario where the bug first popped up probably has dozens of details that are not relevant: the user was doing several things at once, on a particular computer or browser, with a particular OS version. As you trim away details from the scenario, eliminate only one variable at a time. Keep a checklist and cross items off as you eliminate and isolate them. A quick feedback loop lets you run the scenario over and over, taking away one factor at a time until you reach the simplest possible reproduction of the problem.
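The trimming loop above can be sketched in code when the reproduction is scriptable. This is a simplified, greedy relative of delta debugging, written under stated assumptions: a scenario is a dictionary of details, “removing” a detail means deleting its key (in practice you might reset it to a default instead), and `reproduces` is a hypothetical check you would replace with your own scripted or manual run.

```python
from typing import Any, Callable, Dict, List, Tuple

Scenario = Dict[str, Any]

def minimize(scenario: Scenario,
             reproduces: Callable[[Scenario], bool]) -> Tuple[Scenario, List[str]]:
    """Remove one detail at a time, keeping a removal only if the bug still reproduces.

    Returns the reduced scenario and the checklist of details that were crossed off.
    """
    if not reproduces(scenario):
        raise ValueError("start from a scenario that reproduces the bug")
    current = dict(scenario)
    eliminated: List[str] = []
    changed = True
    while changed:  # repeat until a full pass removes nothing
        changed = False
        for key in list(current):
            # Exactly one variable differs from the last known-bad scenario.
            trial = {k: v for k, v in current.items() if k != key}
            if reproduces(trial):
                current = trial
                eliminated.append(key)
                changed = True
    return current, eliminated

# Hypothetical check: the bug only appears in Safari with a German locale.
def reproduces(s: Scenario) -> bool:
    return s.get("browser") == "Safari" and s.get("locale") == "de_DE"

report = {"browser": "Safari", "os": "macOS 14", "locale": "de_DE",
          "tabs_open": 12, "cart_items": 3}
minimal, crossed_off = minimize(report, reproduces)
print(minimal)      # {'browser': 'Safari', 'locale': 'de_DE'}
print(crossed_off)  # ['os', 'tabs_open', 'cart_items']
```

Each trial changes a single variable relative to a scenario already known to fail, so when a trial stops reproducing the bug you know exactly which detail mattered.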
Now that you have a good feedback loop and a minimally complex bug scenario, it’s time to actually find where in the code the problem is. Oftentimes, the effort of reproducing and isolating was enough to zero in on the problem. But if you are still stumped, try some of these bug-finding techniques.
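One widely used technique for this step is bisection: if the bug is absent in an older version and present in a newer one, test the version halfway between and discard the half that can’t contain the culprit. Git automates this over commit history with `git bisect` (and `git bisect run` for a scripted check). The sketch below shows the same idea over any ordered list; the commit names and the `is_bad` check are hypothetical, and it assumes that once the bug appears, every later candidate also has it.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def first_bad(candidates: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search for the first candidate that exhibits the bug.

    Assumes candidates are ordered (e.g. oldest to newest commit), that all
    items before the culprit are good, and that all items from it onward are bad.
    """
    lo, hi = 0, len(candidates) - 1
    if not is_bad(candidates[hi]):
        raise ValueError("the newest candidate must reproduce the bug")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid      # culprit is at mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return candidates[lo]

commits = [f"c{i}" for i in range(1, 11)]
# Hypothetical: the bug was introduced in c7.
print(first_bad(commits, lambda c: int(c[1:]) >= 7))  # c7
```

With ten candidates this needs only a handful of checks, which is why a fast feedback loop pays off here.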
Aha! You found it. You know exactly what is causing the problem. Just fix it and you’re done, right?
Slow down just a second. How do you make sure that it doesn’t happen again? If you haven’t written a test while trying to find the bug, do it now, before you fix the code. Watching that test fail first, then pass after the fix, is how you know you’ve found and fixed the right thing.
Even if you happen to have already fixed the problem, back out your changes, write a failing test, then reintroduce your fix and verify that the test passes.
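Here is a minimal sketch of such a regression test, using Python’s built-in `unittest`. The function `average_order_value` and its empty-list bug are hypothetical: the test was written first and watched failing with `ZeroDivisionError`, then the guard clause was added and the test passed.

```python
import unittest

# Hypothetical function reported as crashing for customers with no orders.
def average_order_value(totals: list[float]) -> float:
    if not totals:  # the fix: before it, an empty list raised ZeroDivisionError
        return 0.0
    return sum(totals) / len(totals)

class AverageOrderValueRegressionTest(unittest.TestCase):
    def test_customer_with_no_orders(self):
        # Reproduces the reported bug; fails on the unfixed code.
        self.assertEqual(average_order_value([]), 0.0)

    def test_existing_behavior_unchanged(self):
        # Guards against the fix breaking the normal case.
        self.assertAlmostEqual(average_order_value([10.0, 20.0]), 15.0)
```

Saved in a test module, it runs with `python -m unittest`. Backing out the guard clause and rerunning should turn the first test red again, which confirms the test really covers the bug.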
No debugging process is complete without a little bit of reflection. Ask yourself, “Why did this bug get introduced?” When you have an answer, ask “Why?” about that answer. Do that five times, and you’ll likely find yourself talking about an important underlying problem that your team can address to improve how it works.