Code rot, or the poor state of a code base that's more than a few months old: Why does it happen and how can you avoid and fix it?

Software development is hard. It's a bit harder when done well. It's a lot harder when done poorly.

We have probably all seen code rot. Although the bytes in your repository certainly don't decay on their own, the overall quality of the code base tends to decline as the code base grows. Eventually you can end up with a pile of legacy code, i.e., code that's hard to use, maintain, and evolve. Such code typically has problems with the architecture, the tests, and even the dependencies.

How did we get here?

You may have seen statements like these:

It's ok to start like x and then move to y.

This makes a lot of sense. You may not yet know exactly what you're building. You pick one design and work with that. You build a Minimum Viable Product (MVP) and see how that works for your users/customers. Then you adjust as you add more features and get feedback. This is one of the reasons for agile practices: flexibility to adjust. It does become a problem when the move to y never happens and you end up with an architecture that doesn't support the changed requirements anymore.

We'll add tests later.

That makes no sense. Tests show whether the code works as expected. It makes no sense to claim "it works" without testing. (I've been guilty of that, too.) A quick manual end-to-end test is fine, but if you rely on manual testing for regression testing, you won't be able to keep up with it as you add features. In the time you manually test, you could write automated tests. Unless you're already deep in the pit of untestable code of despair.

We're still on Java 8 / Python 2.7 / Perl 4, but we'll upgrade soon.

Even though Java 8 is a long-term support version, and "only" nine years old, unpaid support ended in January 2019. Libraries may not support it anymore, and that means you can't get updates or fixes for these dependencies either.

You can confine this with Docker containers, but that one container is still affected.

Why do these good intentions fail?

We know what we should do, and still, all too often, we don't do it. It is easy to forget to add tests later, when the next tasks and deadlines are already queued up, and frankly, it's also hard work. Paraphrased from Stephen Covey's "The 7 Habits of Highly Effective People": "I cannot sharpen my saw, I'm too busy cutting trees." The longer you wait, the harder it gets.

This is mostly up to management and product planning: People need to have the slack to innovate and improve. If the team chases one deadline after another, people will cut corners, and there will never be time to go back and clean it up. Overtime was never a sign of good project management.

Don't wait for hackathons to try out cool new stuff. Check out Slack (not the noisy chat app, the Tom DeMarco book). Leave the team room for improvement. Some people refer to this as Orange Friday: People get 20 percent of their work time to spend on toy projects. Legend has it that Gmail evolved from one of these toy projects.

Architecture

"It's bad, we should really change it ... tomorrow." Bad architecture (and bad tooling) is a lot like boiling frog soup. You can get used to it and learn to live with it for way too long.

It may be too much work to refactor, but the longer you let the code grow (or rot) in this architecture, the harder it gets to transition out of it. You'll have to deal with more code. The code is not fresh in your mind anymore. People who were familiar with the code base and its details may have already left the company.

Avoid it: Do constant cleanup. In Extreme Programming you refactor mercilessly. Make this part of each ticket. Refactor before working on that ticket if that makes it easier to do the ticket. Refactor after you have implemented the ticket, to clean up the code, reduce duplication, or move code to a place where it fits better. Value simplicity; don't just add more code, don't copy and paste.

Fix it: As with avoiding it, refactor the areas of the code you are about to touch. You can even treat refactoring like a feature tax. This makes for a gradual improvement of the code base, and you don't have to ask management for a sprint or two where you just clean up and don't deliver new features.

As a bonus: You need good architecture (high cohesion, low coupling) to pull out a microservice, if you ever want to go there. And if you don't want to switch to microservices, it still makes your code base easier to work with.

But wait, doesn't all that constant refactoring come with the danger of gold-plating, where developers spend way too much time on some small detail? Sure, that can happen. It should be less of an issue when you do pair programming (I really like Extreme Programming) or when you do code reviews. If someone feels there was gold-plating, the team can address this in retrospectives. There's no fixed solution for everyone; it always depends on the team and the code base.

Tests

Automated tests are like exercise: Everybody knows it's a good idea, but hardly anybody does it consistently.

But it's hard to get tests in if code wasn't written with testing in mind. Then you'd need to refactor to get tests in, but you don't have tests to show that your refactoring didn't break anything. Catch-22. Deadlock. Mocking frameworks can help with this, but they are IMHO a crutch. If code is hard to test, that's a signal that you should probably refactor. Tests are only one user/caller of that code, and production code may also have to jump through hoops to use it. Using a mocking framework is like removing the batteries from the smoke detector so it stops that annoying beeping sound: it may hide the poor code structure by allowing you to simply reach into the details of the code under test and mess with them. It's easy to go overboard with the monkey patching and make the code base worse.
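To illustrate what "reaching into the details" can look like, here is a minimal sketch in Python using the standard library's unittest.mock. The function is_expired and the constant TOKEN_LIFETIME_SECONDS are made up for this example. The test only works because it knows the function calls time.time() internally, and that knowledge is exactly the hidden coupling described above.

```python
import time
from unittest import mock

# Hypothetical value, chosen only for this illustration.
TOKEN_LIFETIME_SECONDS = 3600


def is_expired(issued_at: float) -> bool:
    """Hard to test as written: it reads the system clock directly."""
    return time.time() - issued_at > TOKEN_LIFETIME_SECONDS


def test_is_expired_with_patching() -> None:
    # The test must know that is_expired() calls time.time() internally.
    # If the implementation switches to another clock, the test breaks
    # even though the behavior did not change.
    with mock.patch("time.time", return_value=10_000.0):
        assert is_expired(issued_at=1_000.0)
        assert not is_expired(issued_at=9_000.0)


test_is_expired_with_patching()
```

The patch makes the test pass, but the function's signature still gives no hint that it depends on the clock.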

Avoid it: Don't create dependencies in code under test, pass all dependencies in (maybe allow defaults). Use test-driven design (TDD): stay in a test/code/refactor cycle. Don't miss any of these activities. Test-driven design does lead to a better architecture as well as to well-tested code. Whenever possible, strive for 100 percent test coverage.
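Passing dependencies in, with a default for production use, could look roughly like the following Python sketch. All names here (WelcomeService, send_via_smtp, RecordingSender) are hypothetical and only serve the illustration. The default sender stands in for real mail delivery and deliberately raises, to show that the test never touches it.

```python
from typing import Callable, List, Tuple


def send_via_smtp(recipient: str, body: str) -> None:
    """Stand-in for a real mail sender (hypothetical, for illustration)."""
    raise RuntimeError("would talk to a real mail server")


class WelcomeService:
    # The sender is passed in; production code can rely on the default.
    def __init__(self, send: Callable[[str, str], None] = send_via_smtp) -> None:
        self._send = send

    def welcome(self, name: str, email: str) -> None:
        self._send(email, f"Welcome, {name}!")


class RecordingSender:
    """Hand-written fake that records calls instead of sending mail."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def __call__(self, recipient: str, body: str) -> None:
        self.sent.append((recipient, body))


def test_welcome_sends_one_message() -> None:
    sender = RecordingSender()
    WelcomeService(send=sender).welcome("Ada", "ada@example.com")
    assert sender.sent == [("ada@example.com", "Welcome, Ada!")]


test_welcome_sends_one_message()
```

No mocking framework is needed here: the test uses the same constructor that production code uses, so it cannot reach into internals that callers could not reach as well.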

Fix it: Before changing code without tests, try to write tests. Refactor carefully so as not to break anything, and refactor just enough to get tests in, e.g., by passing a dependency instead of having the code under test create it. Use a mocking framework with caution. Once you can test, refactor to do without the mocking framework.
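A "just enough" refactoring of this kind might look like the Python sketch below. The function days_until_due and its legacy variant are invented for the example. The only change is one optional parameter, so existing callers keep working while a test can now supply a fixed date.

```python
from datetime import date
from typing import Optional


# Before: the function creates its own dependency (today's date),
# so its result changes every day and cannot be pinned down in a test.
def days_until_due_legacy(due: date) -> int:
    return (due - date.today()).days


# After: the smallest change that allows a test. Existing callers that
# pass only `due` behave exactly as before.
def days_until_due(due: date, today: Optional[date] = None) -> int:
    if today is None:
        today = date.today()
    return (due - today).days


def test_days_until_due() -> None:
    assert days_until_due(date(2024, 3, 10), today=date(2024, 3, 1)) == 9
    assert days_until_due(date(2024, 3, 1), today=date(2024, 3, 10)) == -9


test_days_until_due()
```

With this test in place, further cleanup of the function can proceed with a safety net.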

Dependencies

Not upgrading dependencies may seem fine. After all, why bother if everything is working? Never touch a running system, right?

But you won't get any new features that could simplify your life. Dependencies will probably have known vulnerabilities (use tools like OWASP Dependency-Check to find them), and old versions may not get updates anymore.

The problem with updating is that APIs may have changed. Do you use type hints/TypeScript/a compiled language with static checks? When it goes beyond a toy project or quick script, the "flexibility" of untyped languages can quickly become a burden. A major update may even break something that a compiler cannot catch. Do you have good tests?

Avoid it: Do updates on a regular basis, for example, once per iteration. Avoid the big bang upgrade to jump a few major versions. Do small incremental updates, but do them all the time. Isolate dependencies behind wrappers or facades to avoid them spreading all through your code base.
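As a rough sketch of the wrapper idea in Python: the rest of the code base calls only to_text and from_text (hypothetical names), never the underlying library. The standard library's json module plays the role of the third-party dependency here, purely so that the example runs without installing anything.

```python
import json
from typing import Any

# Only this wrapper knows which library does the work. If an upgrade
# changes the library's API, or you swap it for a different one, this
# is the only place that has to change.


def to_text(data: Any) -> str:
    return json.dumps(data, sort_keys=True)


def from_text(text: str) -> Any:
    return json.loads(text)


# Tests against the wrapper double as an early warning when an update
# of the wrapped dependency changes behavior.
def test_round_trip() -> None:
    original = {"name": "Ada", "tags": ["a", "b"], "age": 36}
    assert from_text(to_text(original)) == original


def test_output_is_stable() -> None:
    assert to_text({"b": 1, "a": 2}) == '{"a": 2, "b": 1}'


test_round_trip()
test_output_is_stable()
```

Running these wrapper tests after every small update gives quick feedback on whether the new version still behaves the way your code expects.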

Fix it: As with avoiding it, catch up on updates, slowly but steadily. Many teams have also had success using component libraries to manage dependencies on the front end.

Summary

You don't starve yourself all week and then spend Sundays in an all-you-can-eat buffet to load up for next week. You don't just shower once a month, but then for three hours.

Take the time for cleanup, maybe a few hours every week, depending on the shape of your code base. Don't wait for the monthly tech debt day (I'm not making this up). In a single day you can only address small things anyway, which should already happen in day-to-day work. You won't get big stuff sorted out in a single day. Having a tech debt day per month sends the (probably wrong) message that you're on top of tech debt and a day per month is enough to deal with it.

Integrate refactorings with the regular tickets. Don't check in code that barely works and has a poor structure, or misses tests or documentation.

What Is Code Rot, and How Can You Avoid and Fix It?


Robert Wenner
