Managing Technical Debt

Brad Ediger

April 11, 2024

The metaphor of deferred software maintenance as financial debt has gotten a lot of play, and for good reason. Debt can be a strategic tool, delivering benefits today by deferring some of the costs until later.

But its effects compound over time: each deferred cost causes more friction on day-to-day development. Like compound interest, this cost can snowball until maintenance becomes the primary activity of an organization, delaying progress and evolution.

DevOps practices can help reverse this trend. By targeting the technical debt that bogs down a team, they create efficiencies that free up development teams to focus their resources on larger strategic problems.

Addressing technical debt within an existing product requires understanding the problems, quantifying their impact and prioritizing accordingly, designing and implementing improvements, and evolving the development organization away from the practices that allowed the problems to flourish in the first place.

Managing technical debt requires four steps:

  1. Diagnose and quantify technical debt

  2. Prevent additional technical debt

  3. Prioritize and repay technical debt

  4. Measure, repeat, and evolve your approach

Diagnose and Quantify Technical Debt

"Technical debt" is a squishy phrase that means different things to different people. In shorthand, it can stand for poorly designed code, the wrong architecture, unsafe APIs, missing documentation, and more. At worst, it simply becomes a pejorative term for "decisions I disagree with." In practice, the definition is observational: technical debt usually goes unrecognized when it accrues and is noticed only when it stands in the way of something.

Successfully managing change with such subjective endpoints requires agreeing on a working definition of the change to be made. The words "technical debt" can serve as a rhetorical tool, but they never provide a standalone justification for why a change is good or bad; that must come from the team.

Any team maintaining software already has a thorough mental map of some of its problem areas — the potential improvements determined by the experience of operating and maintaining it. But some deferred maintenance only becomes apparent while trying to make new changes or evolve a working system. Teams should prioritize based on their best understanding of the causes of maintenance costs but know that that understanding is likely to change over time.

When cataloging technical debt, quantify it by connecting it to measurable outcomes. Pay particular attention to the "toil" invested in operations, and be willing to react quickly to new information. Use observability practices (introducing them as necessary) to quantify the impact on development time, product quality, or other outcomes that your work may impact. Work together with product and business stakeholders to frame the problems and solutions in the full economic context of the business.
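As one way to make this concrete, the sketch below totals hours of operational toil per component from a simple log of unplanned work. The `ToilEntry` schema, the component names, and the sample figures are all hypothetical; in practice the records would come from whatever ticketing or observability tooling the team already uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToilEntry:
    """One unit of unplanned operational work (hypothetical schema)."""
    component: str  # part of the system that needed attention
    hours: float    # engineer time spent on it


def toil_by_component(entries):
    """Return (component, total_hours) pairs, largest total first."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry.component] += entry.hours
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only; not taken from any real system.
SAMPLE_LOG = [
    ToilEntry("billing", 3.0),
    ToilEntry("search", 1.5),
    ToilEntry("billing", 4.5),
    ToilEntry("deploy-scripts", 2.0),
]

if __name__ == "__main__":
    for component, hours in toil_by_component(SAMPLE_LOG):
        print(f"{component}: {hours:.1f} h")
```

A ranking like this is only a starting point for the conversation with product and business stakeholders; it says where time goes, not why.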

Remember: Not all technical debt was caused by bad decisions. Even the best decisions can become liabilities over time if the environment changes. Competition or changes in user needs can quickly and dramatically shift an existing product/market fit. A critical dependency may fall out of vendor support or accrue newly discovered security vulnerabilities, turning an asset into a liability. Adopting the principles of blameless retrospectives can help create the psychological safety net needed to make good decisions about the future without casting doubt on the past.

Prevent Technical Debt

Before trying to "repay" technical debt, we recommend taking a comprehensive look at the systems (people, practices, methodologies) that produced the problem in the first place. Thinking about prevention first — "how do we keep this from getting worse?" — keeps you from chasing a moving target. Some recommendations are worth considering in any environment, no matter what the nature of the technical debt you're up against:

  • Comprehensive code reviews

  • Automated testing and CI/CD

  • Infrastructure as Code and other practices that drive reproducible deployment

  • Detailed documentation

  • Continuous education around architecture and coding practices

Shoring up these best practices will help you make the most of more targeted interventions later.

Prioritize and Repay

Using the quantitative data you gathered, start planning how to alleviate the issues you identified earlier. Balance the expected return from improvements against their costs, considering the team's time budget and availability for ongoing development and operational work. Take into account factors such as:

  • Product impact: Does the issue affect customer experience, correctness, quality, or other outcomes that degrade the end-user experience?

  • Developer impact: Does the issue make maintenance more difficult or risky? At the extreme, is the developer experience creating a morale problem?

  • Future-proofing: Would resolving this issue help lay the groundwork for future desired changes?

Once priorities are set, create a roadmap to resolve the issues you identified.
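A minimal sketch of balancing return against cost follows. It assumes the team scores each item from 0 to 5 on the three factors above and estimates effort in days; the scoring scale, the equal weighting, and the backlog entries are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtItem:
    """A cataloged debt item with team-assigned scores (hypothetical scale)."""
    name: str
    product_impact: int    # 0-5: effect on the end-user experience
    developer_impact: int  # 0-5: effect on maintenance difficulty and risk
    future_proofing: int   # 0-5: groundwork laid for desired changes
    cost_days: float       # estimated effort to resolve

    def __post_init__(self):
        if self.cost_days <= 0:
            raise ValueError("cost_days must be positive")

    def score(self) -> float:
        """Expected benefit per day of effort; higher means do it sooner."""
        benefit = self.product_impact + self.developer_impact + self.future_proofing
        return benefit / self.cost_days


def prioritize(items):
    """Order items by descending benefit per day of effort."""
    return sorted(items, key=lambda item: item.score(), reverse=True)


# Illustrative backlog only.
BACKLOG = [
    DebtItem("Upgrade unsupported framework", 2, 4, 5, cost_days=20),
    DebtItem("Add tests around checkout", 4, 3, 2, cost_days=5),
    DebtItem("Rewrite reporting module", 1, 2, 1, cost_days=15),
]

if __name__ == "__main__":
    for item in prioritize(BACKLOG):
        print(f"{item.score():.2f}  {item.name}")
```

The ordering is an input to the roadmap discussion rather than a verdict; a team may reasonably weight the factors differently or override the ranking for strategic reasons.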

For large changes, we usually recommend deploying fixes in a "ratchet" approach: set and enforce the new standards for new work first, then go back and resolve the older, existing artifacts. This will help your team work to high standards on new priorities ("ratcheting up" quality over time), without getting bogged down in fixing everything at once.
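One common way to implement a ratchet is a CI check that compares the current count of some violation (lint warnings, untyped modules, deprecated calls) against a recorded baseline. The sketch below assumes the counts are produced elsewhere; the function name and the numbers in the usage are hypothetical.

```python
from typing import Tuple


def ratchet_check(baseline: int, current: int) -> Tuple[bool, int]:
    """Compare the current violation count with the recorded baseline.

    Returns (passed, new_baseline). The check fails if violations rose.
    When they fall, the baseline tightens so the gain cannot be lost.
    """
    if baseline < 0 or current < 0:
        raise ValueError("violation counts must be non-negative")
    if current > baseline:
        return False, baseline  # regression: keep the old baseline
    return True, current        # equal or better: tighten the baseline


if __name__ == "__main__":
    passed, baseline = ratchet_check(baseline=120, current=118)
    print(passed, baseline)
```

Because the baseline only moves down, existing problems do not block the build, but new ones do, and every cleanup becomes the new floor.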

Starting with prevention can help you catch issues before they become migraines. By cleaning up the parts of your code that are changed most often, you can make the biggest improvements where they matter most. This helps avoid the "perfect is the enemy of the good" phenomenon: the problem of trying to perfect everything before making any changes.
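A rough way to find the parts of the code that change most often is to count how many commits touch each file. The sketch below assumes the project lives in a Git repository and that `git` is on the PATH; the six-month window and the function names are arbitrary choices for illustration.

```python
import subprocess
from collections import Counter


def count_changes(log_output: str) -> Counter:
    """Count how often each path appears in `git log --name-only` output."""
    return Counter(
        line.strip() for line in log_output.splitlines() if line.strip()
    )


def hotspots(repo_path: str = ".", since: str = "6 months ago", top: int = 10):
    """Return the `top` most frequently changed files in a Git repository."""
    log_output = subprocess.run(
        # An empty --format suppresses commit headers, leaving only file names.
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--name-only", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_changes(log_output).most_common(top)


if __name__ == "__main__":
    for path, changes in hotspots():
        print(f"{changes:4d}  {path}")
```

Change frequency is only a proxy. A file that changes often and also appears in the team's list of known problem areas is a stronger candidate than one that is merely busy.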

Of course, you are free to prioritize based on what’s most important to you. If known problem areas are consistently pulling your team away from operations and into firefighting, fixing them improves the product and saves you time in the long run.

Measure, Repeat, Evolve Your Approach

Fixing areas of deficiency in a product is an ongoing journey. It’s important to get comfortable both with the process itself and with building systems to learn from it.

To track your progress and justify the time spent reducing technical debt, use quantitative measures. Did you solve the problems you originally outlined in your business case? Did the work yield the expected result? You may be able to point directly to an improvement in development velocity, fewer errors, or better customer satisfaction. These data points will help inform decisions made in the next round of debt service, so it's worth investing in observability tooling that helps gather these numbers systematically.
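A before-and-after comparison can be as simple as the sketch below. The metric names and values are hypothetical placeholders for whatever the team decided to track when it framed the business case.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    if before == 0:
        raise ValueError("cannot compute a relative change from zero")
    return (after - before) / before * 100.0


def compare(before: dict, after: dict) -> dict:
    """Percent change for every metric present in both snapshots."""
    return {
        name: round(percent_change(before[name], after[name]), 1)
        for name in before
        if name in after
    }


# Illustrative snapshots only; lower is better for both metrics here.
BEFORE = {"median_lead_time_days": 8.0, "errors_per_week": 40.0}
AFTER = {"median_lead_time_days": 6.0, "errors_per_week": 25.0}

if __name__ == "__main__":
    for name, change in compare(BEFORE, AFTER).items():
        print(f"{name}: {change:+.1f}%")
```

A comparison like this shows correlation with the cleanup work, not proof of cause, so it is worth noting what else changed over the same period.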

Aside from numbers, there are human-centered, qualitative outcomes that matter even if they can't be easily quantified. Large amounts of technical debt can hurt developer morale, but when that burden is lifted, the result can be a bit anticlimactic: perhaps just a return to "business as usual." Creating deliberate ways to celebrate success can boost team morale, and it also highlights the team's agency to identify and fix its own problems.

Retrospectives can gather the input needed to help you make positive changes and evolve your processes. As you run this cycle repeatedly, you may start to identify recurring patterns that point to further improvements in those processes.

Adopt a DevOps Mindset

Addressing technical debt requires a strategic approach and a development culture that fosters iterative, scalable change. A DevOps mindset provides teams with unified tools for managing their software and infrastructure, giving them the autonomy and motivation to focus on what matters. In environments of high technical debt, this allows teams to address debt reduction alongside other important and strategic work.

Having trouble making the case for either addressing technical debt or adopting a DevOps mindset? Download Your Strategic Playbook for DevOps Victory to learn how you can advocate for both within your organization.