We have all heard of debt, typically in the financial sense of loans and credit cards, but do you know what it means to have technical debt? The concept has increasingly entered the vocabulary of IT leadership and business investors. It is, however, not often discussed by the technologists who work closest to the platforms that have accumulated so much of it.
When thinking about technical debt, I envision the technical parts of past projects that need to be finished before the project can be considered truly complete, but that always seem to go unfinished. From purely aesthetic examples, such as source code without proper indentation or network interfaces lacking proper descriptions, to more impactful ones, such as unoptimized yet technically functional subroutines or network routing paths, each represents a form and a varying amount of technical debt.
As the analogy goes, just as with financial debt, there are often completely legitimate reasons for incurring technical debt, but too much or “bad” technical debt can have a very real and very negative impact. Some of the biggest sources of technical debt are:
- Deadlines: Issues arise during projects and looming deadlines often force us to push through to delivery by prioritizing required functionality over what would be considered best practice.
- Real-time Issue Resolution: During issue resolution, administrators under duress often make changes to return systems to a functioning state, but they lack the time to make every change conform to configuration norms such as naming conventions.
- Human Error/Judgment: It is not uncommon that technical debt is incurred simply as the result of sloppiness or laziness.
From my perspective as a network engineer, there are many examples of how technical debt can turn a regular day into a very long day or night. Something as technically benign as a missing or inaccurate interface description can easily lead to an extended outage because it forces one to look elsewhere for information (e.g. documentation that may or may not exist, old emails, even old colleagues!). These descriptions are not necessary for the network to function correctly at the time, but their absence can have a future impact that far outweighs the effort of getting them right the first time.
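Missing descriptions are one of the easier kinds of debt to find automatically. The following is a minimal sketch, assuming a Cisco IOS-style text configuration in which each `interface` block starts at the left margin and its settings are indented. The function name and the sample configuration are hypothetical, and other vendors' syntax would need different parsing.

```python
import re

def interfaces_missing_descriptions(config_text):
    """Return names of interfaces that have no non-empty description line."""
    missing = []
    current = None            # name of the interface block being scanned
    has_description = False
    for line in config_text.splitlines():
        match = re.match(r"^interface\s+(\S+)", line)
        if match:
            # A new interface block starts; close out the previous one.
            if current is not None and not has_description:
                missing.append(current)
            current = match.group(1)
            has_description = False
        elif current is not None:
            if re.match(r"^\s+description\s+\S", line):
                has_description = True
            elif line and not line[0].isspace():
                # Any other unindented line (such as "!") ends the block.
                if not has_description:
                    missing.append(current)
                current = None
    if current is not None and not has_description:
        missing.append(current)
    return missing

# Hypothetical sample configuration, for illustration only.
SAMPLE_CONFIG = """\
interface GigabitEthernet0/1
 description Uplink to core-sw1
 ip address 10.0.0.1 255.255.255.252
!
interface GigabitEthernet0/2
 ip address 10.0.1.1 255.255.255.0
!
interface Loopback0
 description
 ip address 192.0.2.1 255.255.255.255
"""

print(interfaces_missing_descriptions(SAMPLE_CONFIG))
# ['GigabitEthernet0/2', 'Loopback0']
```

A check like this only catches missing descriptions. Inaccurate ones still need a human who knows what is actually plugged in.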
Another example that I come across frequently is the complex and odd NAT configurations and routing policies often used to facilitate connectivity during network migrations. These are likely to remain after the migration is complete for several reasons: the network is technically functional, understanding the configuration well enough to clean up the unneeded parts is difficult and time consuming, and there is often a fear that removing any configuration might unexpectedly break functionality (a fear that is typically warranted). When issues eventually occur (and they will), the presence of this extraneous configuration will at best create noise and distraction during triage and at worst hinder diagnostic efforts with unexpected behavior, or may even be the direct cause of the issue(s).
The point I am trying to make is that technical debt has very real business impact (e.g. lost revenue due to extended outages caused by unnecessarily complex configuration) and there is very real value in investing effort in paying down your existing technical debt. The benefits include:
- Reduction of management complexity and required effort
- Reduction in Mean Time To Resolution (MTTR) during outages
- Improved system performance and return on investment
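To see whether paying down debt is actually shortening outages, MTTR can be tracked over time. The following is a minimal sketch, assuming each incident is recorded as a pair of start and resolution timestamps. The function name and the incident data are hypothetical and purely illustrative.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average (resolved - started) across incidents; None if there are none."""
    if not incidents:
        return None
    total = sum((resolved - started for started, resolved in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incident log: (started, resolved)
incidents = [
    (datetime(2024, 1, 5, 2, 0), datetime(2024, 1, 5, 4, 30)),      # 2 h 30 min
    (datetime(2024, 2, 11, 14, 0), datetime(2024, 2, 11, 14, 20)),  # 20 min
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 2, 9, 40)),      # 40 min
]

print(mean_time_to_resolution(incidents))  # 1:10:00
```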
The other point I want to make is that paying down technical debt should not be an “every now and again” project where you wait until too much debt has accrued and then attempt to pay it down in bulk, only to let it accumulate again. Continuously paying down technical debt needs to become a part of your IT culture. Technical debt needs to be actively monitored, just as you would monitor your financial debt, and paid down as it is accrued to ensure that too much is never built up. It may not be the most exciting work you have ever performed, nor pay the quickest dividends, but at some point in the future it could be the difference between a few minutes of downtime and a few hours of downtime. This equates to a real return on investment for any company.