A little over two weeks ago we saw three major organizations, the New York Stock Exchange (NYSE), United Airlines, and the Wall Street Journal (WSJ), experience significant outages due to “software glitches”. The WSJ reported that the outage at the NYSE was caused by a software update to a critical system. The U.S. Securities and Exchange Commission (SEC) is still investigating whether the systems were adequately tested. As for United, this wasn’t the first time it had experienced problems with back-end software and hardware. During all this, the WSJ’s home page began showing a “504 error”, a gateway timeout that typically means a front-end server did not get a timely response from a back-end server.
At other companies, I’ve seen “software glitches” that appear similar to those described at the NYSE and United, and the root cause is often two-fold: part technical and part process. Zeynep Tufekci writes passionately about the problematic ways we architect our solutions and how the technical debt we’re accumulating contributes to the problem. On the whole, I agree that we’re accumulating a burden of technical debt that makes it difficult to write “good” software. Technical debt combined with poor process, in particular a lightweight testing process (undocumented test scripts, lack of automation, lack of regression testing, low unit test coverage, etc.), leads to “software glitches”.
So how do you protect your company from being the next headline for a major software outage? Here are six practices you can start today:
- Automate testing when possible; otherwise, document manual test steps thoroughly so they are repeatable.
- Incorporate user stories into your backlog to address technical debt.
- Conduct regular code reviews.
- Conduct comprehensive regression testing.
- Create environments that mirror production so that critical deployments can be tested accurately and completely.
- Have an outside firm conduct a detailed analysis of your software architecture and code to identify potential gaps in quality.
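To make the first and fourth practices concrete, here is a minimal sketch of an automated regression test using Python's standard `unittest` module. The function `round_to_tick` and its rounding rule are hypothetical, invented only for illustration; they are not taken from any of the systems mentioned above. The idea is that each test pins down behavior that is already known to be correct, so a later software update that changes it fails before it reaches production.

```python
import unittest
from decimal import Decimal, ROUND_HALF_UP


def round_to_tick(price: Decimal, tick: Decimal) -> Decimal:
    """Round a price to the nearest multiple of tick (hypothetical business rule)."""
    if tick <= 0:
        raise ValueError("tick must be positive")
    # Count how many whole ticks fit, rounding halves up, then scale back.
    steps = (price / tick).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return steps * tick


class RoundToTickRegressionTest(unittest.TestCase):
    """Each test records behavior that must not change between releases."""

    def test_half_tick_rounds_up(self):
        self.assertEqual(
            round_to_tick(Decimal("10.025"), Decimal("0.05")), Decimal("10.05")
        )

    def test_below_half_tick_rounds_down(self):
        self.assertEqual(
            round_to_tick(Decimal("10.02"), Decimal("0.05")), Decimal("10.00")
        )

    def test_exact_multiple_is_unchanged(self):
        self.assertEqual(
            round_to_tick(Decimal("10.10"), Decimal("0.05")), Decimal("10.10")
        )

    def test_non_positive_tick_is_rejected(self):
        with self.assertRaises(ValueError):
            round_to_tick(Decimal("10.00"), Decimal("0"))
```

Saved as a file such as `test_round_to_tick.py` (an assumed name), the suite can be run with `python -m unittest` and wired into whatever build or deployment step precedes a release, so the check runs on every change rather than only when someone remembers to do it.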
If you are unsure how to implement the above, West Monroe can help. We have a unique offering we call a Product Delivery Excellence Assessment that goes beyond the analysis of software architecture and code to look at the process as a whole. Understanding the software development life cycle, from demand management and prioritization through development, quality assurance, and delivery, is critical to delivering excellent software.