How expensive is a bug?
The "observation" that fixing a bug in production costs 1000x more than fixing it during design assumes a waterfall approach.
If you’ve been working in the IT industry for any significant time, there’s a high probability that you’ve heard claims like “It costs 1,000 times more to fix a bug in production than during the design phase”. Such claims are often accompanied by charts or tables that look something like this:
| Phase | Repair cost |
|---|---|
| Design | $100 |
| Development | $1,000 |
| Testing | $10,000 |
| Production | $100,000 |
That’s enough to make you want to try really really hard to design the right thing right, isn’t it?
Now, every version you find will have different multipliers, and maybe even different stages, but the particulars aren’t the point I’m talking about today (the particulars are apparently fiction anyway).
What I want to point out about this entire train of thought is that it’s based on an outdated premise. That premise is called, in a word, “waterfall”.
So what if we apply this principle to proper agile software development practices, and teams using CI/CD and the like?
Well, the first thing we’d notice when trying to apply this table to a proper agile team is that there are no designated “design”, “development”, and “testing” stages. These are all activities that happen continuously, in an intermingled way.
But let’s brush that aside for a moment, and agree to apply the idea in a looser, more conceptual way.
On the most agile teams I’ve worked on, it usually takes no more than about 20 minutes to get a bug fix into production, or, if the bug fix is taking too long, a rollback to stop the hemorrhaging while the bug is investigated. But let’s be generous, for the sake of calculation, and assume that it takes a full hour to remedy a production bug, and that the engineer doing this (working alone) earns US$200,000 per year. That means the cost to fix this production bug is approximately US$103.
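For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The salary and the fix times come from the paragraph above; the number of working hours per year is an assumption, since the post doesn't state one. A figure of about US$103 is consistent with roughly 1,944 hours (243 eight-hour days), while the common 2,080-hour year (52 weeks of 40 hours) gives about US$96. The function name `fix_cost` is made up for this illustration.

```python
def fix_cost(annual_salary: float, minutes: float, hours_per_year: float) -> float:
    """Salary cost of `minutes` of one engineer's time."""
    hourly_rate = annual_salary / hours_per_year
    return hourly_rate * minutes / 60


SALARY = 200_000  # US$ per year, from the text

# Assumed working hours per year (not stated in the post):
# 2,080 = 52 weeks x 40 hours; 1,944 = 243 working days x 8 hours.
for hours_per_year in (2080, 1944):
    one_hour = fix_cost(SALARY, 60, hours_per_year)
    twenty_min = fix_cost(SALARY, 20, hours_per_year)
    print(f"{hours_per_year} h/yr: 1 hour = ${one_hour:.2f}, 20 min = ${twenty_min:.2f}")
```

Under either assumption the generous one-hour fix lands at roughly US$100, and the more typical 20-minute fix at a little over US$30.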
So here’s the thing: When you operate in an agile way, with short feedback loops, the point at which you detect a bug becomes almost immaterial to the cost of fixing the bug. Every new line of code spends less than 8 hours in queue (that’s the definition of Continuous Integration) before being deployed to production. In this working style, Design, Dev, Test, and Production are all blurred into each other.
When fixing a production issue takes 20 minutes, the relative dollars-and-cents cost of the stage in which we fix a bug is meaningless. Who cares if it costs 1000 times practically-nothing?
Of course, there are reasons to detect bugs as early in the process as possible, and I’m sure I’ll talk about them in the future, but they’re more aligned with the idea of optimizing flow through the system than with reducing the bug fix cost.