The two big dangers of rolling back

Broken code is inevitable. Despite our best planning, automated tests, code review, and everything else, sooner or later (probably sooner) you’ll find that you’ve deployed a broken feature into production, and users of the system are complaining.

Our natural instinct in such a case is to quickly jump back to the last working version—a process we colloquially call a “rollback”.

Today I offer two reasons not to do this. Tomorrow I’ll talk about an alternative.

Diverging from the established software release cycle is risky

On a healthy team, releasing software changes into production should be second nature. It should be as natural and automatic as breathing, and it should demand about as much conscious thought on a daily basis. I’m describing, and assuming, a team that practices continuous deployment.

On such a team, rolling production servers back to a previous version of the software is a divergence from the norm. This isn’t always a problem in practice, of course, but it’s a potential problem area. When you roll back a service version, do you need to roll back any of its dependencies, too? Are there steps that are normally automated that now need to be performed manually? Do dependent services need to be restarted? And so on.

In theory, of course, a rollback process could be automated as completely as a normal release, to alleviate this concern. And if you find yourself depending on rollbacks, you definitely should do this!
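To make that idea concrete, here is a minimal sketch of what "as automated as a normal release" could look like: the rollback simply pushes the previous version through the very same steps as a forward deploy. Everything in it is hypothetical (the `Pipeline` class, its `deploy` and `rollback` methods, and the three placeholder steps); it doesn't describe any particular deployment tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical pipeline for illustration only; not a real tool's API.
@dataclass
class Pipeline:
    steps: List[Callable[[str], None]]
    history: List[str] = field(default_factory=list)  # deployed versions, oldest first

    def deploy(self, version: str) -> None:
        """Run every automated step for the given version, then record it."""
        for step in self.steps:
            step(version)
        self.history.append(version)

    def rollback(self) -> str:
        """Redeploy the previous version through the same steps as a normal release."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        previous = self.history[-2]
        self.deploy(previous)
        return previous


# Placeholder steps that only write to a log; real steps are assumed, not shown.
log: List[str] = []
pipeline = Pipeline(steps=[
    lambda v: log.append(f"align dependencies for {v}"),
    lambda v: log.append(f"install {v}"),
    lambda v: log.append(f"restart dependent services for {v}"),
])

pipeline.deploy("1.2.3")
pipeline.deploy("1.2.4")
restored = pipeline.rollback()  # runs the same three steps for 1.2.3
```

The point of the design is that `rollback` has no steps of its own. If it did, those steps would be exactly the rarely exercised, manual-feeling path this section warns about.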

A rollback blocks other progress

The other serious problem with a rollback is often overlooked: it blocks all other development work from progressing. Of course, here I’m assuming the use of real continuous integration.

To illustrate, let’s imagine a rollback scenario:

Service version 1.2.4 is released to production. A serious bug is discovered.
The service is rolled back to version 1.2.3 while the bug is investigated and fixed.
Meanwhile, all updates to the service are put on hold, lest version 1.2.5 be released before the bug is fixed.
Eventually the bug fix is applied, and 1.2.5 is released. Now any backlogged work can be committed, and 1.2.6, 1.2.7, etc., are likely to follow in quick succession.

Even on a small project owned by a single team, this can be a problem. On a large project, this can effectively put an entire department’s work on hold for the duration of the bug fix. And then once the bug is fixed, you get a slew of releases in a short span, which is the opposite of the small, frequent changes that give CD its reduced-risk benefit.
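The pile-up is easy to see with a toy model. The sketch below counts how many changes ship each day, with and without a release freeze. The day numbers, the one-change-per-day pace, and the `releases_per_day` helper are all made up for illustration; real backlogs are messier.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def releases_per_day(
    ready_days: Iterable[int],
    freeze: Optional[Tuple[int, int]] = None,
) -> Dict[int, int]:
    """Count how many changes ship on each day.

    ready_days: the day on which each change is ready to release.
    freeze: (first_day, last_day) during which nothing may ship; changes
            held back by the freeze ship on the first day after it ends.
    """
    shipped: Counter = Counter()
    for day in ready_days:
        if freeze and freeze[0] <= day <= freeze[1]:
            day = freeze[1] + 1  # held until the fix is out
        shipped[day] += 1
    return dict(sorted(shipped.items()))


# Hypothetical pace: one change becomes ready each day for six days.
ready = [1, 2, 3, 4, 5, 6]

no_freeze = releases_per_day(ready)
with_freeze = releases_per_day(ready, freeze=(2, 4))  # rolled back on days 2-4
```

Without the freeze, each day carries a single change. With it, the three held changes land on day 5 together with that day's own change, so four changes reach production at once and any new breakage is that much harder to trace.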
