What did we change?
What changed in the last month to give us the confidence to deploy at 5pm on a Friday?
A couple of days ago I talked about a Friday deployment gone right, and how it would not have been possible a month earlier.
So what changed in that month?
A few main things:
We, the new team, who are essentially rescuing the project, had learned a lot more about the system. Even with nothing else having changed, this alone meant we would be far more confident debugging it in case of a failure. A month ago, half of the team hadn’t even started yet, and the half that had (which included me) didn’t yet know enough to debug the system.
We had new end-to-end tests in place covering some of the most business-critical aspects of the system, and they run on every PR and before every deployment. I’m not a big fan of end-to-end tests, since they’re often slow and flaky, and neither is anyone else on the team. But they are often the best bang for the buck when getting an untested legacy system under control. We’re also adding fast-running, isolated unit tests, and those are helping a lot, but there’s still a ton of work to do there.
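As a sketch of what that kind of gating can look like, here is a minimal pipeline script. The step names and messages are illustrative assumptions, not our actual tooling: the point is only that cheap checks run first and any failure blocks the deploy.

```shell
#!/usr/bin/env sh
# Hypothetical CI gate: run the cheap checks first so most failures
# surface before the slow end-to-end suite ever starts.
set -e  # any failing step aborts the pipeline and blocks the deploy

run_step() {
  name=$1
  shift
  echo "running: $name"
  "$@" || { echo "BLOCKED: $name failed"; exit 1; }
}

# In a real pipeline these would invoke the actual test runners;
# 'true' stands in here so the sketch runs as-is.
run_step "unit tests" true
run_step "end-to-end tests" true
echo "all checks passed, deploy may proceed"
```

The ordering matters: fast unit tests catch most regressions in seconds, so the slow, flaky e2e suite only runs on changes that already look healthy.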
We do frequent deployments. We’re not up to the level of continuous deployment yet (though it’s a goal), but we do deploy several times per week, which means any given deployment is relatively small and low-risk.
We have the ability to roll back quickly. A month ago, a rollback wasn’t even always possible, and when it was, it was a manual process. Now it’s part of the automated deployment process, and takes about 5 minutes.
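One common way to make rollback that fast is to keep each release in its own directory and point a "current" symlink at the live one, so rolling back is just repointing the link. The sketch below illustrates the idea; the paths and release names are invented for the demo, and this is not our actual deployment tooling.

```shell
#!/usr/bin/env sh
# Minimal rollback sketch: releases live side by side, "current" is a
# symlink, and rollback repoints it at the previous release.
set -eu

# Demo setup so the sketch is self-contained (a real deploy pipeline
# would have created these directories already).
app=$(mktemp -d)
mkdir -p "$app/releases/v1"
sleep 1                       # ensure v2 has a newer mtime than v1
mkdir -p "$app/releases/v2"
ln -s "$app/releases/v2" "$app/current"

# The rollback itself: find the second-newest release and repoint
# the symlink at it, replacing the old link in a single step.
previous=$(ls -1t "$app/releases" | sed -n '2p')
ln -sfn "$app/releases/$previous" "$app/current"
echo "rolled back to $previous"
```

Because no files are copied and the link swap is a single operation, the slow part of a real rollback is usually just restarting or draining the application processes, which is what keeps the whole thing in the minutes range.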
It’s worth noting that we’re not yet at the point where we’re fully comfortable deploying at 5pm on a Friday. The only reason we did it in that instance was that it was a security issue we didn’t want to leave unresolved over the weekend.
There’s no shame in not being comfortable deploying at any moment. It’s only shameful if you use “no Friday deployments” as an excuse not to improve your engineering practices!