Break glass in case of emergency
Should you block "bad" behaviors, or allow them in case of emergency? Why not both?
Over the weekend I had a chat with a friend who asked me to settle a debate between him and a colleague.
One of them felt it was important to make pull request approvals mandatory before merge. The other thought it was important to allow merging without approval, in case of an emergency when approval is not possible.
Before you read on, which approach do you prefer, and why? How do you think I weighed in?
You probably guessed, I picked the third, unlisted option. I chose the “break the glass” option.
Let me explain.
Many commercial buildings will have a fire hose positioned behind a glass panel with visible instructions “Break glass in case of fire”.
Breaking the glass serves two purposes. First, it gives you immediate access to the emergency response tool you need, namely a fire hose. Second, it triggers a fire alarm, which alerts everyone else in the building to the danger and calls the local fire department to send a crew who can help, in case your own efforts with the hose are not sufficient. And by putting the fire hose behind glass, we discourage inappropriate use. If the hose were just sitting there in the open, perhaps someone would use it to clean the floor, which I imagine we want to discourage.
We can apply the same pattern to a number of software development practices. For our pull request scenario, the idea is to allow an unapproved pull request to be merged, but to trigger an alarm of some sort when it happens. This can be done pretty simply by adding a check to your merge pipeline that counts the number of approvals. If that number is 0, an email or Slack message can be sent to the team.
This will discourage inappropriate use of unapproved merges, while allowing them in case of emergency. And in either case, the team will be able to do a post-merge review of the code.
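As a rough illustration, the approval check described above might look something like the following Python sketch. Everything here is hypothetical: the PullRequest record, the check_merge function, and the notify callback are names invented for this example, and how you actually read the approval count or send the email or Slack message depends on your own hosting platform and tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PullRequest:
    """Hypothetical record; a real pipeline would fill this from its platform's data."""
    number: int
    title: str
    approvals: int  # number of approving reviews at merge time


def check_merge(pr: PullRequest, notify: Callable[[str], None]) -> bool:
    """Break-glass check: never block the merge, but alert when nobody approved it.

    `notify` stands in for whatever sends the email or chat message to the team.
    """
    if pr.approvals == 0:
        notify(
            f"PR #{pr.number} ({pr.title}) was merged with no approvals. "
            "Please review it post-merge."
        )
    # The merge is always allowed; the alert is the only consequence.
    return True
```

In a pipeline, you would call check_merge with the pull request being merged and a notify function that posts to your team's channel. Passing print as the notifier is enough to try it out locally.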
The same “break the glass” principle can be applied to many other processes that come up in software delivery, where you may want to allow human intervention, but only in case of an emergency. Examples include human access to production servers or databases, rollbacks, and service restarts. Find a way to allow these activities if you need them, but alert when they’re done.