Has Facebook outgrown "Move fast and break things"?

October 12, 2021

Facebook’s famous motto, “Move fast and break things,” officially left the building in 2014, but that hasn’t stopped a large number of people on social media from criticizing Facebook for this old motto in light of recent major outages.

Since Facebook no longer lives by this motto, it’s a bit of a straw-man argument to begin with. But I want to defend this straw man anyway, with a bit of a contrarian view.

The assumption being made by those criticizing the “move fast and break things” motto is that the cost of a 6-hour outage is too high—that too many things broke.

Is this justified?

I don’t know. And neither do you, unless perhaps you work closely with Mark Zuckerberg.

You see, this type of judgement is the result of an ad-hoc ROI calculation. The problem with this armchair quarterbacking is that the public is only privy to one piece of data necessary for that calculation: We know (or can estimate) the cost of a single failure.

In ROI terms, that is part of the investment variable. Let’s simplify with round numbers, since we’re defending a straw man anyway, and say that a single outage costs US$1 billion.

“Oh my stars! That’s too expensive! It’s obviously a bad idea!” some might say (are saying).

If it’s not clear yet, the problem with this is that we have no idea of the return.

If Facebook can lose $1B in a single outage, it stands to reason that their earnings potential is also astronomical. What if the fast moving that caused the broken things also earned $50B that would not have been earned by acting more cautiously?

In this light, a $1B “investment” (in the form of an outage) to earn $50B seems like a bargain.
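To make the straw-man arithmetic concrete, here is a minimal sketch in Python. The $1B cost and the $50B return are the made-up round numbers from above, not real Facebook figures, and the function name `roi` is just an illustrative helper.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a ratio: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical round numbers from the text, not real data.
outage_cost = 1e9       # the "investment": one outage at US$1 billion
extra_earnings = 50e9   # the "return": what moving fast might have earned

print(f"ROI: {roi(extra_earnings, outage_cost):.0f}x")  # prints "ROI: 49x"
```

The point is not the specific result but that the calculation needs both inputs. Knowing only `outage_cost` tells us nothing about whether the trade was a good one.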

In mature businesses, outages are expected. There’s a failure or outage budget, and management works to keep reliability at expected levels: not so unreliable that the business suffers but, importantly, not too reliable either, because reliability is expensive.

John Allspaw and Paul Hammond make this point in their famous presentation 10+ Deploys Per Day by referencing World of Warcraft’s at-the-time dismal uptime:

“If the business requires that the site go down every 2 weeks, even though you’re the largest online gaming platform and you have millions of paying customers, those paying customers might be quite fine for you to have availability of 97%.”
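For a sense of scale, an availability target translates directly into an outage budget. The sketch below uses the 97% figure and the two-week window from the quote. The helper name is hypothetical, and comparing a 6-hour outage against that budget is purely illustrative, since we don’t know what budget Facebook actually works to.

```python
def allowed_downtime_hours(availability: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


two_weeks = 14 * 24  # 336 hours
budget = allowed_downtime_hours(0.97, two_weeks)

print(f"Budget at 97% over two weeks: {budget:.1f} hours")  # about 10.1 hours

# Illustrative only: would a single 6-hour outage fit inside that budget?
outage_hours = 6
print("Within budget" if outage_hours <= budget else "Over budget")
```

Under these assumed numbers, a 6-hour outage would fit inside a 97% budget for the two-week window. A stricter target shrinks the budget quickly: at 99.9%, the same window allows only about 20 minutes.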

