Has Facebook outgrown "Move fast and break things"?

October 12, 2021
Did Facebook's recent outages prove that they're moving too fast? I don't know. And neither do you. We don't have all the information.

Facebook’s famous motto, “Move fast and break things,” officially left the building in 2014, but that hasn’t stopped a large number of people on social media from criticizing Facebook for this old motto in light of its recent major outages.

Since Facebook no longer lives by this motto, it’s a bit of a straw-man argument to begin with. But I want to defend this straw man anyway, with a bit of a contrarian view.

The assumption being made by those criticizing the “move fast and break things” motto is that the cost of a 6-hour outage is too high—that too many things broke.

Is this justified?

I don’t know. And neither do you, unless perhaps you work closely with Mark Zuckerberg.

You see, this type of judgement is the result of an ad-hoc ROI calculation. The problem with this armchair quarterbacking is that the public is only privy to one piece of data necessary for that calculation: We know (or can estimate) the cost of a single failure.

In ROI terms, that is part of the investment variable. Let’s simplify with round numbers, since we’re defending a straw man anyway, and say that a single outage costs US$1 billion.

“Oh my stars! That’s too expensive! It’s obviously a bad idea!” some might say (are saying).

If it’s not clear yet, the problem with this is that we have no idea of the return.

If Facebook can lose $1B in a single outage, it stands to reason that their earning potential is also astronomical. What if the fast moving that caused the broken things also earned $50B that would not have been earned by acting more cautiously?

In this light, a $1B “investment” (in the form of an outage) to earn $50B seems like a bargain.
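To make the arithmetic concrete, here is a minimal sketch of that back-of-the-envelope ROI calculation. The `roi` function name is my own, and both dollar figures are the made-up round numbers from this post, not real Facebook data.

```python
def roi(gain: float, cost: float) -> float:
    """Return on investment as a ratio: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Both values are hypothetical round numbers, not actual figures.
outage_cost = 1e9       # assumed cost of a single outage (the "investment")
extra_earnings = 50e9   # assumed earnings attributable to moving fast

print(f"ROI: {roi(extra_earnings, outage_cost):.0f}x")
```

With only `outage_cost` known, as is the case for the public, the function cannot be evaluated at all, which is the whole point.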

In mature businesses, outages are expected. There’s a failure or outage budget. Management works to keep reliability at expected levels. Not too unreliable, because then business suffers, but importantly: also not too reliable because reliability is expensive.

John Allspaw and Paul Hammond make this point in their famous presentation 10+ Deploys Per Day by referencing World of Warcraft’s at-the-time dismal uptime:

“If the business requires that the site go down every 2 weeks, even though you’re the largest online gaming platform and you have millions of paying customers, those paying customers might be quite fine for you to have availability of 97%.”
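To put the 97% figure in perspective, here is a rough sketch that converts an availability target into an allowed-downtime budget. The function name and the two-week window are my own assumptions for illustration; they are not taken from the presentation.

```python
def downtime_budget_hours(availability: float, period_hours: float) -> float:
    """Hours of allowed downtime in a period for a given availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


# Assumption: measure the budget over a two-week window, matching the
# "every 2 weeks" cadence mentioned in the quote.
two_weeks_in_hours = 14 * 24

budget = downtime_budget_hours(0.97, two_weeks_in_hours)
print(f"Allowed downtime per two weeks: about {budget:.0f} hours")
```

Under these assumptions, 97% availability leaves room for roughly ten hours of downtime every two weeks. Tightening the target shrinks that budget, and every extra "nine" has to be paid for somewhere.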
