Automated Testing False Dichotomy #2: All vs None

September 4, 2017

This is the second installment in my series The False Dichotomies of Automated Testing.

If you’ve ever met a recent test convert, you’ve probably heard them talk about the mythical creature that is “100% test coverage.”

As with most benevolent mythical creatures, this one is highly sought after, and possibly even worshiped. It is claimed to have magical powers, although the precise nature of these powers is often hotly debated even among the most ardent of believers. Some believe the mythical beast to be itself invincible. Others believe it will endow those who gaze upon it with omniscience.

Unfortunately, whatever attributes are assigned to this creature, it does not exist in our universe. And this is the cause of much contention. Recent test converts see this claim as an attack on their new-found religion, and test-skeptics see it as an easy way to debunk the whole Testing religion.

But this is one of the False Dichotomies of Automated Testing.

False Dichotomy #2: All vs. None

Test coverage report from Codecov.io for one of my open-source projects, Kivik.

The war chest for the Mythical Creature Huntsman seeking 100% test coverage contains IDEs that run tests on save and SaaS solutions such as coveralls.io or codecov.io. These tools help the huntsman track down the creature with super sexy reports, showing lines of test coverage, percentage increases, and colorized highlighting of still-uncovered lines of code.

These reports tell you exactly how good your tests are, right?

Wrong!

The Fallacy of 100% Code Coverage

The sad truth is that these shiny reports don’t tell you anything at all about how good your tests are.

What most of these tools report is how many lines of your code are executed during a test. Some may tell you how many statements (in the case of multi-line statements, or multi-statement lines) are executed, or how many branches (if, switch, etc.) are executed. But none of these metrics is especially meaningful.
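These numbers are easy to produce. In Go, for example, the standard toolchain generates exactly this kind of report:

go test -cover                      # print the percentage of statements covered
go test -coverprofile=cover.out     # record per-statement coverage data
go tool cover -html=cover.out       # open an HTML view highlighting covered lines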

Let’s consider a couple of examples.

First, a trivial bit of Go code:

var foo int

func IncrementFoo() int {
    foo++
    return foo
}

Getting a code coverage tool to report 100% coverage would be trivial here:

func TestIncrementFoo(t *testing.T) {
    foo = 0 // Set the initial value
    result := IncrementFoo()
    if result != 1 {
        t.Errorf("Unexpected result %d", result)
    }
}

But did we cover all execution paths? Obviously not. What if foo = 1 initially? We could add a second test, easy. But what if foo = 2? Add another test case? So let’s use a loop instead:

func TestIncrementFoo(t *testing.T) {
    for i := 0; i < 2147483647; i++ {
        foo = i // Set the initial value
        result := IncrementFoo()
        if result != i+1 {
            t.Errorf("Unexpected result %d", result)
        }
    }
}

Great, now we truly have 100% test coverage. Right?

Well, no. What happens if we loop once more, causing an integer overflow? We need to add one more test case for when foo is incremented beyond 2147483647, the maximum value of a 32-bit int.

But what if foo is negative? Okay, so let’s rewrite our loop to start at -2147483648 instead of 0.

Now we have true 100% test coverage, right?

Ehm, no.

You see, there’s another function in this program, too. DecrementFoo, which, as the name implies, does exactly the opposite of IncrementFoo. And this program runs in multiple goroutines (or threads) simultaneously. And both DecrementFoo and IncrementFoo might be called simultaneously in different threads. And ResetFoo, MultiplyFoo and IsFooEven may all be called at any time as well.
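To make that concrete, here’s a minimal sketch, assuming a DecrementFoo that simply mirrors IncrementFoo (imports: sync, testing):

func DecrementFoo() int {
    foo--
    return foo
}

func TestFooConcurrency(t *testing.T) {
    var wg sync.WaitGroup
    wg.Add(2)
    go func() { defer wg.Done(); IncrementFoo() }() // unsynchronized write to foo
    go func() { defer wg.Done(); DecrementFoo() }() // racing write to foo
    wg.Wait()
}

Run with go test -race, this will almost certainly report a data race. And even then, the race detector only catches interleavings that happen to occur during a given run.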

There are literally an infinite number of possible ways in which IncrementFoo might be called. This means to get true 100% test coverage, you need an infinite number of tests. The mythical beast has escaped.

Corner cases aside

“Yeah, yeah, I get it,” you may say. “True 100% coverage isn’t attainable. But 100% line coverage is still a useful minimum metric.”

My response is another example:

var foo *string

func SetFoo(v string) {
    foo = &v
}

func GetFoo() string {
    return *foo
}

And the accompanying test:

func TestGetSetFoo(t *testing.T) {
    expected := "bar"
    SetFoo(expected)
    result := GetFoo()
    if result != expected {
        t.Errorf("Unexpected result: %s", result)
    }
}

Our test has proven that when we set foo, we can correctly retrieve it again. And we have 100% test coverage, as measured by lines of code executed. But what’s missing?

If we call GetFoo() without first calling SetFoo(), our program panics with a nil pointer dereference. Our mythical beast has now failed us in the worst possible way: it gave us a false assurance. (Also, Santa Claus isn’t real. Sorry to burst your bubble twice in one day.)
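A test for the uninitialized case is easy to write once you know to look for it. This sketch asserts the panic, so at least the failure mode is on record:

func TestGetFooUnset(t *testing.T) {
    foo = nil // simulate a program that never calls SetFoo
    defer func() {
        if recover() == nil {
            t.Error("expected GetFoo to panic when foo is unset")
        }
    }()
    GetFoo() // nil pointer dereference
}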

If we agree that a test should, at minimum, ensure the program won’t crash under normal load, the metric of “100% line coverage” comes up sorely wanting.

One more example

“I understand the limitations of 100% line-based coverage, but I still like green lights and shiny things,” the die-hard among us may be saying.

For the sake of completeness, I offer one more example, as a final nail in the coffin of the mythical creature:

// Random returns a randomly-selected index, using the inputs as weights.
func Random(weights []float64) (index int) {
    var total float64
    for _, weight := range weights {
        total = total + weight
    }
    r := rand.Float64() * total
    for i, weight := range weights {
        r -= weight
        if r <= 0 {
            return i
        }
    }
    return 0 // should never happen
}

Pardon the slightly contrived example. The real point here is that the last line of the function, return 0, is required by the compiler, but will never be executed. 100% test coverage, even as counted by lines, isn’t even possible in this situation.

One could refactor this into two functions, such that the second function accepts r and weights as arguments; that function could then be tested with an r greater than the combined weights, achieving a 100% result from a coverage tool, as the sketch below shows. But for what purpose? At this point, we’re contorting our code for vanity metrics. Don’t do that.
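For the record, the contortion would look something like this (weightedIndex is a hypothetical name for the extracted half):

func Random(weights []float64) int {
    var total float64
    for _, weight := range weights {
        total += weight
    }
    return weightedIndex(rand.Float64()*total, weights)
}

func weightedIndex(r float64, weights []float64) int {
    for i, weight := range weights {
        r -= weight
        if r <= 0 {
            return i
        }
    }
    return 0 // reachable now, if a test passes an r greater than the sum of weights
}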

So give up on coverage?

The Testing skeptic will now retort that testing is a waste of time, because it can’t possibly cover all cases!

This is an understandable reaction, but it’s a clear overreaction.

The purpose of automated testing is not to ensure perfect software. That’s also a mythical beast.

The purpose of automated testing is to make our lives as developers, and the lives of the consumers of our software, easier. 100% test coverage (by any measure) is not necessary to achieve this goal.

Testing happens anyway…

It’s a given that any software (or at least any software that ever sees an end-user) gets tested. When you visit your favorite lolcats web site, you’re testing that site, whether you intend to or not. Hopefully, the test succeeds, and you see an adorable, funny kitten. But you might get Not Found or an Internal Server Error. This will make you unhappy.

If you frequent an especially robust lolcats web site, they probably have a QA team that tests their software regularly, to ensure that Not Found and Internal Server Error pages don’t appear, or at least don’t appear frequently.

But as we’ve discussed, there are an infinite number of possible lolcats web site states, just in the paging algorithm (remember IncrementFoo from before?). If we can’t expect a computer to test an infinite number of software states, we certainly can’t expect a QA team to test them.

…so let’s automate what we can

The goal of automated testing, then, is not to test everything.

The goal is to reduce the negative experience for end users. The goal is to reduce the testing burden on the QA team. The goal is to shorten the feedback cycle between development and problem reporting. In short: The goal is to reduce the cost/benefit ratio for software production.

Both extremes are wrong

100% test coverage is an unachievable goal. You’ll invest countless hours (infinite, if you’re honest) and resources chasing this mythical creature, and you’ll never catch it.

0% test coverage is also a fool’s game, both in terms of end-user perception (they’re more likely to see more bugs and fewer lolcats), and in terms of production costs.

The trick is to find some place between 0% and 100% where automated tests produce the maximum benefit for their cost.

If your team has no automated tests today, then start adding some.

If you’re chasing 100% test coverage, then stop.

Finding that middle ground is often more an art than a science. But to get started, a good rule to remember is: Write an automated test the second time you find a function being tested manually.

This means during initial development, if you find yourself clicking “refresh” a second time to see if your lolcat appears, write an automated test.
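In Go, for instance, that second manual refresh could become a handler test like this sketch (LolcatHandler is a hypothetical stand-in for whatever page you were eyeballing; imports: net/http, net/http/httptest, testing):

func TestLolcatHandler(t *testing.T) {
    req := httptest.NewRequest(http.MethodGet, "/lolcat", nil)
    rec := httptest.NewRecorder()
    LolcatHandler(rec, req)
    if rec.Code != http.StatusOK {
        t.Errorf("expected 200 OK, got %d", rec.Code)
    }
}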

It means if you receive a bug report from an end user, the first time you find yourself trying to reproduce it counts as a second test: write an automated test.

Rather than focusing on the fact that automated tests can’t test everything, focus on the fact that automated tests are simply faster than human tests. If you find yourself testing the same thing twice… automate it. Not because you’re looking for mythical creatures, but because you’d rather let a computer handle your mundane work for you. And clicking refresh 87 times per hour is a lot less fun than letting an automated test alert you to a problem the instant you hit “save” in your IDE.

Of course there are times when writing an automated test is very difficult–especially on legacy software projects. And I’ll talk more about this in a future installment.
