Playing crash detective

How long does it take to find a crash? I asked yesterday.

I’ve been debugging some crashes recently.

The most recent one?

invalid memory address or nil pointer dereference

That was the entire log output. Not even a newline.

No stack trace.

Not even any textual context.

Just a garden-variety nil pointer dereference error from the Go runtime, the equivalent of a null pointer dereference in some other languages.

The only clue I had was that this crash was being recovered, because the process didn’t exit immediately. So I searched the codebase for all examples of recovered panics. I found a half dozen instances of code that recovered from a panic and simply logged it without context. I added some context.

 func foo() {
   defer func() {
     if r := recover(); r != nil {
-      log.Error(r)
+      log.Errorf("foo recovered panic: %v", r)
     }
   }()
   // ... rest of foo unchanged
 }

With this simple change, I re-ran the code and got a more meaningful crash report, which allowed me to isolate the failure and begin fixing it.
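Since the original log had no stack trace either, the same idea can be pushed a step further. Here is a minimal sketch, assuming only the standard library; `safely`, `config`, and `nameLen` are hypothetical names for illustration, not code from the project in question. It labels the recovered panic and attaches the stack captured at recovery time.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// config is a hypothetical type used only to trigger a nil dereference.
type config struct {
	name string
}

// nameLen dereferences c, so it panics when c is nil.
func nameLen(c *config) int {
	return len(c.name)
}

// safely runs fn. If fn panics, it returns a report containing the
// caller-supplied label, the panic value, and the stack captured while
// recovering. If fn returns normally, the report is empty.
func safely(label string, fn func()) (report string) {
	defer func() {
		if r := recover(); r != nil {
			report = fmt.Sprintf("%s recovered panic: %v\n%s", label, r, debug.Stack())
		}
	}()
	fn()
	return ""
}

func main() {
	var c *config // nil on purpose
	if report := safely("foo", func() { _ = nameLen(c) }); report != "" {
		log.Print(report)
	}
}
```

Because `debug.Stack()` is called inside the deferred function while the panic is still unwinding, the trace should include the frame that panicked, which is usually the missing piece.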

Next step? Eliminate as many of these panics as possible, and replace them with proper error handling…
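What that can look like, as a rough sketch rather than the actual fix: check for the nil value up front and return an error, so the caller decides what to do. The `user` type, `notify` function, and `errNilUser` value here are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// user is a hypothetical type standing in for whatever was nil.
type user struct {
	email string
}

// errNilUser is returned instead of letting a nil dereference panic.
var errNilUser = errors.New("notify: user is nil")

// notify reports a nil user as an ordinary error rather than
// dereferencing it and relying on a recover somewhere up the stack.
func notify(u *user) error {
	if u == nil {
		return errNilUser
	}
	fmt.Println("notifying", u.email)
	return nil
}

func main() {
	if err := notify(nil); err != nil {
		fmt.Println("error:", err)
	}
}
```

The error carries its own context, so there is nothing left to recover and nothing to guess at in the logs.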
