6 ways to RUIN your logging

May 30, 2020

Edited by Mitchell de Rijcke

I’m knee-deep in some log refactoring on my current project, which is something that I’ve done now on several projects. This is one of my favorite ways to become familiar with a new code base, since it typically touches all of the code.

As I do this on new projects, I’m consistently faced with many of the same frustrations, which I have decided to catalog here. Most projects I see make at least several of these mistakes, and some make all of them. But since misery loves company, I’ll share these frustrations with you, so that you can repeat them and join me in commiserating, or avoid them, if you so choose.

1. I need to edit 9000 files to change loggers

Changing loggers is hard. It often requires touching practically every file in your code base, with multiple trivial changes.

Maybe you feel you’ll never need to change loggers. But I think this happens more often than most people realize.

  • Maybe you want to switch from logging to `STDERR` or [syslog](https://en.wikipedia.org/wiki/Syslog) to logging to an [ELK stack](https://www.elastic.co/what-is/elk-stack)

  • Maybe you want to send serious errors to a notification service, such as Sentry.io

  • Maybe you want to log to `/dev/null` during tests

  • Maybe you want to switch from a log format that’s readable on the console, to something in JSON for automatic parsing and aggregation

To accommodate these scenarios, use a logger abstraction. Some ecosystems offer a ready-made logging abstraction, such as SLF4J for Java, or Logrus for Go. If yours doesn’t have such an abstraction available, you may want to create your own; it’s not terribly difficult to get started.

With such an abstraction in place, it becomes trivial to change where your logs go and how they are filtered or formatted, by modifying only the backend.

2. How in the world do I test this logging code?

Because logs are used throughout a program, it’s common to create a global logger instance, and just use it. Of course, most developers recognize that globals are bad practice when it comes to things like a database. But the same holds for loggers, too. And for the same reasons.

You might think you’re safe by using STDERR, but don’t be fooled—this is still an instance of a global logger!

  1. Globals result in tightly-coupled code.

    Global variables introduce an unseen dependency. Even when using a logging abstraction, as described above, if it’s global, you have an unseen dependency on whatever code initializes that logger.

    You’re also tightly bound to every other library used in your code, which may alter your logging behavior in unexpected ways.

  2. Globals are difficult or impossible to test.

    Log output should be included in your unit tests (it is one of the effects of your function, right?). Using a global logger makes this very difficult in many languages, and sometimes impossible.

    Further, in the instances where you don’t care about logs in your test, you should be able to discard logs. Again, a global logger makes this difficult. A common side-effect is that when you run your unit tests, you see thousands of lines of meaningless log output on the console.

    Of course some languages, particularly interpreted languages such as JavaScript or Perl, make it possible to swap a global object with a mock for testing. I still prefer to avoid this approach for all the other reasons I believe globals should be avoided, but it can definitely lessen the impact that globals have on testability.

The solution is to use an object instance for your logger, the same as you should with a database, or other external dependencies, then use dependency injection to provide the logger to the code that generates logs. This makes it easy to replace a real logger with a mock or a null logger for testing. As a simplified example in Go:

// NewFoo constructs an instance of Foo.
func NewFoo(logger log.Logger /* , other arguments */) *Foo {
	return &Foo{
		logger: logger,
		/* other initialization */
	}
}

// DoStuff does stuff.
func (f *Foo) DoStuff() {
	f.logger.Debug("some debug message")
}

And the same in JavaScript:

function Foo(logger, /* other arguments */) {
	this.logger = logger;
	/* other initialization */
	this.doStuff = function() {
		this.logger.debug("some debug message");
	};
}

Another benefit of using standard dependency injection for the logger is that it makes it easy to add context to your logs, as you traverse your code. For example, you may want to add a request ID to all logs associated with a particular HTTP request. If you use dependency injection for your logger, it’s easy to do this in one place: wrap your logger to include a request ID so that all downstream code will include the ID. More on this concept in the next section.
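A minimal sketch of that wrapping pattern in Go, assuming the same kind of injected `Logger` interface as above (`withRequestID` and `prefix` are hypothetical names):

```go
package main

import "fmt"

// Logger is an assumed minimal interface; only Debug is shown.
type Logger interface {
	Debug(msg string)
}

// stdoutLogger is a trivial backend for the sketch.
type stdoutLogger struct{}

func (stdoutLogger) Debug(msg string) { fmt.Println("DEBUG: " + msg) }

// prefix builds the annotated message; separated out for testability.
func prefix(requestID, msg string) string {
	return "[req " + requestID + "] " + msg
}

// withRequestID wraps any Logger so every message carries the request ID.
type withRequestID struct {
	next      Logger
	requestID string
}

func (l withRequestID) Debug(msg string) {
	l.next.Debug(prefix(l.requestID, msg))
}

func main() {
	var base Logger = stdoutLogger{}
	// In an HTTP handler: wrap once, pass reqLog downstream, and every
	// log emitted for this request carries its ID automatically.
	reqLog := withRequestID{next: base, requestID: "abc-123"}
	reqLog.Debug("handling request")
}
```

Because the wrapper satisfies the same interface, downstream code cannot tell (and does not care) that it was handed the wrapped logger.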

3. I keep seeing the same error logged 5 times

Does this code look familiar to you?

if err := doSomething(); err != nil {
	log.Errorf("doSomething failed: %s", err)
	return err
}

Or how about this example in JavaScript?

try {
	doSomething();
} catch(e) {
	console.log(e);
	throw e;
}

The code detects that an error occurred, logs the error, then passes the error up the call tree to whoever called the function.

This usually means one of two things:

  1. The error will be logged multiple times.

    The caller is likely to log the error, too, in which case you have two (nearly identical) log messages about the same error.

  2. The error will be logged, when it should have been ignored.

    Maybe the caller is able to handle the error, and do something else (such as a retry, or ask the user for intervention). In this case, you have an error log that indicates a non-problem.

As a rule, handle an error exactly once. Logging an error counts as handling it. Returning/rethrowing an error counts as handling it.

I am aware of two common counter-arguments to this rule. Both express very valid concerns, but have better solutions.

  1. Loss of context

    The most common argument I hear in favor of these otherwise extraneous error logs is that without them, it’s not clear where in a program the error originated. Or maybe the error log should contain extra metadata (such as a request ID, the text of an SQL query, etc) that isn’t part of the error itself.

    The proper solution to this situation is to include the extra metadata within the error itself. This is easily accomplished in Go, since Go errors are an interface. As an example, the github.com/pkg/errors package makes it easy to include stack traces in an error. The same pattern can be used to include arbitrary metadata in an error.

    JavaScript can easily do the same. For example:

    try {
        /* some code */
    } catch(e) {
        e.userID = "bob";
        throw e;
    }
    

    And any other modern language can do something similar.

    So rather than logging twice, bundle the extra context into the error and return it. Then if the error is properly handled, it’s still properly discarded, rather than logged for no reason.

  2. User-facing errors vs developer-facing errors

    The other argument I sometimes hear for logging-and-returning errors is that we need to log more information than we want to show to an end-user.

    The proper solution to this is to use errors that support user-facing errors, as well as developer information.

    A very simple way to handle this is to ensure that all errors are turned into something innocuous before display to an end-user. For example, when an error occurs, you might display “Something went wrong”, rather than “Host ‘foo’ is blocked because of many connection errors.”, then log the actual error message.

    A more robust approach is to use an error implementation that allows storing a user-visible text, and a developer-visible text.

    My usual approach is a combination of the two. When handling an error for user display, if the error contains an explicit user-visible text, I use that; if not, I fall back to the “Something went wrong” approach, just in case some sensitive information accidentally made it into an error message unchecked.

4. Why does this code exit when it logs?

Most logging libraries give you many (too many!) log levels to choose from. Trace, Debug, Info, Warn, Error, and Fatal are common. Some libraries have twice as many levels.

Levels such as Fatal or Panic typically interrupt control flow after they generate a log message. This can be by way of throwing an exception, or even exiting the program entirely.

Never do this.

NEVER do this. Literally, NEVER!

Why?

  1. Logging and control flow are entirely unrelated concerns

    It doesn’t make sense to combine logging with an `if`, a `break`, or a `return`, so don’t combine it with exiting your program, either.

  2. It produces tight coupling

    Usually fatal logs happen during initialization of some service, such as a database. By doing a fatal log in your database init, you’re demanding that all users of the database exit if there’s a problem. You don’t give consumers a chance to recover.

    Instead, return/throw an error, and if the caller doesn’t have an option to recover, then let the caller exit. Don’t force this on the caller.

  3. It makes debugging more difficult

    Good practice is to have a program start and exit from the same place (such as `main()` in C, C++, or Go). This makes it easy to follow the flow of a program when something unexpected happens.

    By limiting yourself to a single exit point (or possibly two, close to each other in code: one for success, one for failure), you’ll never be left guessing where an unexplained exit occurred.

If it makes sense in your case to log something, and to interrupt control flow, by all means do both. But do both explicitly, with separate function calls:

log.Errorf("Cannot initialize the dang service: %s", err)
os.Exit(1)
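Applied to the initialization example from point 2, that might look like the following sketch, where `initDB` is a hypothetical initializer that returns an error instead of logging fatally:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// initDB returns an error instead of calling a Fatal log, so each
// caller decides for itself what failure means.
func initDB(dsn string) error {
	if dsn == "" {
		return errors.New("empty DSN")
	}
	// ... open the connection ...
	return nil
}

func main() {
	if err := initDB("postgres://example"); err != nil {
		// main is the single exit point: log, then exit, as two
		// explicit and separate steps.
		fmt.Fprintln(os.Stderr, "cannot initialize database:", err)
		os.Exit(1)
	}
	fmt.Println("service running")
}
```

A test harness, or another program embedding the same package, can now call `initDB` and recover (retry, prompt, fall back) instead of being killed.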

5. Can I ignore these warnings?

While on the topic of error levels, kill all your warnings, too.

The “warning” level always falls between “info” and “error”. Which is dangerous territory. When you log a warning, you’re usually saying “Something unexpected happened, but it’s not serious enough to fix.”

If it’s not serious enough to fix, then why log it? If it is serious enough to fix, then call it an error.

Dave Cheney advocates for only three log levels:

  • Debug: Things only developers care about while developing.
  • Info: Things users/administrators of your product care about.
  • Error: Things users care about.

In my view, this should be sufficient for practically any application.

If you find yourself tempted to use the “warning” level, one simple question can help clarify whether to use info or error instead:

Is it something that can, or should, be fixed, either by the user of the system or by the developer? If the answer is yes, it’s an error. If the answer is no, it’s info (or can be ignored completely).

6. Why do I need to read through so many debug logs?

Debug logs do have legitimate uses, but I often see them heavily overused, in three main ways:

  1. Debug logs left in the code after their usefulness.

    Many debug logs just dump seemingly random data to the log. Maybe a configuration structure, or the body of an HTTP response. These logs were clearly added in the process of some very specific debugging. This is a case of using logs in place of a debugger (which can be legitimate, mind you).

    These logs should be removed after the debugging is completed. They serve no purpose for future developers, and only serve to clutter the code, and the log output.

  2. Debug logs in place of proper stack traces

    One pattern I’ve seen frequently is to scatter debug logs (or even info logs, when used for production errors) around the code base, so that in the case of an error, the log output can be used to reconstruct a code execution path.

    This is lazy coding that leads to much more work at debugging time.

    A much better approach is simply to include a stack trace in every error log. Not only does it produce a much cleaner code base, it makes debugging about 10x easier, too, because you’re told immediately which line/function produced the error, along with all the breadcrumbs necessary to trace the call path.

  3. Debug logs in place of a debugger

    Another common debug pattern I see is debug logs scattered around the code base, reporting when a function is called or when a function returns.

    This is usually a substitute for using a proper debugger, or for proper unit tests.

    One quality of a good unit test is that when it fails, it tells you exactly which function failed. This means that when doing proper unit testing (particularly in a TDD flow), the use of a debugger is often completely unnecessary.

    For the cases where existing unit tests don’t provide the insight you need, the use of a proper debugger is usually a much more efficient way to get the insight you need.

    Only use debug logs as a last resort. And as mentioned above in item 1, delete these debug lines after your investigation (and hopefully after adding useful unit tests!) before committing to version control.

Logging is a complex topic. And now with these 6 frustrations in mind, you can either make it more complex, or less, as you choose.

If you’re lucky enough to be starting on a greenfield project, you may have the luxury of setting up your project with a modular logger, proper control flow, and only the necessary logging levels.

More likely, though, removing these common mistakes from your project will be a long process. Don’t let that discourage you. It can be daunting, but just get started, and take a piece-by-piece approach.

