There are countless configuration file formats. Some are custom-designed for a specific application, such as those used by the Apache web server or BIND. More commonly these days, new applications use a standardized configuration file format, such as YAML, JSON, XML, or INI. Some applications, such as WordPress, even use executable code as configuration files.
Most modern languages have libraries that will let you read (and in some cases write) a wide variety of configuration formats. Go’s Viper package supports 6 distinct file formats. Perl’s Config::Any supports 5 (plus executable Perl code).
So how should you choose the proper format for your new project, in this sea of competing options?
In this post, I will make the decision as easy for you as possible. If you don’t care about my reasons, you can jump straight to the recommendations.
What makes for a good configuration format?
Before I dive into recommendations, I want to talk a bit about what makes a configuration file good. I hope we can all agree with these criteria:
A good configuration file format is easy for a human to edit.
This means we need a text-based format, and probably one that supports comments (so not JSON) and simple formatting rules.
For this post, I’m intentionally overlooking applications that manage their own configuration in a database, the Windows registry, or similar. If you’re building an application like that, I’ll assume you’re probably not searching for the best configuration file format, anyway.
A good configuration file format is easy for a computer to read.
This should be pretty obvious, but it’s worth mentioning explicitly.
A good configuration file format is expressive.
That is to say, it should be possible to express the desired configuration accurately, in a clear and unambiguous way. This characteristic can sometimes be at odds with the first one (easy to edit) when it is necessary to express complex configuration structures. In such cases, a balance must be found between ease of editing and expressiveness. The worst option is to pick an unexpressive format, then try to cram in configuration that it cannot gracefully handle–imagine trying to store arrays or objects in old INI files (I’ve seen it done! Poorly!).
A good configuration file format is easy to deploy.
This one is often overlooked, and the implications of easy deployment depend a lot on the style of application you’re building (desktop app vs microservice, for example).
When configuration shouldn’t exist
With these characteristics in mind, I want to take a small detour. What if we could solve this configuration file format problem by not having configuration at all? That would be a big win for simplicity!
Many projects I have seen have far too much configuration. They often include values in the configuration file which really should be constants. This is a form of tight coupling that leads to over-complex software that is hard to maintain and reason about. So the first step I advise is to avoid this. When you’re thinking of adding a new configuration option, consider one key question, to decide if the value belongs in the configuration file at all:
When will this value ever be changed?
Some specific things to consider when trying to answer this:
- If the answer is “Next time I modify this feature”, or “If/when I discover a bug in this feature”, then make the value a constant, and de-clutter your configuration file.
- If the answer is “The value is unique for every deployment/environment”, then it should probably be configuration (or an environment variable–more on that soon). This is especially relevant for services that may be deployed differently in staging or production, or for different users/customers of the software.
- If the answer is “Every time the program is run”, then you may be better off making the value a runtime option (such as a command line flag), rather than configuration. But that’s for another discussion.
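As a rough illustration of the three answers above, here is a minimal Python sketch. The names `MAX_RETRIES`, `APP_DATABASE_URL`, and `--verbose` are hypothetical and chosen only for this example; the point is where each kind of value lives, not the specific settings.

```python
import argparse
import os

# Changes only when the feature itself changes, so it is a constant,
# not configuration.
MAX_RETRIES = 3


def load_settings(argv, environ=os.environ):
    """Collect per-deployment and per-run values (all names are hypothetical)."""
    # Differs per deployment/environment: configuration, read here from
    # the environment with a default suited to local development.
    database_url = environ.get("APP_DATABASE_URL", "sqlite:///local.db")

    # Differs on every run: a command-line flag.
    parser = argparse.ArgumentParser(prog="app")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)

    return {
        "max_retries": MAX_RETRIES,
        "database_url": database_url,
        "verbose": args.verbose,
    }
```

A program would typically call `load_settings(sys.argv[1:])` once at startup. Passing `argv` and `environ` explicitly keeps the function easy to test.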
The best configuration format for services
Okay, now time for the meat of the matter.
If you’re building a server process (at least on Unix–I wouldn’t touch Windows for servers, so can’t comment on that), the best place to store configuration is not in a file at all, but in the environment. This is one of the tenets of the twelve-factor app, and it checks all our boxes:
It’s very easy to edit.
The exact method of editing depends on your environment–it can be a matter of editing .bashrc, setting variables in the web interface of your CI tool, or modifying a Kubernetes resource definition.
It’s very easy for the computer to read.
Every language has the ability to easily read and parse environment variables.
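For instance, in Python the whole job can be done with the standard library. This is only a sketch: the variable names (`APP_LISTEN_PORT`, `APP_DATABASE_URL`, `APP_DEBUG`) and the `ServiceConfig` type are assumptions made up for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    listen_port: int
    database_url: str
    debug: bool


def config_from_env(environ=os.environ):
    """Build a ServiceConfig from environment variables (names are hypothetical)."""
    return ServiceConfig(
        # Environment values are always strings, so convert explicitly.
        listen_port=int(environ.get("APP_LISTEN_PORT", "8080")),
        # No sensible default for this one: a missing key raises KeyError,
        # which surfaces the mistake at startup rather than later.
        database_url=environ["APP_DATABASE_URL"],
        debug=environ.get("APP_DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Reading everything in one place at startup means a missing or malformed value fails fast, and the rest of the program works with typed fields rather than raw strings.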
It’s expressive enough.
This is the one area where environment variables are the weakest, but it should not matter for most services. If you do find yourself needing to provide complex configuration (objects, arrays) to your service, use an environment variable to specify the location of that file, and keep reading for the next recommendation.
It’s easy to deploy.
This is the biggest strength of environment variables. They are very easy to deploy in a wide variety of scenarios: when running locally for testing, when running in different environments (testing, staging, production), and on any sort of infrastructure platform (Kubernetes, Heroku, systemd, etc.).
One other characteristic of environment variables is that they are a little bit “bulky”. That is to say, it’s not always easy to set lots of them efficiently. This can be good and bad. It’s good, because it helps encourage less configuration (and most services shouldn’t have that much!). It’s also bad when you honestly need a lot of configuration, because you may find yourself passing dozens of environment variables between the different parts of your infrastructure, rather than passing a single configuration file around.
When to skip environment variables
I advise against environment variables for configuration in three situations:
- You’re not writing a server process. This approach is confusing for desktop or CLI software used by humans. It’s best for processes managed (almost) entirely by automation.
- You have complex configuration needs. If you absolutely must use arrays, objects, or other complex data types in your configuration, this can be cumbersome in environment variables. There are possible work-arounds (such as Base64-encoding a JSON blob), which work if you have only a few such needs, but this hinders readability.
- You find yourself in a situation where you absolutely need an unwieldy number of environment variables. Look for alternatives first: Perhaps you can store API credentials in a database, and only provide the database config via the environment, for example. But if you cannot reduce the configuration to a reasonable size, consider a configuration file.
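The Base64 work-around mentioned in the list above might look like the following sketch. The variable name `APP_UPSTREAMS_B64` and the helper names are hypothetical; the example also makes the readability cost visible, since the encoded value is opaque to a human.

```python
import base64
import json
import os


def encode_blob(value):
    """Serialize a JSON-compatible value into text safe for an environment variable."""
    raw = json.dumps(value).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_blob(environ=os.environ, name="APP_UPSTREAMS_B64"):
    """Decode the structure stored under `name` (a hypothetical variable name)."""
    return json.loads(base64.b64decode(environ[name]))
```

Whoever deploys the service would run something like `encode_blob([...])` once and paste the result into the environment. After that, nobody can review or tweak the value without decoding it first, which is why this only makes sense for one or two such values.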
The best configuration file format for everything else
If you’re building a desktop application, a command-line tool, or some other program that is primarily controlled by humans (as opposed to automation scripts), then a configuration file is usually a good choice. And in my estimation, the best format for this sort of configuration is TOML, because:
It’s easy to edit.
The syntax is intentionally very simple. In the simplest cases, it looks a lot like INI files (and in fact, some simple INI files will parse as valid TOML). Unlike YAML, indentation is purely cosmetic, meaning you’ll never get confused by tabs versus spaces, or not enough indentation. Unlike JSON, it supports comments. Unlike INI, it has a proper spec.
It’s very easy for a computer to read.
TOML is supported by all major programming languages.
It’s very expressive.
It’s not quite as expressive as YAML (or even JSON) in some cases, because it doesn’t allow arbitrary objects. But it’s rare that this is needed in configuration files.
It’s easy to deploy.
It’s just a file, so you can copy it around at will.
When to skip TOML
The only time I suggest not using TOML is if you find yourself in the unusual position of needing a more expressive configuration file format. Kubernetes is an example of a tool in this situation: defining all of the Kubernetes resources in TOML would be incredibly difficult, if not impossible.
The best configuration format for complex needs
In the rare cases that TOML isn’t enough for you, because you need truly complex configuration, I would suggest using YAML, because:
It’s difficult to edit.
This is actually YAML’s biggest drawback. It has many detractors. But it is fairly ubiquitous, so its annoying nuances are well documented and possible to work around. And if you need the complexity, you need it. What can you do?
It’s possible for a computer to read.
Reading YAML is fairly straightforward, but every parser seems to do it differently. This makes it a poor choice for data interchange between programs, but as long as you’re only using it for configuration files, and you only have one parser processing your files, you should be fine.
It’s very expressive.
YAML is truly expressive. This is why I choose it over TOML in the rare cases that TOML isn’t expressive enough. It’s arguably too expressive.
It’s easy to deploy.
Again, it’s just a file, so copy it around at will.
So I lied a bit in the title. My three recommended file formats are really two file formats, and one alternative to files entirely.
When starting a new project, unless you have some very special needs (and if you do, you know it, and you won’t be reading this article anyway), the best configuration file formats for your software are, in order of preference:
- Environment variables for automated server processes
- TOML when you need configuration files
- YAML in the rare situations where TOML is not expressive enough
There you have it!
Do you think I’ve overlooked anything important? Do you disagree with my recommendations? I’d love to hear from you in the comments below!