Tiny DevOps episode #21 Bryan Finster — Minimum Viable Continuous Delivery
November 30, 2021
Speaker 1: Ladies and gentlemen, The Tiny DevOps Guy.
Jonathan Hall: Hello, everyone. Welcome to another episode of The Tiny DevOps podcast, where we talk about dev and ops for small teams. Today, I'm really excited to have special guest Bryan Finster here, who has helped, I guess, co-author a minimum viable CD document, which we're going to talk about today to introduce that to you and talk about what it's for and how you might benefit from it.
Welcome, Bryan. Thanks for coming on. Do you mind introducing yourself briefly a little bit? Tell us how you got involved in CD, and what kind of work you do.
Bryan Finster: Yes. I got involved with continuous delivery several years ago when we were trying to pilot it at Walmart. I was in supply chain, working on how do we take a really large, entangled, giant mess that delivered every quarter, and how do we deliver that every two weeks, and ultimately, how do we deliver it even faster than that. I learned on the job, and also learned that it was a much better way to live in that process, with better outcomes for us as developers and also for the business, so I've been working ever since then to try to get everybody else on board.
Jonathan: All right. Are you still with Walmart now?
Bryan: No, I left Walmart after 19 years. I joined a scrappy little company called Defense Unicorns that's working with the US Air Force on Platform One.
Jonathan: Are you able to implement CD there as well?
Bryan: Platform One is supposed to be a secure CD platform. Yes, I'm actually-- Along with helping them grow, I'm also working on a tool for the platform. We deliver changes to that very frequently, multiple times a day.
Jonathan: Very nice. Thanks again for coming on. Today, we want to talk about, or I want to talk about minimumcd.org, which is the home of this minimum viable CD document that you and others have put together. Before we dive too much into what's in the document, would you tell us how this document came about? I understand you were at the conference not long ago, and maybe just tell us the story how this document was born.
Bryan: I was speaking at DevOps Enterprise Summit, and I frequently talk about continuous delivery at [unintelligible 00:02:31] multiple places. I was sitting at the bar with several other people. We were talking about some of the misunderstandings people have around continuous delivery. I was talking a little bit about the history I've seen where you had people who were trying to execute continuous delivery. You had leadership saying, "Oh, everyone, we have a goal. Everyone should be doing continuous delivery." Then they roll out a half-baked implementation to teams that haven't done it yet, and things fall apart.
That could be anything from-- Now you have-- You're "doing continuous delivery" because you have a pipeline. You have build and deploy automation, or you're delivering very frequently but not testing, or you're using git flow, or you've got a hotfix process, all of these things. It's damaging to the teams. It's damaging to trying to get people to implement CD. [unintelligible 00:03:34] CD's too dangerous because of this, this, this. We decided that what we really needed to do was just codify, for every single context, what are the minimum behaviors that will give you the benefits that we know can happen from continuous delivery, and not as a, "You're not good enough to do CD." If you solve these problems, this will make your team and your organization better. You'll have a better life.
We spent several sessions at the bar, in birds-of-a-feather conversations, trying to hammer out the absolute bare minimum that is always true whether you're delivering to an air-gapped network, or to the cloud, or an app store, whatever, so that if you do these behaviors, this is what we would consider the minimum for continuous delivery, and you would start to see those benefits. That took us about two days of those sessions. We started with a Google Doc, then we published it to GitHub [unintelligible 00:04:43] people to sign on that they agree. Four days later, we had Dave Farley's signature on there, which made us very happy because he literally wrote the book.
Jonathan: Nice. That's quite an endorsement then if you can get the man himself to sign the document, right?
Bryan: For sure.
Jonathan: Let's take a step back here and talk about what CD is at a more conceptual level, rather than the details of the document right now. What is continuous delivery? What is continuous deployment? How do they relate to each other? What are the benefits that these practices can bring to your team that you didn't see happening with these bad implementations?
Bryan: At a very high level, continuous delivery is the ability to release the latest changes on demand, that you're always in a releasable state, so that for any reason or no reason at all, you can deploy. Continuous deployment is just the next evolution of that, where the latest changes you make will be released immediately. I have in my past had applications I built where we had both kinds of pipelines running, with continuous deployment on the back end and continuous delivery on the front end, just to control a feature release that way.
Jonathan: What are the benefits that this provides then to a team? Why would you want to do these things?
Bryan: Primarily, it's stability. With the ability to get rapid feedback on every single change that that change isn't breaking anything, that means we have a much higher level of confidence in our quality. We also can react rapidly if we have a stability problem with production by rolling forward and addressing it. From my perspective, I spent my career building high-availability, 24/7, really-expensive-downtime applications. With continuous delivery, when we have that confidence, it means that outages at three o'clock in the morning, when you're tired and want to go back to bed, are shorter and resolved more safely.
Jonathan: I already mentioned the website. It's minimumcd.org, if you're interested in looking at the document yourself. It's fairly short. It's easy to read. It's a little bit longer than the Agile Manifesto, but probably shorter than the Agile principles, so somewhere in that range. It's a fairly digestible document.
At a high level, it starts with continuous delivery, and then continuous integration, and trunk-based development. How are continuous integration and continuous delivery related to each other? We always talk about CI/CD as though it's a single noun, but at a more technical level, what's the relationship between these two concepts?
Bryan: They nest. Continuous delivery is an extension of continuous integration. With continuous integration, you're trying to get rapid feedback on your changes as they integrate with other developers' code, and make sure that that code is releasable. Continuous delivery is adding on additional gates to verify your artifact can be released, and ultimately releasing it. You don't have to say CI/CD because CD is an extension of CI. It's continuous delivery. It contains all these disciplines.
Jonathan: That's a nice shortcut. For everybody listening, you can stop saying CI/CD, and just say CD. You save three characters every time you want to talk about this.
Bryan: Yes. If you ask Dave Farley about it, he'll rant about it too.
Jonathan: I've observed many times that people also often have the wrong impression of CI, which I think is similar to the arguments you're making here about CD. That is, if you get a room with 100 developers and you say, "How many of you are using CI?" probably 90 will raise their hands. Then you start really drilling down and asking, "How many of you have feature branches that live at most 24 hours?" and half the hands go down. Then you ask, "How many of you have multiple developers working on the same feature branch?" and another half go down. You drill down, and two people's hands are left over that are actually doing continuous integration.
I would love to hear your response. I perceive that there's this problem that we see a CI tool, a pipeline, and think that is CI. If you have the pipeline in place, then you've accomplished all there is to do with CI. I think you're saying that we have the same problem with CD: you have a CD pipeline in place, and you think you're done. Am I on the right track here?
Bryan: Yes, very much so, and I've seen that repeatedly. "Oh, we do CI because we have Jenkins." Jenkins is just a tool. CI is behavior. You're right. The discipline of the true behavior of CI is where you get the benefits from. You don't get the benefits just because you have automation to run your tests. There are lessons you have to learn to break code down small enough so that, even if you have branches, the code I wrote today goes on the trunk today, and as far as I know, it's releasable.
This is the thing, is that people, if they keep aiming for the wrong target, they never see the benefit. Then it's just, "Oh, it's more buzzworthy stuff like that Agile and DevOps stuff." This is a real engineering discipline. This isn't principles. This isn't vague things. This is an actual engineering discipline that we have to be good at.
Jonathan: If somebody goes to read this document, and they discover that they're not doing minimum viable CD, they're missing several of the points here, what should they do? How do they respond to this?
Bryan: I would hope that they did the same thing that we did originally when we were trying to learn how to do continuous delivery because we had pipelines. We had automation available to us. Literally, on that very first team, I wrote down the rules for continuous integration. Just start there. Not even delivery, just continuous integration. Then every day, why can't we do this? What's the next problem we need to solve? Then we just hammer through the list of problems to solve until we could do that.
Then further on, all right, now, why can't we deploy multiple times a day? Let's solve that problem, and then we just hammer through that list. Some of it was internal of the team, some of it was external of the team, and we just hammer through it, and having that conversation and using this as the focal point for continuous improvement process instead of, "Well, our meetings are too long." Okay, that's great, but what are we trying to accomplish? We're trying to deliver better.
Using this as the tool for continuous improvement, I found very effective, both on that first team and then when I was leading the DevOps Dojo at Walmart. Look at it as: since this is the minimum and we're not doing it, we don't need to feel bad. We just need to feel bad if we stop before we get there.
Jonathan: That's really important. Don't feel bad if you're not doing this. It's a goal. Just because you weren't there yet doesn't mean that you're a bad team or a bad person or a bad manager. Just improve. Work on improvement, right?
Bryan: Yes, 100%.
Jonathan: Continuous delivery, you say, builds on and extends continuous integration. Then next down on the list, you have trunk-based development. Is that the same relationship?
Bryan: Yes. If you look at any of the literature on continuous integration, it'll say that you should be doing trunk-based development. Paul Hammant actually wrote a book recently called Trunk-Based Development. He shows that relationship that the CI depends on trunk-based development, and I've had people tell me they're doing CI and they're using a complicated branching structure. They're not really integrating the code, or they're not verifying the [unintelligible 00:12:49] releasable artifact because they're integrating to a develop branch, but they're not releasing the develop branch, and the build on the trunk is a different artifact than what they're building on the develop branch.
If you're doing that continuous integration flow on a branch that's not the trunk, all you're doing is adding an extra merge that adds an extra manual touchpoint that adds an extra place where we can create defects without any added value. Now, I have a lot of people tell me, "Well, we want to make sure the trunk is always releasable, and it's got to be clean." Your tests make sure it's clean. That's how you make sure it's clean. When you're confident that you're testing well, which should be the goal, what are you doing with this extra merge, this extra thing to inject defects into the delivered artifact?
Jonathan: There are two ways, broadly speaking, to do trunk-based development, if I'm not mistaken. The one is just short-lived feature branches, possibly accompanied by pull requests. The other is you literally push directly to trunk. Does this depend on either one of those, or do both approaches work?
Bryan: Both are valid. One requires more effort than the other. I've talked to Dave about this a little bit because he's a real big advocate of going directly to the trunk. I've worked in environments where compliance was a really important thing, and we had to have auditable code review for audit. We would mandate that you use branch and merge instead of direct to trunk. Now, there are ways to also get that auditable trail going direct to trunk.
It just depends on how you set up, how you're capturing that, and what gates you put in the pipeline to make sure that that's actually happening. Either one is viable. I think that if you're going direct to trunk, you really should be pair programming 100%. If you're not pair programming, you really need to branch and merge, because then you have a good code review process. Pair programming is going to get you delivered faster. It just feels slower because there's fewer keyboards, I guess.
Jonathan: All right. We've worked our way backwards from continuous delivery to continuous integration to trunk-based development. Are there any other foundations that we build on, or is that the starting point?
Bryan: You really should test your code. [laughs] You can't even start this if you have the attitude that, "I'm a coder and someone else tests." You can't. Trunk-based development absolutely 100% depends that you're testing your code.
Jonathan: What if you're in a place now where you're not testing your code? Maybe you have 0% or well under 50% test coverage on your entire project. What's the first step? Do you need to build up that test suite first, or can you start with trunk-based development and just do test maybe for new features, or how do you approach that problem?
Bryan: I've lived this problem personally. I've done a lot of research on how to help my team and other teams get past this problem. You've got a 100% untested codebase that's running in production; it's tested in production. You know it works because it's working in production. It's dangerous to go back and try to backfill tests, because the code was written in a way where it was not architected to be tested. You have to refactor the code to test it. Refactoring without tests is incredibly dangerous.
The process you use is, we just have a working agreement: we will no longer push untested code. 100%, if it comes through code review untested, it's rejected. Then the next step is, when we get a feature and we start breaking things down, we document how it's going to work in testable acceptance criteria, and then we use that to slowly refactor the code to be testable. We refactor the portion of the code that needs to change, using those acceptance criteria to verify it's behaving the way we want it to work.
Then slowly over time, you start building out tests. Now, you can do trunk-based development just with that working agreement and focus on-- You might want to practice some testing first if you don't even know how to test. [unintelligible 00:17:22] that working agreement on some knowledge on how to do good functional testing. That's all that's really required. That was my personal journey as well. We were working for years where testing was in production. [laughs] We had to get really disciplined at it if we were going to be rolling continuous delivery out across that area. We had to dig in and really focus on teaching each other how to test.
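The working agreement Bryan describes, pinning down current behavior in testable acceptance criteria before refactoring, is essentially what's often called characterization testing. A minimal sketch of the idea, with an invented `format_price` function standing in for legacy code (the function and its behavior are hypothetical, not from the episode):

```python
# Characterization-test sketch: before refactoring untested legacy code,
# capture what it does *today* so the refactor can't silently change it.

def format_price(cents):
    # Legacy function we need to change. It works in production,
    # so production behavior is the acceptance criteria.
    dollars = cents // 100
    rest = cents % 100
    return "$" + str(dollars) + "." + str(rest).zfill(2)

# Step 1: write tests that document the current behavior.
def test_whole_dollars():
    assert format_price(500) == "$5.00"

def test_cents_are_zero_padded():
    assert format_price(507) == "$5.07"

# Step 2: only once these pass do we refactor format_price;
# the tests guard the behavior while the structure changes.
test_whole_dollars()
test_cents_are_zero_padded()
print("characterization tests pass")
```

From there, each new feature adds its own acceptance-criteria tests, and coverage grows organically around the code that actually changes, rather than through a risky big-bang backfill.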
Jonathan: Suppose you have a team that's doing manual testing, maybe there's some automated testing, but there's also a manual element of testing going on, is there any room for that in a CD environment? If so, what would that look like?
Bryan: I'd say not manual functional testing. There's always room for exploratory testing, usability testing. Those things should be happening in parallel to the pipeline and not as deploy gates. If you are using manual testing as a deploy gate, you really need to automate that and get it out of the way. The first step where you'll probably automate that is with end-to-end tests, which are going to be flaky. Then your next priority is to start tackling the flakiness in those tests and get to a test suite that you actually trust.
I've had people say, "Well, humans can test better than machines." That's just wrong. If you're running a test script as a human, you can't repeat it. A machine can actually repeat it. If you find a defect in a test script, you can fix the defect once instead of hoping everybody else doesn't implement that defect manually. Get your critical paths, and I'd say not everything, tested end-to-end, and then start building out a real, effective test suite that doesn't rely on manual testing to get it done.
Jonathan: What would you say to a team that has some unique testing requirements? For example, I was talking to somebody recently that builds software related to bicycles. Their testing suite involves physically riding a bicycle around the neighborhood and looking at GPS inputs and stuff like that. In principle, I can imagine ways to automate that but it's a huge undertaking. Would you just say that CD just isn't possible in that situation, or is there some wiggle room there for weird corner cases like that?
Bryan: That should just be a continuous process, and then you're going to have a controlled release to your test users, your alpha testers. You have a controlled release at the pipeline. The main thing you're trying to get with continuous delivery is feedback. It doesn't mean that you have to ship every single second to an end user, but you need some real life feedback to verify that you're testing correctly. The only test for a test is production, and delivering to those people riding those bicycles is an important test for production.
Now, you want to put some balance there. You don't want to say, "Okay, we're going to spend the next three months riding around on bicycles." How fast can we get this feature out to a broader audience to get better feedback? That's what it's all about. I need feedback as rapidly as possible to get the quality where I need it.
Jonathan: Supposing that your team has now accomplished minimum viable CD. Is it okay to stop there? Are they done, or what's the next step? What do you do from there?
Bryan: Oh God, no. The goal there is just to show you this is possible. It's to get you over the hump. Teams just don't believe continuous delivery does what we say it does until they get past the hump. This is just to get you over that hump. Your goal from then on is hardening the pipeline, because the very first thing you should do when you're focusing on continuous delivery isn't even to automate everything. It's to write down what defines "releasable" for us. What do we have to verify to make it releasable? Then you start executing the discipline, get to the minimum level, and then focus on how do we add automated gates to verify releasability. Then you deploy, you find out what you did wrong, and then you harden the pipeline.
If you are an actual product team and you really want to be truly agile and not just do agile ceremonies, then what you're going to do is you're going to deliver, get feedback from production and your primary product isn't even the thing that you're delivering to production. It's the pipeline. We're going to harden a pipeline and ensure that our artifacts are truly production worthy before we let them go. That pipeline's job is not to deploy. The pipeline's job is to prevent bad things from deploying, and you need to focus on that all the time.
You start adding things. You start performance testing. That'd be great, wouldn't it? What chaos testing could we do in the pipeline and not even in production? How do we verify our APIs more effectively? We've got these flaky tests. We can't have flaky tests. How do we rearchitect those tests so that they are no longer flaky, drive them down into a lower, faster test or remove them entirely because they're just not giving us reliable feedback? They just continuously improve how we're delivering.
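One way to make "what defines releasable for us" concrete is to encode each written-down criterion as an automated gate that can fail the pipeline. A minimal sketch; the gate names and thresholds here are hypothetical examples of what a team might agree on, not anything prescribed by Minimum CD:

```python
# Sketch: a team's "definition of releasable" as a list of named,
# automated gates. The pipeline blocks the artifact if any gate fails,
# so the pipeline's job is preventing bad deploys, not just deploying.
from typing import Callable

def run_gates(results: dict, gates: list) -> list:
    """Return the names of failed gates (empty list means releasable)."""
    return [name for name, check in gates if not check(results)]

# Hypothetical criteria a team might write down and then automate:
GATES = [
    ("unit tests pass",     lambda r: r["unit_failures"] == 0),
    ("coverage >= 80%",     lambda r: r["coverage"] >= 0.80),
    ("no critical CVEs",    lambda r: r["critical_cves"] == 0),
    ("p95 latency < 300ms", lambda r: r["p95_ms"] < 300),
]

# Results gathered by earlier pipeline stages for one build:
build = {"unit_failures": 0, "coverage": 0.85, "critical_cves": 0, "p95_ms": 250}
failed = run_gates(build, GATES)
print("releasable" if not failed else "blocked by: " + ", ".join(failed))
```

Hardening the pipeline then just means extending `GATES` over time, with performance tests, API verification, and so on, as the team learns from each deploy.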
Jonathan: Suppose somebody says to you, "Bryan, this is great, but we're doing 15 deployments per day and we're using git flow, and we don't have automated tests yet, but we're still doing 15 deploys per day. Obviously, we're doing continuous deployment." How would you respond to somebody who would say something like that?
Bryan: Yes, but is your life good? Are you sleeping well at night? How many hours are you working a day? How much stress are you under when you go and do that deploy? How much toil is involved? Can you confidently deploy at 5:00 PM and then take your significant other out while also being on call? If you can't answer, "Oh, yes," then I would say that you really need to improve something, and if you say, "Oh, yes, we can absolutely do those things," then please tell everybody else in the world how, because I haven't met anybody who can.
Jonathan: Hopefully our listeners will go look at this document. How can others participate? This is on GitHub. I see there's an improve this page link. If people are interested in participating and improving this, what can they do?
Bryan: We've had several people give contributions. One of the things is we're now translated into Finnish, Spanish, Italian and French. If you want to submit a translation, please do. You can go on and say, "You know what? I agree with this. I want to add my signature." Unlike other things I've seen out there with signatures, there's not a list of, "We created it and all you other people can sign it." Everybody's equal on here. Some of the creators never got around to it, and so they're pretty far down the list. The sequence of signatures has nothing to do with how important you are, except for, honestly, Dave Farley's. I moved his to the top. [chuckles] If you want to have a discussion about it, open an issue and let's have a conversation. We've had several conversations about people wanting to improve things.
There were things that, "Yes, we agree that those are good ideas, and those are things that you should absolutely do, but they're not the absolute bare minimum for continuous delivery." We've had other people suggest wording changes, and we discuss it like, "Yes. You know what? That would make a good change." I'll say that we're very careful about changing the primary document of the minimums, because people agreed that that is true. We have a very careful process for modifying that. We discuss it intensely.
We also have a page for resources. If you have a resource that you really think is awesome, send a pull request, add the resource. As a community, let's grow the agreement that this is what continuous delivery should look like and these are the good resources, because it took us years to learn how to do this and to find the good resources versus the bad ones. This is really what we're trying to do: share. You don't have to go on that journey and start from scratch. We're trying to jumpstart you ahead so that we can all be on a level playing field.
Jonathan: I'd love to hear the common objections you've heard to CD that you aim to address with this document.
Bryan: One of the most common, I think, is that either "we're too busy" or "we're too complex." I think a lot of that has to do with people thinking it's too hard to start, and that was part of it. This is simple. These are the baseline behaviors. There's not a maturity model to it, though you can look at it as a checklist of things we are or are not doing. But doing all those things in no way makes you good. It just makes you minimum. That's really the primary objection: "Well, we're a special case, and continuous delivery doesn't apply to us." It absolutely applies in every case.
I'm looking right now at things we're trying to do for, how do we do CD on a submarine that's underway? I've done continuous delivery on giant enterprise systems. I've done continuous delivery on small things deploying to the cloud, and I've seen CD being done on embedded systems. I have yet to see a case where I can't say, "Okay, I know how to architect the pipeline to get this done," and CI is always, always true.
Jonathan: Oh, here's one that I hear a lot. "Our end users need to be informed before changes happen." How do you make that compatible with CD?
Bryan: Feature flags. We need to get changes into production to verify that they'll run in production. We need to get feedback as rapidly as possible, but we also have the reality that marketing is a thing, and we need to control releases so that we can have a launch of something. The fact of the matter is that a lot of the applications you have on your phone have features that you're not seeing. Other people are seeing them.
Then they have a marketing release and they release new features. That's just how it works. The minimum of CD doesn't include decoupling release from deploy, but you should. That should be something you're looking at early on: we need to be able to deploy a change without exposing the change. Again, that works in every environment.
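Decoupling deploy from release with a feature flag can be sketched very simply. This is a hypothetical in-process example (real systems typically read flags from a config service or database so they can be flipped without redeploying); the `new_checkout` flag and `checkout` function are invented for illustration:

```python
# Feature-flag sketch: code for a new feature ships to production,
# but the feature stays dark until the flag is turned on at release time.

FLAGS = {"new_checkout": False}  # deployed, but not yet released

def is_enabled(flag):
    return FLAGS.get(flag, False)

def checkout(cart_total):
    if is_enabled("new_checkout"):
        # Dark-launched path: in production, tested, invisible to users.
        return "new flow: total " + format(cart_total, ".2f")
    return "legacy flow: total " + format(cart_total, ".2f")

print(checkout(10.0))          # legacy flow: flag is off
FLAGS["new_checkout"] = True   # "marketing release": flip the flag, no deploy
print(checkout(10.0))          # new flow
```

The deploy happens whenever the pipeline runs; the release is just the flag flip, which marketing can time however it likes.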
Jonathan: That answer addresses another one that I hear a lot, which is, "We're working on a big feature that is not ready to be deployed yet."
Bryan: Yes, that's the solution for that, but I think there's more of a mindset shift that has to happen as well. You should be terrified of a big change. Big changes are not a thing. You don't do giant refactoring. You don't do giant feature releases. The size of change is going to be directly proportional to the size of the explosion when things break because things will break. I've had this conversation before with InfoSec. When do I need to go have a security review of my application? After you do a big change. Okay, so never.
I never have to have another security review, which again goes, I think it's a whole other subject about security theater. I never have to have another security review because I don't make big changes. I'm evolving my application in very small steps every single day. That's also an engineering problem that you need to solve, is that you may have to make a big change that could be breaking, might have to upgrade a version of your baseline architecture, and you need to figure out how to do that in small steps very, very frequently to make sure that you're always releasable.
Jonathan: What do you say to the person who says, "Our tests are too slow. It takes eight hours to run a test suite," or something like that?
Bryan: I would say that you need to focus on improving your pipeline, that you should use pipeline duration as a key metric for the health of your delivery, and that you should focus on: why are our tests so slow, taking so much time? How do we rearchitect our tests so that they're faster? Are we over-testing? Are we testing a whole bunch at the end instead of testing things more at the beginning? Is our architecture so terrible that the only way we can effectively test is with end-to-end tests?
How do we decouple the architecture around clean domain boundaries so that we can test more efficiently and release better? Because what'll happen-- This is a huge risk. If it takes us eight hours to release a change, and we have a high-availability system, and we have something that has to be fixed, or there's a security breach and there's something that has to be fixed now, then what's going to happen is you'll bypass all your quality processes to get it out and put yourself in a worse situation. You need to have that fast pipeline not just so that you get fast feedback, but so that you can fix production fast with the safest possible method.
Jonathan: Good. I think [unintelligible 00:31:24] my list of questions. What else would you like to add? The floor is yours.
Bryan: I think I just want to reiterate that I've been a developer for a lot of years. I have not stopped developing. I just talked about my experiences. I've done it for relatively small companies and for the third-largest army in the world, Walmart. Running the Dojo at Walmart, I saw many, many, many different use cases that a lot of people would have to go to different companies to see.
I've never seen anything where continuous delivery wasn't the right answer. Now, is that confirmation bias? Maybe, except that we've actually implemented it in those situations, and it made the teams' lives better, universally. If anybody can show me a situation where this is not the right answer, please do. I'd love to be educated, but I've yet to see it, and I like a challenge. I'm happy to go and talk about how we can architect a pipeline for that and change the team's behavior to make everything better. I'd be happy, too, for someone to stump me, because then I've learned something.
Jonathan: Lovely. If somebody has that challenge ready for you, how can they get ahold of you to challenge you?
Bryan: Oh, they can reach me on LinkedIn, happy to talk to them. Also, I've got a series of blogs that are light rants called 5 Minute DevOps on Medium. If you just go to bdfinst.medium.com, you can see my blogs out there. I talk about lots of different topics around delivery, including team organization, and I've got some things based off of real life that are fun.
Jonathan: Wonderful. We'll have links to both your LinkedIn and your Medium blog in the show notes. Thanks so much for coming on, Bryan. This was a good conversation. I think it's really important to draw attention to the truth behind this otherwise fuzzy concept of continuous delivery and a thousand other buzzwords in our industry. Thanks so much for coming on.
Bryan: Yes, thanks so much for having me, Jonathan. I do agree. I think the things that aren't fuzzy should not be fuzzy.
Jonathan: Yes. Yes. [laughs] All right. Thanks so much. Everybody, we'll have links to the document and the books and Bryan's contact info in the show notes. Until next time.
This episode is copyright 2021 by Jonathan Hall. All rights reserved. Find me online at jhall.io. The theme music is performed by [unintelligible 00:34:17]
[00:34:19] [END OF AUDIO]