Tiny DevOps episode #22 Andy Suderman — Where To Host Your Kubernetes
December 7, 2021
Andy Suderman of Fairwinds joins me to talk about the pros and cons of each of the big three cloud providers, Amazon EKS, Google GKE, and Azure AKS, and helps point new Kubernetes adopters to the optimal provider for their needs.
Speaker 1: Ladies and gentlemen, the Tiny DevOps Guy.
Jonathan Hall: Hello, everybody. Welcome to another episode of Tiny DevOps where we like to talk about dev and ops on small teams and small companies. I'm your host, Jonathan Hall. Today, we're talking about a topic that should be exciting, I think. We're talking about Kubernetes and how to decide where you as a small team, small company, should host Kubernetes. I have a guest today, Andy Suderman. Did I say that right?
Andy Suderman: Yes.
Jonathan: Great. Thanks, Andy, for coming on. It's great to have you here. Would you do a brief introduction? Tell us what you do, maybe where you work, and maybe why you know something about Kubernetes?
Andy: Yes, sure. Thanks for having me today. I've been working with Kubernetes for about five years now, I think. I lost track about a year ago of exactly how long it's been. I came from a sysadmin background building networks as a kid with my dad. Did a lot of sysadmin work at a company in Denver that did web conferencing software.
Our team got slashed in half and we were forced to reevaluate our priorities, and so I started working in Kubernetes. I got the opportunity for the next year and a half to really just focus on that, build out a production product running in Kubernetes. After that, I left and I came to this company that I work at now called Fairwinds. All we do is Kubernetes all day long. I spent the first couple of years working just directly for customers, running their infrastructure in Kubernetes. In the last year or so, I've been taking more of a leadership role as our senior technical resource.
Jonathan: Great. You say you work with Kubernetes all day long, are you essentially the company that people outsource their Kubernetes management to? Is that how that works?
Andy: Yes, we have done that in the past. Over the last couple of years, we have begun to focus on, and are focused on building software to help teams run Kubernetes specifically. The management piece is definitely something we do. We still know how to do it. We help people get successful in Kubernetes, but we are truly focused on building software that helps people to be successful.
Jonathan: Wonderful. That really puts you in a unique position, I think, to help answer the question for today. Before we dive in and just directly answer that question though of which platform should we use for Kubernetes, I'm curious, maybe if you want to talk-- Of course, we have the big three. There's many ways to run Kubernetes, but the big three cloud providers, Google, Amazon and Azure, what's your experience with these three? Do you have any just off-the-top pros or cons from something that attracts you to one or the other?
Andy: Yes. I've definitely worked in all three of the big ones. I've also worked in DigitalOcean on a personal level. I've done a lot of stuff there in [unintelligible 00:02:54]. I also did an evaluation of Linode's Kubernetes offering. Actually, that's probably a while ago now. First takes, I started out in Amazon. That was where the company was when I was there. Amazon felt the most like a traditional data center to me when I first started working in it. That's what really spoke to me as a former, or reformed sysadmin as they say. [chuckles] Coming to Fairwinds, I got to have lots of experience in both GKE and AKS, both of those platforms.
They all have their pros and cons. If I had to pick one right off tomorrow, I would probably stick with Amazon. If totally greenfield, coming out of nowhere, that might be what I'd pick, but I can't say necessarily that I would absolutely pick one over the other. There's so many factors that go into that decision and so many pros and cons to each one.
Jonathan: We'll dig into some of those here in just a minute. My experience, I started with Google and GKE. In fact, I actually started playing with GKE, I think before Amazon even had theirs available yet. Google is, of course, the first. They more or less invented Kubernetes, depending on how you want to define invention. It makes sense that they were there first, but everybody else is catching up so fast. Of course, Google knew that was going to happen when they open-sourced it and made it a community project. Briefly, what would you say are the top two or three reasons that you would choose Amazon? Let's just start with that. Let's just start with Amazon. What are the top two or three reasons that you like Amazon for Kubernetes?
Andy: I've mentioned one already. The structure of networking and VPCs in Amazon really likens itself very much to the older data center mindset. If you're coming from that world, it's much easier to wrap your head around it, in my opinion. I really like that aspect of it. As far as the managed Kubernetes offering, EKS has come a long way. I also started in GKE before Amazon had an offering. We were using Kops on AWS back before EKS existed.
EKS has come a long way. It is really a lot better than it used to be. I know there are still pain points around managing EKS and upgrading EKS, but with the advent of managed node groups and things like that, it's really starting to catch up in terms of feature set. In Amazon, one of the biggest things to think about when choosing any cloud provider is all the surrounding services. It's not just Kubernetes. You're not running everything in your Kubernetes cluster most likely. In fact, in most cases, I won't recommend that you do that. The surrounding services in Amazon-- RDS is a great product or was a great product. There's a huge list of services that you can use in Amazon around your Kubernetes cluster that are fantastic. That'd probably be the top two there, I think.
Jonathan: If somebody asked you the same question. Let's just do the two. Let's do Google and Azure. What are the benefits that each of those provide?
Andy: GKE and Google, really easy to get started in GKE. It's super easy to get going in GKE. Autopilot is a very promising new thing with GKE if you're willing to accept all the guardrails that it puts around everything. I really like the concept of what they do there. The networking in GKE is, one might say, easier, but it's also a little bit more obscure as to what is actually happening in your network. If it's something that you're comfortable with learning, it could be a really easy way to do a lot of the networking pieces. Just the length of time that it's been around. GKE has been-- They've been around the block. There's a lot of experience there and you can't really put a price on that, in my opinion. That covers GKE.
Let's talk about Azure AKS. That's a little bit tougher one. We haven't done a lot of Azure, we haven't done a lot of AKS. I will fully admit it's not my favorite platform. There are benefits to it, but I think they're mostly outside of Kubernetes specifically. I think if you're a big Active Directory shop, you're already using Active Directory, that's a great reason to go Azure because it's going to be familiar to you. I was a Windows sysadmin before I was a Linux admin. I can see how, if I was still in that world, the Azure ecosystem might speak to me.
As someone who spent the last five years primarily in AWS and GKE, I find Azure to be extremely difficult to keep up with. If that's not where your expertise lies, if you don't have a need to run Windows nodes in your cluster, which you can do in any cloud, which you can do in AWS. I'm not actually sure if you could do it in GKE. I haven't tried myself. If you don't have those, I would suggest going with one of the other two. Azure's going to be a big shift there.
I know there's a lot of other reasons to use Azure. There are some compliance pieces that fit nicely into AKS and Azure if you have those requirements. I hesitate to issue blanket statements about any one cloud provider because they really do all have their pros and cons, but I would put Azure at the bottom of my list, quite frankly.
Jonathan: As it relates specifically to Kubernetes, which is of course, the topic for today.
Jonathan: I'm really curious to hear a little bit more about-- You started to talk about this a little bit with AWS, about the management and the node upgrades, because this is always one of the pain points. If you start Googling which cloud provider should I choose, these are the pain points that people are always talking about. If you're not familiar with Kubernetes, if you're just thinking of starting with Kubernetes, for example, this sounds like nonsense. What are they talking about? Can you spell out a little bit to somebody who's listening who hasn't been using Kubernetes for years what are we talking about when we talk about node upgrade pains and stuff like that?
Andy: Sure, I can talk about that a little bit. In any managed service for Kubernetes, what they're going to give you is what's called the control plane. Those are the pieces that orchestrate your cluster. That includes the API server, the controller manager and all the various other pieces that make Kubernetes work. Depending on your cloud provider, you have to attach worker nodes to this control plane in different ways. The worker nodes are where you're actually going to run your workloads. This is where your pods, which contain your containers, are going to run. Each cloud provider has its own way of managing those nodes or those pools of nodes that you're going to run your workloads on.
In the early days of EKS, the only option to attach nodes was via some relatively complex CloudFormation and scripts that they would give you that you could use. Then, eksctl made by Weaveworks became a thing and that made things a little bit easier. Still, you have this weird detachment of your nodes from your control plane. If you go into the EKS cluster console, it wouldn't even show you what nodes were connected to your cluster. It had no concept of that. You would click to upgrade your cluster in EKS and it would upgrade your control plane to the next version of Kubernetes, but your nodes are still whatever your launch template for your nodes came up as originally.
Say I just clicked the button to upgrade my cluster to Kubernetes 1.20, my nodes may still be running 1.19, which is fully supported by Kubernetes, and minus one is the supported path, but you really want to get your nodes up to the next level so that you can continue on this upgrade process. That's where the complexity typically comes in with upgrades, because now, I need to roll a new launch template so that my new nodes come up with a new version, and I need to have a way of moving my workloads onto these new nodes. There's different strategies for that. We typically double the number of nodes, let the new ones come up, and then slowly drain off the old nodes and let the workloads move. There's even further complexity in that that I won't get into here, but that's what we're talking about there.
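To make the upgrade flow above concrete, here is a minimal Python sketch of the two ideas: a version-skew check between control plane and nodes, and the "double the nodes, then drain the old ones" strategy. Everything in it is hypothetical and for illustration only: the `Node` class, `skew_ok`, and `surge_and_drain` are invented names, pods are plain strings, and the default `max_skew=1` simply mirrors the "minus one" rule of thumb from the conversation (the actual allowed skew depends on the Kubernetes release and should be checked against the official version skew policy).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    minor: int                      # Kubernetes minor version, e.g. 19 for 1.19
    pods: list = field(default_factory=list)


def skew_ok(control_plane_minor, node_minor, max_skew=1):
    """Nodes may lag the control plane by up to max_skew minors, never lead it.

    max_skew=1 is an assumption taken from the discussion, not a policy value.
    """
    return 0 <= control_plane_minor - node_minor <= max_skew


def surge_and_drain(old_nodes, new_minor):
    """Bring up one new node per old node, then drain the old nodes one by one."""
    new_nodes = [Node(f"new-{i}", new_minor) for i in range(len(old_nodes))]
    for old, target in zip(old_nodes, new_nodes):
        while old.pods:
            # A real drain would evict pods and let the scheduler place them;
            # here we just move them to the paired new node.
            target.pods.append(old.pods.pop())
    return new_nodes


old = [Node("old-0", 19, ["api", "web"]), Node("old-1", 19, ["worker"])]
print(skew_ok(20, 19))              # control plane upgraded, nodes one minor behind
new = surge_and_drain(old, 20)
print([(n.name, n.minor, sorted(n.pods)) for n in new])
```

The sketch ignores the further complexity mentioned above (pod disruption budgets, capacity limits, ordering), which is where most real upgrade pain comes from.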
Now, when you go to a provider like GKE, they have more of a concept of managed node pools. When you create a GKE cluster, they ask you questions about your node pool. You have to create a node pool initially with the cluster. They really do a nice job of managing the connection between the two. EKS has recently introduced-- I hesitate to say recently. EKS has introduced this concept of managed node pools. I honestly haven't had much personal hands-on experience with it, but I know we're using it. I've had a lot of great reviews about it from various people. That really eases the burden of managing those nodes.
Jonathan: You still said that, getting started, you feel like GKE is easier. You want to explain a little bit more about what you mean by that? It sounds like AWS is catching up, where is it still easier on the Google side?
Andy: One of the tricky pieces of any cloud provider's managed Kubernetes is authentication. When you spin up a cluster, you have to be able to authenticate to it. AWS has made this relatively easy when you get going because they automatically allow your user to connect to the cluster as an admin, but in an environment where you're using assumed roles and things like that, it gets really tricky to manage this config map that controls your auth.
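As a rough illustration of why assumed roles make that config map tricky, here is a minimal Python sketch of the lookup involved. It assumes the mapping is a list of entries with `rolearn` and `groups` keys, and that a caller using an assumed role shows up with an STS `assumed-role` ARN that has to be matched against the plain IAM role ARN in the mapping. The function names, the `MAP_ROLES` data, the role names, and the placeholder account ID are all hypothetical; this is not the authenticator's actual implementation.

```python
import re

# Matches arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
ASSUMED_ROLE = re.compile(r"^arn:aws:sts::(\d+):assumed-role/([^/]+)/[^/]+$")

# Hypothetical mapping, shaped like the role entries in the auth config map.
MAP_ROLES = [
    {"rolearn": "arn:aws:iam::111122223333:role/DevOps",
     "groups": ["system:masters"]},
    {"rolearn": "arn:aws:iam::111122223333:role/ReadOnly",
     "groups": ["viewers"]},
]


def canonical_arn(caller_arn):
    """Rewrite an assumed-role session ARN to the IAM role ARN it came from."""
    m = ASSUMED_ROLE.match(caller_arn)
    if m:
        account, role = m.groups()
        return f"arn:aws:iam::{account}:role/{role}"
    return caller_arn


def groups_for(caller_arn, map_roles):
    """Return the Kubernetes groups mapped to the caller, or [] if unmapped."""
    arn = canonical_arn(caller_arn)
    for entry in map_roles:
        if entry["rolearn"] == arn:
            return entry["groups"]
    return []


print(groups_for("arn:aws:sts::111122223333:assumed-role/DevOps/alice", MAP_ROLES))
print(groups_for("arn:aws:sts::111122223333:assumed-role/Unknown/bob", MAP_ROLES))
```

The point of the sketch is the indirection: the identity a person logs in with is not the string that appears in the mapping, so a role that is missing or misspelled there resolves to no groups at all.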
In GKE, the IAM that you use to access your Google account is directly tied to your authentication into the cluster, so that mapping is much easier to understand and control. There are actual IAM roles in GCP that give you permissions in the cluster. They're not super fine-grained, but they do exist. You can control which users have access to the cluster and there's a relatively straightforward gcloud command to connect to that cluster to get a kubeconfig that you can connect to your cluster via kubectl. That's one thing that makes it easier.
The node pool management is another thing. It's relatively straightforward to add new node pools, remove old node pools, change the size of your node pools. You get a lot of things built in with GKE that you don't in other cloud providers. This is changing as well a little bit. When GKE and EKS were first starting to compete, GKE had Metrics Server and cluster auto-scaling built into the ecosystem.
You didn't have to manage those pieces, you just had to say, "I want this minimum number of nodes, this maximum number of nodes, so keep it somewhere in there and allow the cluster to scale automatically." Your metrics are automatically available. What other stuff? If you want to do service mesh, GKE has a checkbox to get your Istio control plane installed, not that that's necessarily the best way to do it, but if you want to try it out, kick the tires on it, it's a nice, easy way to just try out some of those features.
Jonathan: Nice. Good summary there. You touched on auto-scaling. How does that differ? Of course, Kubernetes has its own concept of auto-scaling, but you also want to tie that to your node auto-scaling, if possible, if you're in the cloud. That's one of the big advantages of using Kubernetes. Is there any meaningful difference across the cloud providers when it comes to auto-scaling your node pools?
Andy: Not really, no. Not at least the way that we typically tend to do it. I'll just elaborate on that very quickly, but your typical method of auto-scaling your node pools is, first, you set your resource requests on all of your pods so that the scheduler that's scheduling those pods knows where to put them. You'll eventually get to a point where you try to schedule a pod that doesn't fit on any node. That pod will go into a pending state.
In AWS, we have the Cluster Autoscaler that watches for those events. It looks at the topology of the cluster and it looks at the Amazon auto-scaling groups that it has available to modify, and it says, "Oh, your pod will fit on a node if I scale out this group," and so it will add another node to that group by modifying the auto-scaling group the desired capacity of that. That same concept applies in GKE. Your node pools will scale exactly the same way. I don't have to run a Cluster Autoscaler in that cluster to do that for me, it will happen automatically, but it's the same concept. In Azure, we run it the same way, I believe with the Cluster Autoscaler in Azure as well. Really not any meaningful differences on how we scale the cluster there.
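The scale-out decision described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Cluster Autoscaler's real algorithm: `NodeGroup`, `pick_group_to_scale`, and `scale_out` are hypothetical names, a pod "fits" if its CPU and memory requests are no larger than one empty node of the group, and scaling is just incrementing a desired-capacity counter.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    node_cpu_m: int     # allocatable CPU per node, in millicores
    node_mem_mi: int    # allocatable memory per node, in MiB
    desired: int        # current desired capacity of the group
    max_size: int       # upper bound the group may scale to


def pick_group_to_scale(pod_cpu_m, pod_mem_mi, groups):
    """Return the first group whose node type fits the pending pod and has headroom."""
    for g in groups:
        fits = pod_cpu_m <= g.node_cpu_m and pod_mem_mi <= g.node_mem_mi
        if fits and g.desired < g.max_size:
            return g
    return None


def scale_out(group):
    """Add one node by raising the group's desired capacity."""
    group.desired += 1
    return group.desired


groups = [
    NodeGroup("small", node_cpu_m=250, node_mem_mi=512, desired=2, max_size=5),
    NodeGroup("large", node_cpu_m=4000, node_mem_mi=16384, desired=1, max_size=3),
]

# A pending pod requesting 500m CPU and 1024Mi memory does not fit a "small" node.
chosen = pick_group_to_scale(500, 1024, groups)
if chosen is not None:
    print(chosen.name, scale_out(chosen))
```

This also shows why the resource requests mentioned above matter: without them there is nothing to compare against a node's capacity, so neither the scheduler nor an autoscaler can tell whether another node would help.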
Jonathan: Good. One big question everybody always asks is price. Is there a meaningful difference in price across these providers when it comes to Kubernetes?
Andy: Oh, I'm not sure I've looked deeply enough into the price recently to tell you for sure. Originally, the GKE control plane was completely free, which was a huge value-add there. They have since started charging an hourly price for the control plane, so there is a cost there. I know in EKS, if you want a highly available control plane, you're effectively paying for all three master nodes. My guess is that the price is slightly higher, but I haven't actually done a direct evaluation of that.
The control plane pricing is usually your lowest cost. You're going to be running anywhere from ten to hundreds of nodes or thousands of nodes in some cases. Your workload really quickly becomes your cost center there. Quite frankly, if you're keeping your data outside of the cluster in a managed service, usually other pieces outside of the cluster itself are going to be your biggest price points. Then you have to start thinking about things like transit costs for your networking, and the cost of your database, and the cost of your data stores. That's usually where we see the largest amount of cost. I know that's not a straightforward answer to a fairly simple question, but that's where I'm at there.
Jonathan: That's fair enough. I think it's a difficult question to answer anyway, just because the different providers have different prices. Some charge more for memory, others for disk, others for CPU, and ingress and egress. There's so many variables in there. It's really hard to give a direct answer which one's cheapest because it depends.
Jonathan: You mentioned earlier when we started off about if you're already using a service on another provider, like RDS, for example, that can be a strong reason to go with that provider for your Kubernetes. What if I'm really tied to RDS, but for some reason, I want to run Kubernetes on Google. What are the complications that are going to come up there, aside from, obviously, the network traffic across providers? Is that something I should probably shy away from or are there reasons to do that? What am I getting myself into if I do that?
Andy: That's an interesting question. If we put aside the obvious connectivity and networking issues, and we also put aside the cost issue of sending our data across the boundary there and potentially incurring various network transit costs from both cloud providers for that, there's not really any other major complication there. That is the major complication that I can think of. Depending on how you're doing your authentication to your database, you may have some issues there with IAM being able to access the database instance, but if it's a database and a standard database engine, you should be able to provision users in that database and handle that issue.
I would generally shy away from it just because of the complexities of the networking, and the potential incursion of costs, and the incursion of the issues of network topology and potential high latency between your database and your workload. There are obviously other ways to solve that with transit gateways, and direct connects, and things like that. It just adds a large amount of complexity. If you don't have a valid or a strong business case for wanting to run your workloads in one cloud and keep your data in another, then I would definitely shy away from it.
Jonathan: Right. "Keep it simple" is probably good advice here, right?
Andy: Almost always good advice.
Jonathan: [chuckles] Especially when we're working on a small company, or small team, and probably just getting started with Kubernetes, keep things as simple as possible and save yourself some hassle. Let's try to summarize this up a little bit. We've already hinted at some solid answers here, but if we could give the listeners-- If you're listening just to this part of the podcast, you just came for the answer, what is our litmus test here? I'm going to try to summarize. Tell me if I'm wrong. If you're using Windows nodes, probably use Azure. Is that a fair statement?
Jonathan: That one's easy. If you're a Windows shop, Azure is your go to.
Jonathan: If you already have AWS experience, then AWS is probably the way to go. Is that a fair statement?
Jonathan: If you don't have either of those, if you're not a Windows shop and you don't have AWS experience, I think there's a bigger question mark here. What would you say in that situation?
Andy: Barring all other concerns, use GKE, quite frankly.
Jonathan: Okay. There you have it, folks. That's the quick and simple answer.
Andy: Part of me cringes at reducing everything that far, but it is a good summary and I would stand by that. [chuckles]
Jonathan: Well, great. This has been a short and sweet episode, but is there anything else you'd like to add? What considerations should we make? Because of course, when you're choosing a platform, theoretically, Kubernetes is provider-agnostic. You can, in theory, move your services from GKE to AWS in the future, but that's always much easier said than done. We're marrying ourselves to a platform when we make this choice. What long-term implications, what long-term considerations should people take into account when making this choice? If it's not just about what we're ready for today, if we're looking into two, three, five years in the future?
Andy: I always try to take a business-oriented approach to these questions. What is your business model? What are the things that you're going to need in the future as far as services around Kubernetes? Also, look at what are you getting from a support perspective. Is there a provider that you're looking to work with as far as contracting parts of your stack out or getting some support in those areas? Do they have expertise in one of those? Is that going to be part of your long-term model going forward?
Really thinking about everything around Kubernetes, because like you said, Kubernetes is agnostic. It runs everywhere. I can run my Kubernetes workloads just about anywhere. I've helped companies move from Amazon to Google for various business reasons, but they had other business cases for making that switch. Really think about what are the other things you need around your Kubernetes ecosystem to make that choice of cloud provider. At the end of the day, all of the managed cloud providers are going to be nearly equivalent at some point. Think about all the other pieces first.
Jonathan: The real simple answer here is Kubernetes shouldn't be your decision maker anyway?
Andy: Yes, quite frankly.
Jonathan: All right. Sorry to disappoint everybody listening. We thought it was easy and now it's not, but it's still educational. [laugh]
Andy: Nothing's easy in this industry, for sure.
Jonathan: Right. Well, Andy, thanks a lot for coming on. Is there anything else you'd like to add before we sign off?
Andy: Always love to do a plug for our software, if that's all right here.
Andy: If you are venturing into Kubernetes, you're running multiple clusters, you have multiple teams deploying into your cluster, there's a lot of different ways you can break that setup in Kubernetes. It is not a simple ecosystem. Fairwinds, the company that I work for, our platform, Insights, can help you enforce those best practices and also detect where you're missing on those best practices in your clusters. Feel free to take a look at that at fairwinds.com/insights. Also, I'm available just about anywhere where Kubernetes is popular, so the Kubernetes Slack. We have a community open source Slack as well. I have a Twitter, but I don't respond to it.
Jonathan: Okay. That sounds like my Twitter. [laugh] Great.
Andy: All right.
Jonathan: Well, thank you so much. It's been a pleasure talking to you. If people are interested in reaching out directly to you for questions, what's the best way to contact you?
Andy: I'm available in the Kubernetes Slack or the CNCF Slack. Those are probably the two best places to get ahold of me.
Jonathan: Okay, great. Thank you, Andy, for coming on. It's been a pleasure and educational experience. We'll see you on Slack.
Andy: Thanks for having me.
Jonathan: All right. Cheers.
Jonathan: This episode is copyrighted 2021 by Jonathan Hall. All rights reserved. Find me online at jhall.io. The theme music is performed by Riley Day.
[00:25:44] [END OF AUDIO]