Tiny DevOps episode #46 James McShane — Is Kubernetes right for your small company?

October 25, 2022
James McShane is the Engineering Director at SuperOrbital and has been working with Kubernetes for about 6 years, in a large number of environments. He joins the show today to help unpack whether Kubernetes is a good choice for your small company.

- What is Kubernetes, and what problems does it solve for you?
- Choosing Kubernetes means choosing a set of problems.
- Which application architectures match well with Kubernetes?
- Which problems Kubernetes doesn't solve well for you.
- How to handle your application data layer when starting with Kubernetes
- Some of the differences between the big three's Kubernetes offerings
- Should you hire experienced Kubernetes engineers before adopting Kubernetes?
- Why is Kubernetes controversial, and how can a newcomer cut through the hype?
- Common newbie mistakes
- How does price figure into the decision to choose Kubernetes or not?
- How to learn Kubernetes if your employer isn't using it

Guest
James McShane
Twitter: @jmcshane
Engineering Director at SuperOrbital.io


Transcript

announcer: Ladies and gentlemen, the Tiny DevOps Guy.

[music]

Jonathan Hall: Hi everybody. Welcome to another episode of the Tiny DevOps podcast. I'm your host, Jonathan Hall, and today we're talking about Kubernetes. Is Kubernetes right for your small team? Hopefully, my guest, James McShane can help shed light on that. James, welcome to the show.

James McShane: Thank you, Jonathan. I'm glad to be here.

Jonathan: Great. Tell us a little bit about yourself, what you do, and maybe why you know anything about Kubernetes.

James: I'm an engineer. I've been working with Kubernetes for the past six years. I'm the director of consulting at a small Kubernetes-focused consulting and training shop. I've been focused on Kubernetes for a long time in my career and worked with it in a bunch of different environments. Really glad to be talking about this today with you.

Jonathan: Great. You say a bunch of environments. Really high level. What size environments? What's the smallest cluster and what's the largest you've used?

James: The smallest cluster-- I've deployed MicroK8s instances to run software on drones and small devices, and at the other end, I've seen multi-hundred-node clusters running huge data processing workloads. It runs the gamut of the different workloads you can put out there.

Jonathan: Great. Of course, the audience for this show is usually small teams, 20 engineers or fewer, or something like that. I'd really like to explore when or if Kubernetes makes sense for teams like that. If anybody has been living under a rock for the last 6 to 10 years and they don't really know what Kubernetes solves, maybe you want to just give an introduction to what Kubernetes is and the problems it tries to solve for us.

James: Kubernetes at its interaction level is an API that allows you to very easily orchestrate container technologies across a distributed system. It's very easy to take an application, put it in a container, and run that application in a way where you can declare what you want. It's declarative orchestration of these containerized applications. You say this is how I want my application to be running.

Kubernetes goes out and it does that for you through its deployment model. It takes that process, it schedules it on its compute node, and executes that in the way that you've outlined. That allows you to really easily stand up workloads and get those things running, especially for stateless applications. There's a bunch of complexity that gets built in that we can talk about today, but at its core, Kubernetes is about orchestrating containerized applications.
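To make that declarative model concrete, here's a minimal sketch of a Deployment manifest. Everything in it (names, image, port) is a hypothetical placeholder: you declare the desired state, and Kubernetes reconciles the cluster toward it.

```yaml
# A minimal Deployment: declare what you want running, and the
# controller keeps the cluster matching it. All names are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # "this is how I want my application to be running"
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
```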

Jonathan: To try to make that a little bit more tangible maybe for someone, if you're struggling with the monotony of maintaining EC2 instances, for example, or maybe physical servers, you're tired of installing physical servers and fixing them when the hard drive crashes and stuff like that, Kubernetes can help with some of these sorts of things. Is that right?

James: Absolutely. It makes it very easy to run your application the same way across multiple environments. I take this application, I package it up, and then I take that package and it's running the same way on my machine. It's running that same container image in your dev environment. Then you push that up to the production environment and you're confident that it's that same application that you built in your build pipeline, and that's showing up in each of those environments.

Jonathan: How do you think we should best tackle the question of who should use Kubernetes? I can think of several angles here, but maybe you have some thoughts on how to tackle this.

James: My thought about this at the start is that choosing Kubernetes means that you're choosing a set of problems that you have to be able to solve. There's no magic DevOps fairy dust.

Jonathan: It sucks.

James: It would be wonderful if you woke up one day, found the DevOps fairy dust, and everything was perfect and you no longer had any toil. I think one of the things for small teams to realize is that by choosing Kubernetes, you're choosing a small set of bounded problems. I'd love to talk through some of those. Every choice is a set of tradeoffs. I'd love to get into some of those tradeoffs that you're making with Kubernetes, if that makes sense to you.

Jonathan: Definitely, yes.

James: I think the first thing to think about when we come to, I'm a small team, I'm considering Kubernetes, the first question is about your application, your architectural match for an environment like Kubernetes. You have to understand that there is a certain level of capability in Kubernetes when it comes to the types of applications that can be running out there.

The key thing that I'm thinking about right now is state. State is that key aspect of applications, you have to understand what is stateful in your application architecture, and how do I ensure that that state is appropriately maintained across failure and across potentially multiple regions, and things like that. Do you have thoughts about where would teams that are listening to this podcast be putting the state of their applications when they're thinking about this?

Jonathan: I don't honestly know the answer. I can guess from some of the clients I've worked with that fit the profile of this podcast. I'm sure many are using MySQL or similar. That's probably the most common answer I see at the scale I'm working at. I see Redis fairly often, I've seen a few other odds-and-ends, message queues and, I don't know, different things like that.

James: I think that's a great place to start with Kubernetes as well because if you've made good boring choices that are working for-- I'm not calling MySQL boring or Postgres, but I'm saying those are the baseline, and it's a great baseline to build off of. You can build applications really well on top of that if you've got your data layer really solidly architected and maintained. When we're thinking about a cloud environment, when I approach Kubernetes deployment for a small team like this, the first thing you do is you separate that data layer out and you put that in a cloud service or some other layer.

Because that is not the first challenge you want to tackle in a Kubernetes environment. We have Postgres running in an RDS or Cloud SQL, or something like that where you've got someone else maintaining that for you. Now, your compute environment, you start with your stateless web applications because that is really the easiest way to start into a Kubernetes environment and get your applications up and running quickly.

I think especially for a team that doesn't know Kubernetes, tackling the additional challenges of state maintenance is really a place where you're looking for failure early on. The first question you have to ask when you come to a Kubernetes environment is, do we have an application architectural match for deploying the applications out there? It could be that you have a web service that receives requests infrequently.

I would say, for a small team, those things are a much better match for Lambdas, Cloud Functions, or something like that. Those types of services don't need long-running container execution environments. If you start to have things where you need to scale those stateless services, where your team is growing and you need to be able to deploy multiple services that interact with one another, that's where Kubernetes starts to really give you those benefits of the network layer and some of those capabilities that come with orchestration software.

Jonathan: Let's talk a little bit, at a high level-- I have an idea in my mind and we'll flesh it out a little bit, of the sorts of problems that Kubernetes is good at solving. Just for example, it's great at rolling upgrades, which of course EC2 can do for you also. It's great at load balancing and autoscaling, which, of course, I guess EC2 and Elastic Beanstalk can do. These are the sorts of things I like to think of that Kubernetes is great at solving. What are some others? What I'm trying to get at is, if you don't have these problems, you can stop listening now. [laughs]

James: Definitely. I think--

Jonathan: Please, keep listening because it's still interesting.

James: Absolutely. I think there are a set of problems where it makes sense to just say Kubernetes isn't right for me at this point. One of the nice things that Kubernetes does is you have this consistent declarative API for declaring not just, I want this application to run and this version of my application to run, but also that configuration management aspect as well. You've talked about load balancing, Kubernetes services, and things like that, and rolling upgrades. Those are both things that Kubernetes does very naturally out of the box for you, but there's that configuration aspect as well, of config maps and secrets, and making those available to your applications.
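As a sketch of that configuration aspect, here's a hypothetical ConfigMap and a pod that consumes it as environment variables. The names and values are illustrative, not from the conversation.

```yaml
# Configuration lives as its own declared resource, separate
# from the application image. Names and values are hypothetical.
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-config
data:
  DATABASE_HOST: db.internal.example.com
  FEATURE_FLAGS: "beta-search=on"
---
# A pod picks those values up as environment variables:
apiVersion: v1
kind: Pod
metadata:
  name: web-demo
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0.0
      envFrom:
        - configMapRef:
            name: web-config
```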

Having that configuration aspect separate from the management of your application is far easier than-- let's say you're managing EC2 instances and you need to have a file on the file system, or maybe some instance metadata that you're grabbing. Config maps are far easier to reason about. They're far easier to update and understand what's going on with your application. I think that language for declaring configuration is really useful and helpful for a lot of the ways that people develop applications these days. One of the next parts about Kubernetes is making it easy to understand the scope of your entire application deployment.

When you're thinking about EC2s or Elastic Beanstalk, AWS is managing those things for you. They're scaling up the nodes. You use AWS to understand the full scope of your applications. You see the instance list and things like that. With Kubernetes, you can query across all these different parameters with labels and things like that. You know your compute environment because you can see the nodes that you're running on. You know different segregations of your applications by querying across namespaces, querying with labels. It does give you an organizational framework for building up a catalog of how your application works.

It's very natural within Kubernetes to say, I'm going to divide this set of services as one piece of functionality. I'm going to put that in a namespace. I'm going to have these labels that allow me to operate on that part of my application architecture. Multi-tenant Kubernetes is a different problem, but when we talk about multi-team or multi-application deployment within a single team, the Kubernetes framework for just categorizing and organizing your resources and querying on those makes it really easy to, say, understand what you have deployed and what your application actually looks like when it's in its execution environment.
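A small sketch of that organizational layer, with hypothetical names: a namespace for one slice of functionality, consistent labels across its resources, and a label-selector query over the whole slice.

```yaml
# Hypothetical layout: one namespace per feature area, with a shared
# label so you can query everything in that slice at once.
apiVersion: v1
kind: Namespace
metadata:
  name: checkout
  labels:
    team: payments
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
  namespace: checkout
  labels:
    app: checkout-api
    part-of: checkout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
        part-of: checkout
    spec:
      containers:
        - name: api
          image: registry.example.com/checkout-api:1.2.3
# Query the whole slice with a label selector, for example:
#   kubectl get all -n checkout -l part-of=checkout
```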

Jonathan: You keep talking about services. Let's talk about that a little bit. Does it ever make sense to use Kubernetes if you only have a single service? I'm going to show my age here, but if you have a LAMP stack or the modern equivalent, is Kubernetes ever the right fit, or is that just silly?

James: I don't know. I'll say I haven't seen something like that, as in, I've got my single application that's running-- There are lots of ways that I've seen that be successful outside of Kubernetes. I wouldn't want you to take on the set of challenges that Kubernetes provides if you've got just a single application, a single stack like that. Kubernetes really shines when you're operating things together, when you're putting those pieces together of, yes, maybe I have multiple pieces of this, multiple services to orchestrate here. I need to load balance between them.

I need to do some of the things that Kubernetes provides in terms of potentially getting traffic in on ingress and splitting that traffic. Those kinds of capabilities aren't things that you're concerned about with a single LAMP stack. The rolling deployment capability and configuration management are the only benefits you're getting. There are a lot of other problems. We have to start talking about the patching life cycle and image maintenance, and all those other things that you have to take on if you have just a single stack. There is a level of complexity you have to understand when you're making that choice.

Jonathan: Is there a rule of thumb you like to use once your application reaches a certain number of services or a certain number of instances, or something, then the needle starts to tip towards Kubernetes?

James: I'm going to be a great consultant here and say this is where it always depends. I guess the rule of thumb that you'd have to look at is what is the pain level of managing that n plus one service? How much pain are you taking on by saying, I want to add this new service in my environment? Do I have to go out and update 10 sets of configurations? Do I have to roll all these things?

Whereas in Kubernetes, if I build a nice deployment process, I point at a service URL in a configuration and pick that up. That's a place where that n plus one starts to become really easy in Kubernetes, and it can potentially be very, very challenging in a place where you don't have those orchestration capabilities available to you. There is a point, but that number is going to depend on, first of all, the team's capabilities. You have to figure out when you want to take on that learning curve.

If your team has experience in a Kubernetes environment, then that number, I think, is far lower because Kubernetes provides that automation out of the box for you. There's just that orchestration of everything across the stack, of network, compute, and configuration. All those things together are helpful. It's just a matter of figuring out when do you want to start to accept the set of challenges that Kubernetes provides in exchange for that ease of management and ease of deployment.

Jonathan: Let's talk a little bit then about the team that's going to be managing this. You said if the team already has Kubernetes experience, that number can be lower. What's the learning curve to get started with Kubernetes? If you have zero experience, maybe you have AWS experience, EC2, or something, is it a weekend job or is it bigger than that to start being productive with Kubernetes?

James: Again, I think that goes back to your architectural match. If you take on some very easy challenges with Kubernetes of, hey, I'm going to deploy these five or six stateless applications into a Kubernetes environment and get them talking to one another, and potentially to external queues or databases, then that can be a very short effort to get those things started up. The base set of resources of deployments, services, and ingress is well explained on the internet. You can start really well with the Kubernetes documentation.

You can find a bunch of great resources on the internet. Just when it comes to that base level starting out, it gives you a really good starting point of I'm going to deploy this application. It'll be this many replicas. I will network it with a service. I'll put an ingress controller in front of it. With the cloud providers providing a very simple way to get started with Kubernetes, each one of those services, from the major cloud providers, you can get started up in a few minutes, you can get images deployed in there very quickly. If you're running simple applications, you can get started with that very quickly.

It's just a question of, when you get into that environment, what's your next step after you got that first deployment in there? I think that the day two things are far more challenging than day one. I can get an application deployed in Kubernetes on a team that's never seen Kubernetes on day one. The problems come in when you have say-- I can have that thing updating and pushing new versions, and changing its configuration. That's the straightforward stuff. Then now we take on the second level of complexity. Do you want it to be autoscaling?

Now, do you want just pod autoscaling or do you want your nodes to autoscale as well? That's something that each one of the cloud providers is going to have a different answer for. You have to start to know that Azure, in its Kubernetes offering, has autoscaling built into the parameters of the API server when you create the AKS cluster, whereas in EKS, you have to deploy the cluster autoscaler to your cluster yourself. Then you have to set the parameters so that you're going to get the nodes that you want when your application scales up.

That's one example of a direction that you can go in. You have to understand the provider you are using. You have to understand the parameters you select for that. You can get deep very quickly. When you pick a small slice of functionality you want to select, and now you've got to get into your implementation, you have to get into the parameters that are available to you, you have to get into how that interacts with your application. It can get hairy very quickly when you get to that second level of challenges in Kubernetes.
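To illustrate the pod-level half of that, here's a sketch of a HorizontalPodAutoscaler targeting the hypothetical Deployment from earlier, scaling on CPU utilization; node autoscaling remains a separate, provider-specific concern, as described above.

```yaml
# Sketch: pod autoscaling on CPU for the hypothetical "web" Deployment.
# Node autoscaling is configured separately per provider (built into
# AKS; an add-on like the cluster autoscaler on EKS).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```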

Jonathan: Would you advise a team with no Kubernetes experience to take it on or should they hire that expertise? Obviously, they should hire your firm but-- [laughs]

James: No. I started Kubernetes myself with a small team that had never done this back in 2016. I had a great learning experience in there working with a team that had not done this. We built out a POC initially that was selected by our company to go run into production. We did end up bringing in some people to help us out for a short term. We had a vendor for our Kubernetes implementation and brought them in to help us out to go from that point of a POC to moving into production. That's not necessary at all. I think there's ways to make good, easy choices at the start of your Kubernetes implementation.

There are ways to say, I'm going to do the right things first, which is, I'm going to know how my application is managed across these different environments. That's a really important question you have to answer first. I don't think that's a question you bring people in externally for, because it's a business context question. How does my application need to manage its updates? That's going to inform your container build and image tagging life cycle. That's going to inform your environment management of Kubernetes. Are you going to do a single cluster that has dev and prod in it?

That might be how you start out. That's probably not where you're going to get to even when you initially launch on Kubernetes. It has become so easy to manage clusters nowadays that you're probably going to have multiple clusters. You're going to want to think about how your application moves from a dev environment to a prod environment. Your question was, do I advise small teams to take this on? I've mentioned this a lot, there are tradeoffs with Kubernetes, but there is a lot of mind share and a lot of excellent community resources for getting you off the ground and getting you running with Kubernetes.

The place where you bring in people with experience is when you've decided to really invest in this area. I want to build on top of the Kubernetes capabilities that come from the reconciliation that it gives me in a controller, or I want to orchestrate across multiple environments, multiple regions, multiple clouds. I want to move my application this way, or I want to orchestrate my data center. You had talked about managing physical machines.

Kubernetes is becoming far more ubiquitous in that environment as well. There are some vendors that we're talking with who use Kubernetes to make their data center implementations work really well for customers and make it really easy for people to go from a bare bones data center metal implementation to something that's really easy to get started with.

I think the mind share in the community and the capabilities that that environment, that the Kubernetes API, provides for you are strong enough to say, "I'm going to take on some of these challenges in a Kubernetes-specific context because I'm getting these benefits of configuration management, easy deployment, easy routing, and good ways to declare how I want things to be operating." Not just from a compute perspective but from a network security perspective, a node security perspective. There are those features in there that say, I can know how my application is running because I can ask the API server what's going on. That's really helpful.

Jonathan: I get a sense, you've probably felt it too, that the community at large has a love-hate relationship with Kubernetes. There are some people, I think you're one of them, I think I'm one of them, who really enjoy working with Kubernetes. I guess there are some haters, but there are also some who just think it's too complex for what it provides. Where does this come from? How do you address somebody who's new to the debate? How do you explain it to them so that they hopefully make a useful decision?

James: This is funny. I've got a note here in my notes that says, "Have you ever seen the CNCF landscape?" That image just evokes that thought of, oh my goodness. You have to zoom into it to see these projects or companies' names that are on that thing. I think part of the challenge is that teams that take on-- it's the variety of application architectures that are out there in the world. Think about the scope and level of the types of applications that Kubernetes is trying to address. In 2016, 2017, there was a ton of talk about keep state off Kubernetes, and people still advise you to do that if you're starting out with Kubernetes in 2022.

There has been so much investment in stateful applications, in running those really well on Kubernetes. That's a whole framework of solutions and applications. There's the Velero model of backing up the state of your cluster. I think about Vitess-orchestrated MySQL on Kubernetes. There are companies that have built their entire state management layer on Kubernetes using that product. Once you get deep in any one of these areas, you can see it's not a problem of Kubernetes, it's the way that the industry deploys applications writ large.

Kubernetes is trying to be a generic orchestration platform for all these things, which means you have to start getting into the depths of network or the kernel networking. I worked very deeply with the Cilium project and some of their layer seven network capabilities. The kind of network and kernel programming that they do is very deep and very specific to that solution that they're offering. That's just one slice of even just the Cilium product.

That slice happens everywhere you go in terms of application architectures. I think that's the reason that I'm so pro-Kubernetes in the end, is that it's trying to be generic. It's not trying to solve all the problems itself. It's trying to provide a platform where someone can go out and solve those problems and say, "Here's a product that can address this very specific application concern." Coming back to your initial question, sorry, we diverged a little bit from it.

Jonathan: It's all right.

James: There are a couple of camps when it comes to Kubernetes. In the end, the question that I have is, can you build complex applications successfully on Kubernetes? The answer that I've seen over and over again is, yes, you can. You have to pick the right set of problems that you're going to tackle, because once you start to expand those questions, it gets really hard to say, I'm getting something out the door. Which is the bottom line. That's what we're all trying to do.

Jonathan: Of course, yes. What are some of the most common mistakes that you see beginners make with Kubernetes? You've talked about stateful applications, maybe that's one of them.

James: I think the messaging about that has been good. I don't see too many folks starting with stateful applications in Kubernetes. I wouldn't say that's the most common mistake. I think one of the things that happens very quickly in a Kubernetes deployment is that it becomes this, trash can is the wrong word, but this dumping ground. Like, "Here, I'm going to sprinkle this deployment here, this is going to go over here." Very quickly it's like, this cluster is fully resourced. All my CPU requests are taken up, and I can't deploy anything else to this cluster.

At the same time, my nodes are running at 8% CPU and 5% memory, and I can't put another thing on there. That is absolutely the most common thing that I see beginning teams do in a Kubernetes environment. It's like, yes, I wanted to request two gigs of memory and three cores for myself. Then it's like, wait, my application isn't actually using this. There are two ways to tackle that. There are tools that help you solve this. I have a colleague who has actually worked very closely with the Vertical Pod Autoscaler project. That's a great one that's tackling that specific problem.

Because it automatically detects what resources your application is actually using, and provides recommendations for how you should be sizing it. I think it's a bigger problem of process and actually sharing that information with the team that's utilizing the Kubernetes cluster. If everyone has their own namespace, and everyone has to deploy their own Prometheus instance in there and deploy their own-- you can start to have a cluster where you've got so much stuff repeated in there, you've got so many deployments that you're not sure exactly what's going on.
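Picking up the Vertical Pod Autoscaler mention above, here's a sketch of running it in recommendation-only mode, assuming the VPA components are installed in the cluster. It observes actual usage and reports what the requests should be, without resizing anything.

```yaml
# Sketch, assuming the VPA project is installed in the cluster.
# updateMode "Off" means: observe usage and publish recommendations,
# but never evict or resize pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"
# Read the recommendations with:
#   kubectl describe vpa web
```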

Kubernetes provides you this capability of infrastructure as code, declaring your application as code, and you've got to utilize that. You can't just say, I've got a bunch of YAMLs now, everything is great. You've got to say, this is my environment management technique. These are the resource limits that I'm setting. Maybe there are defaults that you have out in the clusters that are being applied. You have to have a good way of reviewing that consistently, whether through open source projects; the descheduler, say, is a great one for rearranging pods across nodes.

There's Vertical Pod Autoscaler, like I mentioned. You have to make sure that you take that capability of declaring infrastructure and make sure that actually maps to what's out there, because I can have a bunch of YAMLs in a repository, and maybe that doesn't even reflect the actual state of the cluster because I haven't applied the updates, or someone deployed something out to that cluster and it wasn't reflected in the code itself. There's lots of ways to lose that map between what I expect and what's actually out there. You got to keep a handle on that.

That's where things like building Helm charts, and projects like ArgoCD that automatically roll that stuff out to clusters, come in. All those life cycle and deployment capabilities and tools for Kubernetes are built for just that purpose: to know that I'm going to make a change to my YAML, I'm going to update my image, and I'm going to know where that is in its deployment life cycle. I'm going to know which clusters this has been applied to and which clusters it hasn't been applied to. That untenable cluster is absolutely the most common problem that I see out there.

Jonathan: How would you say price figures into the Kubernetes equation?

James: That's a great question. My firm worked closely with a company that was price sensitive because they did a great job tracking things and saw that their billing had increased somewhat dramatically. It's a problem that I've seen tackled well. Obviously, we've talked about the autoscaler capabilities, we talked about this idea of resource management and all that. Those are all important when it comes to the cost story. In the end, it's bigger than that, because when it comes to the cloud provider Kubernetes implementations, you have to understand what that base level of cost is per cluster.

For example, I'll just give you an example that I know well because I've dealt with it. In Azure, there are no per-cluster costs in the base AKS service, but you have to run a default node group with a node of a certain size for the system node pool. That's going to be there, and you can put other resources on it, but that's a base level of cost. You're not going to scale that cost down to zero. Then at the same time, now we think about scaling up. When it comes to scaling up, it's not as simple as saying my costs are going to scale exactly with my resource usage.

Because you have to tune the parameters for how things are going to scale. There are a bunch of different projects that are trying to address this problem. The KEDA project allows you to really closely tie the scaling of your application to specific metrics, whereas the Horizontal Pod Autoscaler can depend on specific metrics, but by default, out of the box, is just based off of the CPU utilization of a pod. You have to build autoscaling that's going to be in line with how your application needs to grow with respect to its compute needs. When it comes to cost management, you just have to be aware of the allocation of compute that you need in your Kubernetes environment.

It can be very effective in cloud provider consumption if you need really hefty machines for short-term processing. That is a great use case for Kubernetes in a team that maybe doesn't have stateless applications, but has jobs that spin up in the short term and spin back down, maybe on large-scale processing units or something like that. That's where Kubernetes can be really successful in terms of cost management, because you can set up node selectors and taints and tolerations. In most implementations nowadays, I can say my GPU worker pool is going to scale to zero, and it's only going to show up when I need that compute.
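Here's a sketch of that pattern with hypothetical node labels and taint keys: a batch Job that only lands on the GPU pool, which the provider's autoscaler can keep at zero nodes until work like this shows up.

```yaml
# Sketch: a Job pinned to a scale-to-zero GPU node pool. The node
# label and taint key are hypothetical; they depend on how your
# provider's node pools are configured.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        pool: gpu-workers          # hypothetical node pool label
      tolerations:
        - key: dedicated           # matches a hypothetical taint on GPU nodes
          operator: Equal
          value: gpu
          effect: NoSchedule
      containers:
        - name: train
          image: registry.example.com/trainer:0.4.0
          resources:
            requests:
              nvidia.com/gpu: 1    # requires the NVIDIA device plugin
            limits:
              nvidia.com/gpu: 1
```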

That's a very effective use of cloud provider resources. Then when it comes to stateless applications that are just scaling with respect to requests or utilization, things like that, I think the biggest win you can have from a cost management perspective is a good understanding of what's in the cluster and what you're actually putting into the Kubernetes environment.

Yes, when you get down the road, you can really finely tune that scaling up and down, but you can't do that well unless you know that allocation level, you know that you actually need new nodes, and you're not just getting yourself a new node when your existing worker nodes have, like I said, that 5% CPU and 10% memory utilization. Doing that stuff well early on will help you manage costs as your application grows and your utilization grows.

Jonathan: If you could no longer use Kubernetes, what would you miss the most?

James: Oh man, that declarative API for saying this is what I want to be running for my compute. It's just such a base level of how I think about applications these days: I have a deployment for the compute, I have a service for the networking, I have an ingress for bringing traffic in. Those simple declarations make it so easy to express how my application is working, and then see that show up. I can helm install or kubectl apply, and see that show up on the other end, all with a simple set of calls. It's the thing that I like the most about deploying applications on Kubernetes.

That first team that I worked with back in 2016, we migrated a huge Java EE implementation. I won't say which one, just out of kindness. A huge Java Enterprise stack where you had to log into a console, and you could see the versions of the applications only because we had put them in as a specific annotation. You couldn't see anything about-- you had to navigate down to see, here are the JARs that we have active right now, and it was just-- Oh man. There are plenty of other good options now, Lambdas and things like that, that you could use. That's the thing that I compare it to.

Jonathan: I have a two-part question here. Let's start with we've been talking about teams. Let's start with that version of the question. How should a team get started if they're interested in adopting Kubernetes, or maybe just experimenting with it before they make the full commitment? How do you get started?

James: I think in terms of getting started, the prerequisite is that you have your applications running in containers. You've got to make sure that you've got your applications containerized. You can run those things in Docker or something like that. Maybe as you're starting your journey, you have EC2s and you deploy your applications in Docker on EC2 as a starting point, to then take that next step into Kubernetes. Then once you have your application containerized, if you're just getting started, you're just learning Kubernetes, use those cloud provider implementations wherever you are.

Spin one of those things up. Get your deployment out there and get your application running in a Kubernetes environment. That is something that you can do. If you've built your application to a container, you can get a Kubernetes cluster up, and you can get your application running very quickly. The steps to that are you'll want to get an ingress controller deployed. I would say if you're getting started, get Ingress-NGINX out there, and define a deployment, a service, and an ingress object for your application. Then get two or three different services out there and have them talk to one another.
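For those first steps, here's a sketch of the Service and Ingress objects that would sit in front of the hypothetical Deployment from earlier, assuming the Ingress-NGINX controller is installed and using a placeholder hostname.

```yaml
# Sketch: network the Deployment with a Service, then route external
# traffic through Ingress-NGINX. Hostname is a placeholder.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
```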

That's the first step: you get an Ingress controller, you get those things going. Now, the next step for me is one of those things that Kubernetes provides, that knowability, understanding what's going on. A service like Prometheus piping metrics to Grafana is really helpful from my perspective as you're getting started, because Grafana allows you to visualize those metrics, and you can use a tool like node exporter to get the node metrics and really easily start to see what's going on out there from an actual compute perspective.

I've got my applications orchestrated and running, and now I'm seeing what's going on with them with Prometheus and Grafana. That's the place that you can start to build from, because you can get your services exporting additional metrics, you can deploy new services out there, you can start to add things to your cluster. Then the next question after that, to me, depends on where you're going. Maybe you're security conscious. We should talk about security as a whole other thing. If you're security conscious, you have to start thinking about your runtime security.

Maybe you deploy a product out there like Falco, or Tetragon is a new one from Isovalent that was released as open source, that starts to understand what's going on in the runtime of the node. There's a whole category of, have you ever thought about seccomp policies in your containerized applications? If you're security conscious in your deployment, you have to think about those sorts of things, and you have to think about it early. Kubernetes provides you with a really easy foot gun of saying, "Here, I've got my containers running out here."

At the same time, it just created a really easy way to run everything as root and open up a bunch of security holes. That's one path you might have to take early on in your Kubernetes implementation. It's one of the things that some of the Kubernetes providers are offering as their real advantage, to say, "We've thought about this for applications, and we have this set of solutions for securing a container runtime." That's one path for starting out on Kubernetes. After you've got a bunch of applications out there, you have to think about how you're going to manage their life cycle and their configuration management.

You've got to pick a tool that does that for you. Now, if you're just starting out, using Kustomize is great. I think Helm is honestly a great place to start as well. For me, Helm is just so easy in terms of taking those base Kubernetes resources of the standard starting point, the Ingress, the Service, and the Deployment, and orchestrating that across multiple environments to say, here are my values for my dev cluster, here are my values for the prod cluster. That would be the next place I would go, whether it's Helm or Kustomize, something like that to manage the declarative state of your application.
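As a sketch of that per-environment split, here are two hypothetical Helm values files feeding the same chart; names and numbers are illustrative.

```yaml
# values-dev.yaml (hypothetical): small and cheap
replicaCount: 1
image:
  tag: "build-1423"
ingress:
  host: web.dev.example.com
---
# values-prod.yaml (hypothetical): sized for real traffic
replicaCount: 4
image:
  tag: "build-1420"
ingress:
  host: web.example.com
# Installed per environment, for example:
#   helm upgrade --install web ./chart -f values-dev.yaml
```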

That's a path that you go down to say, yes, I can say here's what's running in dev, here's what's running in prod, and I can see that in a repository. That is an early question that you want to be able to answer. If you want to start with just a set of manifests in a repository, that's a great place to start as well. You just have to understand that there are certain things that manifests in a repository, and kubectl applying them, aren't going to do for you. Removal of resources and updating certain fields might not be possible with just a kubectl apply. There are certain restrictions there that you have to understand.

The next thing you have to think about within that management process is how you're managing those updates. You have your container, you've built your application. When you start with Docker, you've built to latest and you've pushed the latest tag to your repository. You have to answer the question of, how am I going to understand what's running in dev versus what's running in prod? Even for a small team, you probably have a dev or a staging environment, and then a production environment. How do you want to tag those images that you're building?

You can start by tagging with build numbers in whatever your build system is, or you can tag with a hash, and then the container image has a hash when you push to the repository. Then you can point a tag at that hash, or you can deploy the image hash directly. Every container image has a SHA-256 hash. What the security-conscious folks are saying these days is that you want to deploy exactly that SHA-256 of your image into your cluster, because that's the way that you know exactly what you're running out there, rather than using a tag.

That's not where I would start. I would start with tagging with a build number and then potentially adding another tag if you want to, depending on your implementation and what you can do with those tags, whether you can add an annotation or something like that. You can just append a dev tag onto one of your images, and then when you push to prod, push that prod tag onto that image as well. That's another way you can do it. You need some way to manage that versioning life cycle of the set of applications you've pushed out to your cluster.
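A sketch of the two ends of that spectrum in a pod template fragment, with hypothetical registry paths and an illustrative (not real) digest:

```yaml
# Two ways to reference the same hypothetical image in a pod spec.
containers:
  - name: web
    # Start simple: tag with a build number from CI.
    image: registry.example.com/web:build-1423
    # Later, pin the exact content by digest so the cluster runs
    # precisely the image you built (digest shown is illustrative):
    # image: registry.example.com/web@sha256:4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945
```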

Jonathan: Great. A related question, as an individual, maybe an engineer is interested in learning Kubernetes, maybe not at their current job, but they want to learn it to extend their career options, how should an individual get started, especially if they don't have access to an employer who's paying them to learn Kubernetes on the job?

James: Or someone that's paying their cloud bills. That's a dangerous one. Kubernetes is so accessible for people on a local development machine these days. I use the Colima project on my MacBook. They have M1 support, which is great. There's a bunch of projects out there that make it really easy to run Kubernetes on your own machine, with a bunch of great resources in the community to get started, whether it's kind, or MicroK8s, or Minikube, or, like I said, Colima.
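As one example of how little it takes, here's a sketch of a kind cluster config, assuming kind is installed, that gives you a local control plane plus one worker node to play with.

```yaml
# Sketch: a local two-node cluster with kind. Create it with:
#   kind create cluster --name playground --config kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
```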

That's honestly where I would start if I were starting today: get Kubernetes running there, figure out what little applications I want to run, or even just get a cluster up and put an Ingress controller on it. Put some metrics on it so you can see what's going on. Then deploy a "Hello World" in whatever language you like to use. Hit it a few times and see what that looks like, and then start to build on top of that. Once you have an implementation of Kubernetes that works on your machine, maybe it's just a single node, but what I've found these days is that I reason about nodes way less.

Kubernetes makes it so easy to think about the distributed nature of that. Yes, there are problems where you have to know where the nodes are and whether your application is distributed across availability zones and things like that, but for the getting-started use case, a single node on your machine where you can deploy a couple of applications is a great starting point. I would encourage people who are just starting to learn about this to pick one that is easy for you. You can get into a bunch of complexity on these local implementations, and you can also avoid it.

What I would say is avoid that complexity as much as possible, get images working, get images up and running, and see what that looks like. See what it looks like to make a kubectl call. See what it looks like to helm install, and helm upgrade, and helm uninstall. Those types of things get you comfortable with Kubernetes and get you comfortable with working with the API. Then you can build on top of that as those needs arise. Once you have that comfort level with the API, then you can go and deploy more complicated applications, and get a cloud provider and get going with that.

Jonathan: Good advice. I don't think we've covered anywhere close to all the topics about Kubernetes, but we've talked for close to an hour, three quarters of an hour. Is there anything important that I should have asked that I didn't, or anything that you think that we should be sure to cover before we close?

James: I would be remiss if we didn't talk about security. It's really important to understand that there are layers of security that matter in any Kubernetes deployment. You have to understand the surface area, the attack vectors that are possible in your Kubernetes environment. You are exposing an API where, if someone gets access and can submit an authenticated request that gets authorized and run, they can run privileged containers, run something as root on a node of your environment.

You have to make sure that you're running secure versions of the Kubernetes API server. A number of Kubernetes versions back, there was that pod exec bug that allowed anyone to submit a pod exec request against the API server. You have to be aware of that surface area, and you have to be aware of your image surface area. That's one of the things where there's been a ton of investment lately in terms of delivering an SBOM. A project like Grype is great because you build your container image, and then Grype can figure out if there are any CVEs in that container image, and it'll print that out for you.

There's a tool in that ecosystem as well that can generate the SBOM for you and deliver it with your application. Then there's been all this work recently with Sigstore and the projects around it for signing container images and ensuring that appropriately signed images are running in your Kubernetes clusters. That's one set of problems, image security. Then there's another set of problems of node-level security. What kernel version are you running?

These are problems that we have in the EC2 space that remain problems. It's just far easier in the Kubernetes environment to say, I'm going to remove a node, and maybe I even label that node with the kernel version that I'm running. I can query that, and I can remove those nodes and spin up new nodes on a patched kernel, and all that. There's all of that surface area that you have to understand, and you have to make sure that you're adequately planning for the risk across all of those layers.

There's a ton of interesting work going on across the different layers of this. Like I said, Sigstore and those projects have gotten a lot of buzz lately. There's also really cool stuff going on at the kernel level in terms of securing that communication between the application runtime and the actual node kernel that you're running on. If you're getting started with Kubernetes, like we talked about earlier, just don't run your containers as root on Kubernetes. That's a great starting point. I would be remiss without saying that on this episode.
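To make that closing advice concrete, here's a sketch of a non-root baseline for a pod. The field values are a common hardening starting point, not a complete security policy.

```yaml
# Sketch: a non-root pod baseline. Not a complete policy, but it
# closes the "everything runs as root" foot gun.
apiVersion: v1
kind: Pod
metadata:
  name: web-hardened
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: web
      image: registry.example.com/web:1.0.0
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```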

Jonathan: Nice. If people are interested, if they have questions, are you available for inquiry?

James: Absolutely.

Jonathan: For hire?

James: All of the above, yes. My Twitter handle is @jmcshane, James McShane. I love interacting with the Kubernetes community on Twitter. There is a Kubernetes community on Twitter now as well, which is great. There are a bunch of great people in there. I'm in the Hangops Slack, which is a great Slack. There's a lot of conversation there. Then, as you said, I work for a consulting firm, SuperOrbital. We do Kubernetes training as one of our main offerings, as well as consulting. Our focus is to bring on T-shaped engineers. Each one of our people has an area that they're really strong in.

If you've got an interesting Kubernetes problem, we probably have somebody who's thought about it deeply and has worked with the community on it. I think that's one of the biggest things about the Kubernetes ecosystem, is that, yes, there is all this complexity, it's somewhat fractured into lots of different small communities, but there's really great information out there. If you get to know the folks that are there, you interact over Slack or over Twitter, or Discord, or something like that, you can get great information, you can get great advice about getting started, and you can meet awesome people, which is how I met you. That's been great.

Jonathan: Awesome. [chuckles] Your training, is it virtual or is it all on location?

James: Our training offerings, they're virtual and they're classroom-based. It's a virtual classroom that we offer. It's not like ad hoc, but rather you would come into a classroom and interact with an instructor, and do things hands-on. That's one of the key tenets of our training offering, is we make sure that people get hands-on with the tools and technologies, and work with things that we teach about. We go back and forth between a lecture and a lab and make sure that's something they get hands-on with.

Jonathan: Great. If you're listening and you're interested in Kubernetes now, and maybe you need some training on the topic, you have the resources now. Thanks so much, James, for coming on. I really appreciate it, it's been an educational and fascinating topic. I appreciate you taking the time. I suppose we'll stay in touch. Thanks a lot.

James: Thanks, Jonathan. This was a wonderful conversation. I really appreciate taking the time today to talk about Kubernetes.

Jonathan: Great.

[music]
