ATLANTA — AI on Kubernetes has prompted a series of updates and new projects within the Cloud Native Computing Foundation this year, designed to help platform engineers keep up with the breakneck pace of AI development.
Growing connectivity between Kubernetes, other cloud-native infrastructure projects, and AI was a prominent theme this week at KubeCon + CloudNativeCon North America 2025. Key topics included the increasing use of cloud-native technologies by backend developers supporting AI engineers, the rapid adoption of open source AI tools, and efforts by the Cloud Native Computing Foundation (CNCF) to establish standards for AI on Kubernetes.
Conversations this week also followed up on requests platform engineers made at KubeCon 2024 for improvements to running AI on Kubernetes, including smoother cluster upgrades, in-place pod resizing, new resource scheduling for GPUs, and more sophisticated and reliable framework-aware orchestration.
“Kubernetes was on a typical adoption curve a few years ago, and then there was a big twist in the AI era that kind of shook us up a bit,” Jago McLeod, Google’s director of engineering for Kubernetes and Google Kubernetes Engine (GKE), said in his keynote. “We’re in the midst of a real transformation in this space.”
Jago McLeod, director of engineering for Kubernetes and GKE at Google, discusses recent Kubernetes improvements in his KubeCon keynote.
K8s Upgrade: “Rollback is finally here”
One of platform engineers’ most common complaints over the years has been that Kubernetes cluster upgrades are difficult to manage and hard to undo when something goes wrong. McLeod said the problem has worsened as generative AI workloads have driven significant infrastructure growth, requiring more frequent upgrades.
A new approach to rolling back minor-version Kubernetes upgrades, delivered upstream by Google and available in GKE, introduces a two-step upgrade process. According to a Google blog post, the new process preserves an emulated version of the previous control plane, making it easier to revert changes without disrupting service. There will also be support for skipping versions, rather than requiring users to keep up with every Kubernetes release, which ships roughly three times a year, McLeod said.
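The upstream mechanism behind this is the compatibility-versions work, which lets a newer control-plane binary keep emulating the previous minor version’s behavior until the operator opts in to the new one. A minimal sketch of the two-phase idea, assuming the `--emulated-version` kube-apiserver flag from that work and illustrative version numbers (this is not a GKE procedure):

```python
# Hedged sketch of the two-phase upgrade enabled by upstream
# "compatibility versions" work (the --emulated-version flag on
# kube-apiserver). Version numbers are illustrative.

PHASE_1 = {  # roll binaries forward, keep old API behavior
    "kube_apiserver_binary": "v1.34",
    "flags": ["--emulated-version=1.33"],
    "rollback": "reinstall v1.33 binaries; behavior never changed",
}

PHASE_2 = {  # once healthy, opt in to new-version behavior
    "kube_apiserver_binary": "v1.34",
    "flags": ["--emulated-version=1.34"],
    "rollback": "drop back to --emulated-version=1.33 first",
}

for name, phase in (("phase 1", PHASE_1), ("phase 2", PHASE_2)):
    print(name, phase["kube_apiserver_binary"], phase["flags"])
```

Because behavior only changes in the second phase, a bad upgrade can be reverted at either step without the control plane ever having written state the older version cannot read.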
“This has literally taken 10 years of effort,” he said. “We knew early on that we needed to do this, but it was very difficult to do. So now rollback is finally here.”
DRA sets the stage for improved node management
Dynamic Resource Allocation (DRA), another initiative that gained momentum at KubeCon + CloudNativeCon North America two years ago, graduated to stable with Kubernetes 1.34, released in August. DRA is part of an effort to change Kubernetes’ “relationship with hardware,” McLeod said. Because AI workloads demand more efficient use of expensive and scarce GPU resources than CPU-based systems do, DRA lets Kubernetes pods share specialized hardware more flexibly and at a finer granularity.
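In practice, DRA replaces opaque extended-resource counts with claims against device classes published by a driver. A minimal sketch using the official Kubernetes Python client; the device class name and image are made up, and the field layout follows the resource.k8s.io/v1 API that went stable in 1.34:

```python
from kubernetes import client, config

config.load_kube_config()

# A ResourceClaim asking for one device from a hypothetical GPU
# device class published by a DRA driver.
claim = {
    "apiVersion": "resource.k8s.io/v1",
    "kind": "ResourceClaim",
    "metadata": {"name": "single-gpu"},
    "spec": {
        "devices": {
            "requests": [
                {"name": "gpu", "exactly": {"deviceClassName": "gpu.example.com"}}
            ]
        }
    },
}
client.CustomObjectsApi().create_namespaced_custom_object(
    group="resource.k8s.io", version="v1", namespace="default",
    plural="resourceclaims", body=claim,
)

# A pod that consumes the claim instead of requesting a raw count
# of an extended resource such as nvidia.com/gpu.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "trainer"},
    "spec": {
        "resourceClaims": [{"name": "gpu", "resourceClaimName": "single-gpu"}],
        "containers": [{
            "name": "trainer",
            "image": "example.com/trainer:latest",  # illustrative image
            "resources": {"claims": [{"name": "gpu"}]},
        }],
    },
}
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```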
Another important upstream effort in Kubernetes node management is in-place pod resizing (IPPR), which reached beta with Kubernetes 1.33, released in April. IPPR supports resizing the CPU and memory resources allocated to containers without requiring a pod restart. The distributed web applications traditionally hosted on Kubernetes could tolerate such restarts, but model training and inference workloads are far more sensitive to interruptions.
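The beta feature is exercised through the pod’s resize subresource rather than a normal spec update. A minimal sketch, shelling out to kubectl, with illustrative pod and container names and resource values:

```python
import json
import subprocess

# In-place resize (beta in Kubernetes 1.33) goes through the pod's
# "resize" subresource; container resources are not mutable through a
# plain spec patch. Names and values here are illustrative.
patch = {
    "spec": {
        "containers": [{
            "name": "trainer",
            "resources": {"requests": {"cpu": "4"}, "limits": {"cpu": "4"}},
        }]
    }
}

subprocess.run(
    ["kubectl", "patch", "pod", "trainer",
     "--subresource", "resize", "--patch", json.dumps(patch)],
    check=True,
)
```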
Lucy Sweet, an engineer at Uber and chair of the Kubernetes Node Lifecycle Working Group, said in an interview with Informa TechTarget that IPPR will serve as the foundation for further advancements in resource scheduling for Kubernetes nodes, including a vertical pod autoscaler that automates in-place pod resizing.
“Beyond scheduling, we’ve also been thinking about how to deal with disruption,” Sweet said. “We have a new standard called eviction requests that guarantees controlled termination. So if you’re running a training job, you get a warning so you can checkpoint before it’s terminated. Currently in Kubernetes, you get little to no warning about termination, and there’s a grace period of a few seconds. That’s not fun, especially if you have a large job that doesn’t finish within that grace period.”
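The eviction-requests work is still a proposal; what exists today is the blunt combination Sweet describes, a fixed grace period plus a preStop hook that races to save state. A minimal sketch of that status quo, with an illustrative checkpoint command:

```python
# Today's status quo that the eviction-requests proposal aims to
# improve on: the kubelet sends SIGTERM, runs the preStop hook, and
# sends SIGKILL after the grace period, whether or not the checkpoint
# finished. The image and checkpoint script are illustrative.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "training-job"},
    "spec": {
        "terminationGracePeriodSeconds": 300,
        "containers": [{
            "name": "trainer",
            "image": "example.com/trainer:latest",
            "lifecycle": {
                "preStop": {
                    "exec": {"command": ["/bin/sh", "-c", "/app/checkpoint.sh"]}
                }
            },
        }],
    },
}
```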
Beyond the nodes: Framework-aware orchestration
Kubernetes maintainers have been working with members of high-performance computing (HPC) projects such as Slurm and Ray to make the Kubernetes scheduler aware of job dependencies and resource requirements for HPC workload orchestration frameworks.
“We are beginning to see a shift from automation to autonomy, [which] is really just a logical extension of declarative APIs and decoupled controllers becoming smarter,” McLeod said.
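One existing expression of framework-aware scheduling is all-or-nothing “gang” scheduling: for example, the coscheduling plugin in the kubernetes-sigs scheduler-plugins project uses a PodGroup object so the scheduler won’t start any worker of a distributed job until every worker can be placed. A minimal sketch, with illustrative names:

```python
# Gang scheduling via the coscheduling plugin from
# kubernetes-sigs/scheduler-plugins: the scheduler holds all pods in
# the group until minMember of them can be placed together.
pod_group = {
    "apiVersion": "scheduling.x-k8s.io/v1alpha1",
    "kind": "PodGroup",
    "metadata": {"name": "training-gang"},
    "spec": {"minMember": 8},  # schedule all 8 workers or none
}

# Worker pods opt in via a label referencing the PodGroup by name.
worker_labels = {"scheduling.x-k8s.io/pod-group": "training-gang"}
```

Without this kind of grouping, a partially scheduled training job can hold scarce GPUs idle while waiting for stragglers that never fit, which is exactly the behavior HPC schedulers such as Slurm were built to avoid.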
In the longer term, two new CNCF sandbox projects, Kubernetes AI Toolchain Operator (KAITO) and KubeFleet, will simplify running AI inference on Kubernetes and offer new ways to manage multiple clusters on a global scale, according to Thursday’s keynote by Microsoft Principal Software Engineer Jeremy Rickard.
“KAITO is probably the easiest way to run and serve AI inference on Kubernetes, [using] a workspace construct where you can pass in the model and defer all GPU provisioning to the GPU provisioner,” Rickard said. “You can also ground your model using the tuning and RAG [retrieval-augmented generation] features and parameters it offers.”
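For flavor, a minimal sketch of the workspace construct Rickard describes, following the shape of KAITO’s published examples; the API version, model preset, and GPU instance type are assumptions that vary by release and cloud:

```python
# Sketch of a KAITO Workspace, following the shape of the project's
# published examples. API version, preset name, and instance type are
# assumptions; check the KAITO docs for your release.
workspace = {
    "apiVersion": "kaito.sh/v1alpha1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-phi-3"},
    # KAITO puts resource/inference at the top level, not under spec.
    "resource": {
        "instanceType": "Standard_NC12s_v3",  # cloud-specific GPU VM size
        "labelSelector": {"matchLabels": {"apps": "phi-3"}},
    },
    "inference": {"preset": {"name": "phi-3-mini-4k-instruct"}},
}
```

The operator takes it from there: provisioning GPU nodes that match the instance type, pulling the preset model, and standing up an inference endpoint, which is the “defer all GPU provisioning” step in Rickard’s description.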
Beth Pariseau, senior news writer at Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.