Enhance Kubernetes scheduling for GPU-intensive apps with node templates

Kubernetes scheduling ensures that pods are matched to the right nodes so that the kubelet can run them.

The whole mechanism improves availability and performance, often with great results. However, the default behavior is an anti-pattern from a cost perspective. Running pods on half-empty nodes will cost you more in the cloud. This problem is exacerbated with GPU-intensive workloads.

Best suited for parallel processing of multiple datasets, GPU instances have become the preferred option for training AI models, neural networks, and deep learning operations. These tasks run faster, but tend to be more costly and can result in huge bills when combined with inefficient scheduling.

One of CAST AI’s users, a company developing an AI-driven security intelligence product, faced exactly this problem. Their team overcame it with the platform’s node templates, an autoscaling feature that improves the provisioning and performance of workloads that require GPU-enabled instances.

Learn how node templates can enhance Kubernetes scheduling for GPU-intensive workloads.

K8s Scheduling Challenges for GPU Workloads

Kube-scheduler is Kubernetes’ default scheduler and runs as part of the control plane. It selects a node for each newly created, not yet scheduled pod. By default, the scheduler tries to spread these pods evenly across nodes.

Containers within a Pod may have different requirements, so the scheduler filters out nodes that do not meet the specific needs of the Pod.

It identifies and scores all possible nodes for the pod, chooses the node with the highest score, and informs the API server about this decision. Several factors influence this process, including resource requirements, hardware and software constraints, and affinity specifications.

Figure 1 Overview of Kubernetes Scheduling
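The filter-then-score flow described above can be sketched in a few lines of Python. This is a toy model, not kube-scheduler's actual code: the `Node` and `Pod` classes, the `feasible` and `score` functions, and all the numbers are assumptions made up for illustration. The scoring rule simply prefers emptier nodes, which mimics the default tendency to spread pods out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float   # cores not yet requested by any pod
    free_gpus: int    # GPUs not yet requested by any pod
    pod_count: int    # pods already running on the node


@dataclass
class Pod:
    name: str
    cpu: float
    gpus: int


def feasible(pod: Pod, node: Node) -> bool:
    """Filtering step: drop nodes that cannot satisfy the pod's requests."""
    return node.free_cpu >= pod.cpu and node.free_gpus >= pod.gpus


def score(pod: Pod, node: Node) -> float:
    """Scoring step (toy rule): emptier nodes score higher, so pods spread out."""
    return node.free_cpu - pod.cpu - node.pod_count


def pick_node(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # in a real cluster the pod would stay Pending
    return max(candidates, key=lambda n: score(pod, n))


# Hypothetical cluster state.
nodes = [
    Node("cpu-a", free_cpu=6, free_gpus=0, pod_count=4),
    Node("gpu-a", free_cpu=10, free_gpus=2, pod_count=3),
    Node("gpu-b", free_cpu=30, free_gpus=8, pod_count=0),
]
trainer = Pod("trainer", cpu=4, gpus=1)
chosen = pick_node(trainer, nodes)
print(chosen.name)
```

In this made-up cluster the pod lands on the empty node `gpu-b` even though the partly used `gpu-a` had room, which is the spreading behavior that becomes expensive with GPU instances.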

Schedulers automate the decision-making process and deliver fast results. However, this generic approach can be costly, because you may end up paying for resources that are suboptimal for your environment.

Kubernetes is cost agnostic. Determining, tracking, and reducing costs is the responsibility of engineers, and the issue is especially acute in GPU-intensive applications, where rates can skyrocket.

Costly scheduling decisions

To better understand pricing, let’s take a look at Amazon EC2 P4d, which is designed for machine learning and high-performance computing apps in the cloud.

P4d instances are powered by NVIDIA A100 Tensor Core GPUs and offer high throughput and low-latency networking, with support for 400 Gbps instance networking. P4d promises a 60% reduction in the cost of training ML models and a 2.5x improvement in deep learning performance over the previous generation of P3 instances.

As impressive as that sounds, the hourly on-demand price is hundreds of times higher than the cost of popular instance types such as C6a. That is why it is important to keep precise control over the scheduler’s generic decisions.

Figure 2 Price comparison between p4d and c6a
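To make the cost of half-empty GPU nodes concrete, here is a small arithmetic sketch. The hourly price and GPU count are hypothetical round numbers chosen for easy arithmetic, not actual AWS prices, and `cost_per_used_gpu_hour` is a helper name invented for this example.

```python
def cost_per_used_gpu_hour(hourly_price: float, gpus: int, gpus_in_use: int) -> float:
    """Hourly node price divided by the number of GPUs doing useful work."""
    if not 0 < gpus_in_use <= gpus:
        raise ValueError("gpus_in_use must be between 1 and gpus")
    return hourly_price / gpus_in_use


# Hypothetical figures, not real cloud prices.
PRICE = 32.0  # assumed on-demand price per hour for one 8-GPU node
GPUS = 8

full = cost_per_used_gpu_hour(PRICE, GPUS, gpus_in_use=8)
half = cost_per_used_gpu_hour(PRICE, GPUS, gpus_in_use=4)
print(f"fully packed: {full:.2f} per used GPU-hour")
print(f"half empty:   {half:.2f} per used GPU-hour")
```

The node costs the same either way, so halving utilization doubles the price of every GPU-hour that does real work.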

Unfortunately, when running Kubernetes on GKE, AKS, or Amazon Web Services’ Elastic Kubernetes Service (EKS), you have minimal ability to adjust scheduler settings without using components such as a mutating admission controller.

This is still not a perfect solution as you have to proceed with caution when creating and installing webhooks.
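As a rough illustration of what such a webhook does, the sketch below builds the `AdmissionReview` response a mutating webhook could return to steer GPU pods onto labeled nodes. It covers only the response logic: the HTTPS server, TLS certificates, and webhook registration are omitted. The label `example.com/gpu-pool` is a hypothetical placeholder, and the code assumes GPUs are requested through the `nvidia.com/gpu` resource name.

```python
import base64
import json

GPU_RESOURCE = "nvidia.com/gpu"                    # assumed GPU resource name
NODE_SELECTOR = {"example.com/gpu-pool": "true"}   # hypothetical node label


def wants_gpu(pod: dict) -> bool:
    """Return True if any container sets a limit on the GPU resource."""
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if GPU_RESOURCE in limits:
            return True
    return False


def review_response(review: dict) -> dict:
    """Build an AdmissionReview reply, patching GPU pods with a nodeSelector."""
    request = review["request"]
    response = {"uid": request["uid"], "allowed": True}
    if wants_gpu(request["object"]):
        # Note: "add" replaces any nodeSelector the pod already defines.
        patch = [{"op": "add", "path": "/spec/nodeSelector", "value": NODE_SELECTOR}]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


# Hypothetical incoming request for a pod that asks for one GPU.
review = {
    "request": {
        "uid": "demo-uid",
        "object": {
            "spec": {
                "containers": [
                    {"name": "train", "resources": {"limits": {GPU_RESOURCE: 1}}}
                ]
            }
        },
    }
}
reply = review_response(review)
decoded_patch = json.loads(base64.b64decode(reply["response"]["patch"]))
print(decoded_patch)
```

Even this small handler shows why caution is needed: a bug in a webhook sits in the path of every matching pod creation.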

A node template can help

This was exactly the challenge faced by one of our CAST AI users. The company develops AI-powered intelligence solutions for real-time threat detection from social and news media. Its engine analyzes millions of documents simultaneously to capture emerging stories, and it also enables the automation of unique natural language processing (NLP) models for intelligence and defense.

The amount of confidential and public data used by this product continues to grow. This means that workloads often require GPU-enabled instances, incurring additional cost and work.

Much of that effort can be saved by using node pools (Auto Scaling groups). However, while node pools help streamline the provisioning process, they can be highly cost-ineffective, and you end up paying for capacity you don’t need.

CAST AI’s autoscaler and node templates improve on this by providing tools for better cost control and reduction. Additionally, thanks to the fallback feature, node templates allow you to benefit from Spot instance savings and guarantee capacity even if Spot is temporarily unavailable.

Node template in action

The CAST AI client’s workloads now run on predefined groups of instances. Instead of manually selecting specific instances, the team can loosely define their characteristics (e.g. “CPU optimized”, “memory optimized”, “GPU VM”), and the autoscaler does the rest.

This feature gives the team far more flexibility, since they are free to use a wider range of instances. When AWS adds new high-performance instance families, CAST AI automatically registers them, so no additional enablement is required. This is not the case with node pools, where you have to keep track of new instance types and update your configuration accordingly.

By creating a node template, the client could specify general requirements such as instance types, the lifecycle of new nodes to add, and provisioning configuration. They also specified constraints such as the instance families they did not want to use (p4d, p3d, p2) and the GPU manufacturer (in this case, NVIDIA).

For these specific requirements, CAST AI found 5 matching instances. The autoscaler follows these constraints when adding new nodes.

Figure 3 Example Node Template with GPU Enabled Instances
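Conceptually, a template acts as a filter over the instance types the autoscaler may provision. The sketch below shows that idea only; it is not CAST AI's matching engine. The `InstanceType` and `Template` classes and the six-entry catalog are assumptions for illustration, so the number of matches differs from the five found for the client.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    family: str
    gpu_manufacturer: Optional[str]  # None for instances without GPUs


@dataclass(frozen=True)
class Template:
    excluded_families: FrozenSet[str]
    gpu_manufacturer: str


def matches(instance: InstanceType, template: Template) -> bool:
    """An instance qualifies if its GPU vendor fits and its family is not excluded."""
    return (
        instance.gpu_manufacturer == template.gpu_manufacturer
        and instance.family not in template.excluded_families
    )


# Illustrative catalog only; a real inventory is far larger.
catalog: List[InstanceType] = [
    InstanceType("p4d.24xlarge", "p4d", "NVIDIA"),
    InstanceType("p2.xlarge", "p2", "NVIDIA"),
    InstanceType("g4dn.xlarge", "g4dn", "NVIDIA"),
    InstanceType("g5.xlarge", "g5", "NVIDIA"),
    InstanceType("g4ad.xlarge", "g4ad", "AMD"),
    InstanceType("c6a.large", "c6a", None),
]

template = Template(
    excluded_families=frozenset({"p4d", "p3d", "p2"}),
    gpu_manufacturer="NVIDIA",
)
allowed = [i.name for i in catalog if matches(i, template)]
print(allowed)
```

Dropping the family exclusions from `template` would widen the allowed set, which gives the autoscaler more candidates to choose from.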

The autoscaler automatically decommissions GPU-enabled instances once the GPU job is complete.

Additionally, thanks to Spot Instance automation, our clients can save up to 90% on their hefty GPU VM costs without being negatively impacted by Spot interruptions.

GPU Spot prices can vary widely, so it’s important to pick the most favorable option at any given moment. CAST AI’s Spot instance automation takes care of this. It also helps strike the right balance between the most diverse instance types and the cheapest ones.

On-demand fallback is also useful during mass Spot interruptions or periods of low Spot availability. For example, if the training process of a deep learning workflow is interrupted and its state is not saved properly, significant data loss can occur. If AWS reclaimed all the EC2 G3 or p4d Spot instances your workload was using at once, the automatic fallback could save you a lot of hassle.
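The fallback idea can be expressed as a simple selection rule: take the cheapest available Spot offer, and only when none is available fall back to on-demand. The sketch below is a simplified assumption about how such a rule might look, not CAST AI's algorithm; the `Offer` class, the `choose_offer` function, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    instance_type: str
    lifecycle: str        # "spot" or "on-demand"
    hourly_price: float   # hypothetical price
    available: bool


def cheapest(offers: List[Offer], lifecycle: str) -> Optional[Offer]:
    """Cheapest available offer for one lifecycle, or None if there is none."""
    usable = [o for o in offers if o.lifecycle == lifecycle and o.available]
    return min(usable, key=lambda o: o.hourly_price) if usable else None


def choose_offer(offers: List[Offer], allow_fallback: bool = True) -> Optional[Offer]:
    """Prefer Spot; use on-demand only when no Spot capacity is available."""
    spot = cheapest(offers, "spot")
    if spot is not None:
        return spot
    return cheapest(offers, "on-demand") if allow_fallback else None


# Hypothetical market snapshot during a Spot shortage.
offers = [
    Offer("gpu-small", "spot", 0.30, available=False),
    Offer("gpu-large", "spot", 0.90, available=False),
    Offer("gpu-small", "on-demand", 1.00, available=True),
    Offer("gpu-large", "on-demand", 3.00, available=True),
]
picked = choose_offer(offers)
print(picked.instance_type, picked.lifecycle)
```

Without the fallback, the same snapshot yields no capacity at all, and the training job would have to wait for Spot to return.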

How to create a node template for your workload

Creating a node template is relatively easy and can be done in three different ways.

First, use the CAST AI UI. If you’ve already connected and onboarded your cluster, it’s easy. Log in to your product account and follow the on-screen instructions.

After naming the template, you must choose whether to taint the new nodes so that pods are not assigned to them by default. You can also specify custom labels for nodes created using the template.

Figure 4 CAST AI node template
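When template nodes are tainted and labeled, only pods that carry a matching toleration and node selector will land on them. The sketch below models that check in simplified form. The key `example.com/node-template` and the value `gpu-training` are placeholders, not the keys CAST AI applies (check its documentation for those), and `tolerates` covers only part of Kubernetes' toleration rules.

```python
TEMPLATE_KEY = "example.com/node-template"  # hypothetical taint and label key


def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified toleration matching on key, operator, value, and effect."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    key = toleration.get("key")
    if toleration.get("operator", "Equal") == "Exists":
        return key is None or key == taint["key"]
    return key == taint["key"] and toleration.get("value") == taint.get("value")


def can_schedule(pod_spec: dict, node: dict) -> bool:
    """True if every NoSchedule taint is tolerated and the nodeSelector matches."""
    tolerations = pod_spec.get("tolerations", [])
    for taint in node.get("taints", []):
        if taint["effect"] == "NoSchedule" and not any(
            tolerates(t, taint) for t in tolerations
        ):
            return False
    selector = pod_spec.get("nodeSelector", {})
    return all(node.get("labels", {}).get(k) == v for k, v in selector.items())


# A node as a tainted, labeled template might create it (illustrative values).
gpu_node = {
    "labels": {TEMPLATE_KEY: "gpu-training"},
    "taints": [{"key": TEMPLATE_KEY, "value": "gpu-training", "effect": "NoSchedule"}],
}

# Pod spec fragment that opts in to the template's nodes.
gpu_pod = {
    "nodeSelector": {TEMPLATE_KEY: "gpu-training"},
    "tolerations": [
        {"key": TEMPLATE_KEY, "operator": "Equal",
         "value": "gpu-training", "effect": "NoSchedule"}
    ],
}
plain_pod = {}  # no toleration, so the taint keeps it off the GPU node

print(can_schedule(gpu_pod, gpu_node), can_schedule(plain_pod, gpu_node))
```

The taint keeps ordinary pods off expensive GPU nodes, while the node selector keeps GPU pods from drifting onto nodes outside the template.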

You can then link the template to the relevant node configuration and specify whether the template should use only Spot nodes or only On-Demand nodes.

You can also choose your processor architecture and the option to use GPU-enabled instances. When you choose this setting, CAST AI automatically runs your workloads on relevant instances, including new families added by cloud providers.

Finally, you can also apply constraints such as:

Compute optimized: helps you select instances for apps that require high-performance CPUs.
Storage optimized: selects instances for apps that benefit from high IOPS.
Additional constraints: instance family, minimum and maximum CPU and memory limits, and more.

But the hard truth is that the fewer constraints you add, the better the match and the greater the cost savings. CAST AI’s engine will handle it.

You can also create node templates with Terraform (all the details are available on GitHub) or through the API (see the documentation).

Summary

Scheduling in Kubernetes can be challenging, especially for GPU-intensive applications. Schedulers automate the provisioning process and provide fast results, but they are often too generic and too expensive for your application’s needs.

Using node templates improves performance and flexibility for GPU-intensive workloads. This feature also allows autoscalers to decommission GPU instances when they are no longer needed and acquire cheaper options for new workload requirements.

We have found that this helps teams build AI apps faster and more reliably, and we hope it helps you in your efforts as well.

Žilvinas Urbonas is the Engineering Manager and Founding Engineer at CAST AI. He has experience implementing backends and frontends for many different types of systems, leveraging best practices. After hours, he goes for a run and hops on…
