Managing the network for AI applications


Presenters:

Muninder Singh Sambi, VP of Product Management, Google Cloud Networking

Neelay Shah, Principal Software Architect, NVIDIA

Overview

As enterprises operate more in hybrid and multi-cloud environments, data is becoming more decentralized. At the same time, enterprises are rapidly adopting multi-cloud for AI/ML and expect to deploy generative AI within the next three years.

To meet the new requirements brought about by AI while maintaining business agility, traditional technologies are no longer sufficient; enterprise networks must be secured, streamlined, and extended for AI applications.

Google Cloud offers the most comprehensive and efficient AI infrastructure, providing customers with unmatched scale, consistency, choice, and flexibility. Google Cloud Networking ensures low latency and high bandwidth for AI/ML traffic, and GKE and Model-as-a-Service endpoints enable seamless access to models across different environments. Leveraging these capabilities, enterprises can realize the full potential of AI/ML, accelerate innovation, and achieve tangible business outcomes.

Context

Speakers discussed the evolution of AI-driven technologies and shared five key recommendations for delivering AI applications.

Key Takeaways

Gen AI is driving a fundamental evolution in network technology.

As more enterprises adopt Gen AI, demand for GPUs is soaring, but new technology brings new challenges: traditional web applications and Gen AI applications have different traffic patterns and therefore different network requirements.

Traditional Web Apps	Gen AI Apps
Small requests and responses	Very large requests and responses due to multimodal traffic
Requests are processed as soon as they arrive	Requests must wait for available compute
Processing time is in milliseconds	Processing time varies from a few seconds to a few minutes
Static content is cacheable	A single LLM query consumes 100% of the GPU/TPU compute time
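To make the contrast in the table concrete, here is a back-of-the-envelope sketch using Little's law (L = λW), a standard queueing result that the talk itself does not mention. The arrival rates and service times below are hypothetical. Under the table's simplification that one LLM query occupies an entire GPU/TPU, the average number of Gen AI requests in flight is roughly the number of accelerators that must be busy at once just to keep up.

```python
import math


def in_flight(arrival_rate_per_s: float, service_time_s: float) -> float:
    """Little's law, L = lambda * W: average number of requests in service at once."""
    return arrival_rate_per_s * service_time_s


# Hypothetical workloads; rates and service times are illustrative, not measured.
web = in_flight(arrival_rate_per_s=1000, service_time_s=0.005)  # millisecond-scale requests
genai = in_flight(arrival_rate_per_s=10, service_time_s=10.0)   # seconds-scale LLM queries

# If each LLM query occupies a whole accelerator, the pool must keep at least this
# many accelerators busy on average, plus some headroom, or the queue keeps growing.
print(f"web app: ~{web:.0f} requests in flight")
print(f"Gen AI app: ~{genai:.0f} requests in flight, "
      f"so more than ~{math.ceil(genai)} accelerators are needed to keep the queue bounded")
```

Even with 100x fewer requests per second, the Gen AI workload in this example keeps 20x more requests in flight, which is why queuing for compute becomes the dominant concern.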

Harnessing the power of Gen AI requires fundamentally different technologies: Advances in large-scale specialized computing infrastructure, low-latency communication fabrics, and high-bandwidth data transfer are required to enable AI use cases.

“[AI] use cases require the network to function. The network is the key driver paving the way.”

– Muninder Singh Sambi, Google Cloud Networking

Figure 1: AI applications have different use cases than traditional applications

Through our long-standing partnership with industry leader NVIDIA, we are developing innovative technologies to power Gen AI applications, simplifying and speeding up infrastructure to keep pace with ever-changing requirements.

This collaboration has not only delivered significant advances in AI technology but also given us a wealth of experience, from which we have derived five key recommendations for managing networks for AI workloads:

1. Establish a scalable network fabric built and optimized for AI/ML infrastructure

The four stages of an AI/ML pipeline are data ingestion, data preparation, model training, and inference. The underlying network infrastructure plays a critical role at each stage in determining performance, latency, and cost efficiency. For model training, a high-capacity, non-blocking data center network optimized for training is essential to minimize job completion times.
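Why fabric bandwidth shows up in job completion time: data-parallel training typically synchronizes gradients across all GPUs at every step, and a ring all-reduce is a common way to do it. The sketch below uses the standard ring all-reduce bandwidth term (each GPU sends 2(N−1)/N × S bytes) as an assumed cost model; the talk does not specify a collective algorithm. The gradient size, GPU count, compute time, and link speeds are all hypothetical, and compute and communication are assumed not to overlap, which is pessimistic.

```python
def ring_allreduce_seconds(grad_bytes: float, num_gpus: int, link_gbps: float) -> float:
    """Bandwidth term of a ring all-reduce: each GPU sends 2*(N-1)/N * S bytes.

    Ignores per-hop latency and assumes every link runs at full rate,
    which is what a non-blocking fabric aims to provide.
    """
    bytes_per_s = link_gbps * 1e9 / 8
    return 2 * (num_gpus - 1) / num_gpus * grad_bytes / bytes_per_s


def job_completion_hours(steps: int, compute_s: float, comm_s: float) -> float:
    """Total training time, assuming no overlap between compute and communication."""
    return steps * (compute_s + comm_s) / 3600


# Hypothetical job: 10 GB of gradients per step, 64 GPUs, 0.5 s compute per step, 10,000 steps.
GRAD_BYTES = 10e9
for gbps in (100, 400):
    comm = ring_allreduce_seconds(GRAD_BYTES, num_gpus=64, link_gbps=gbps)
    hours = job_completion_hours(steps=10_000, compute_s=0.5, comm_s=comm)
    print(f"{gbps} Gb/s per GPU: {comm:.2f} s all-reduce per step, ~{hours:.1f} h total")
```

In this toy model, quadrupling per-GPU bandwidth cuts the communication term by 4x; if the fabric were oversubscribed, the effective `link_gbps` would drop and the total time would grow accordingly.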

“By optimizing network infrastructure at each stage of the AI/ML data pipeline, enterprises can significantly improve overall model development and deployment efficiency, resulting in faster time to market and improved user experience.”

– Muninder Singh Sambi, Google Cloud Networking

Figure 2: Scalable network for Gen AI

2. Train AI across clouds with the Google Cross-Cloud Network

Training and inference use cases depend on data ingestion, but data must first be moved from various locations (on-premises, other clouds, etc.) to the AI/ML cloud infrastructure without compromising security. The Cross-Cloud Network provides low-latency, highly reliable hybrid and multi-cloud connectivity, backed by SLAs, to move data quickly and securely.
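As a rough illustration of why link bandwidth matters for ingestion, the sketch below estimates how long it takes to move a hypothetical 100 TB dataset at two link speeds. The dataset size, link speeds, and the 0.8 efficiency factor (standing in for protocol overhead, encryption, and contention) are assumptions chosen for illustration, not Cross-Cloud Network figures.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `dataset_tb` decimal terabytes over a `link_gbps` link.

    `efficiency` is an assumed fraction of line rate actually achieved after
    protocol overhead, encryption, and contention; 0.8 is a placeholder value.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = dataset_tb * 1e12 * 8  # decimal terabytes to bits
    return bits / (link_gbps * 1e9 * efficiency) / 3600


# Hypothetical 100 TB training corpus held on-premises.
for gbps in (10, 100):
    print(f"{gbps:>3} Gb/s link: ~{transfer_hours(100, gbps):.1f} h")
```

Under these assumptions the same corpus takes about a day at 10 Gb/s and under three hours at 100 Gb/s, which directly delays when training can start.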

Figure 3: Google Cross-Cloud Network enables AI training across clouds

3. Protect your AI workloads, data, and users

To mitigate the risks that come with Gen AI, it is essential to implement comprehensive, pervasive security at every layer of the network. Businesses need highly effective security and strong network controls that are easy to use. Our security solutions are built to operate at scale in a constantly changing threat landscape, delivering 20x greater effectiveness than existing alternatives for data both at rest and in motion.

“From the application layer all the way down to the GPU, there is no longer any single place where your data or models are exposed. The hardware-based [security] is really exciting.”

– Neelay Shah, NVIDIA

Figure 4: Zero Trust security for workloads, data and users is critical for AI applications

4. Optimize traffic management with AI inference load balancing

The multimodal nature of Gen AI inference workloads and the variability in request and response times pose unique challenges for network management. Traditional round-robin or utilization-based traffic management is not well suited to Gen AI inference, because a single long-running request can tie up an accelerator while other requests queue behind it. Google Cloud Load Balancing addresses this by distributing traffic based on a customizable queue depth metric, reducing average and peak latency while improving GPU efficiency to reduce costs.
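The difference between round-robin and queue-depth-aware routing can be seen in a small simulation. This is a hypothetical sketch, not Google Cloud Load Balancing's API or algorithm: it assumes each backend serves one request at a time in FIFO order, and it uses "requests assigned but not yet finished" as a stand-in for the customizable queue depth metric. The request durations are made up to mimic one long generation followed by several short ones.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Backend:
    free_at: float = 0.0  # time at which this accelerator clears its current backlog
    finish_times: List[float] = field(default_factory=list)  # completion time of each assigned request

    def queue_depth(self, now: float) -> int:
        """Requests assigned to this backend that are still queued or running at `now`."""
        return sum(1 for t in self.finish_times if t > now)


Picker = Callable[[List[Backend], int, float], int]


def round_robin(backends: List[Backend], i: int, now: float) -> int:
    """Ignore load entirely and rotate through backends."""
    return i % len(backends)


def least_queue_depth(backends: List[Backend], i: int, now: float) -> int:
    """Send the request to the backend with the fewest outstanding requests.

    min() returns the first minimum, so ties go to the lowest index.
    """
    return min(range(len(backends)), key=lambda b: backends[b].queue_depth(now))


def simulate(requests: Sequence[Tuple[float, float]], n_backends: int, pick: Picker) -> List[float]:
    """Return per-request latency (queue wait + service) for (arrival, duration) pairs sorted by arrival."""
    backends = [Backend() for _ in range(n_backends)]
    latencies = []
    for i, (arrival, duration) in enumerate(requests):
        b = backends[pick(backends, i, arrival)]
        start = max(arrival, b.free_at)  # wait if the accelerator is still busy
        finish = start + duration
        b.free_at = finish
        b.finish_times.append(finish)
        latencies.append(finish - arrival)
    return latencies


# One long generation followed by three short ones (hypothetical durations in seconds).
reqs = [(0, 10), (1, 1), (2, 1), (3, 1)]
for name, pick in [("round robin", round_robin), ("least queue depth", least_queue_depth)]:
    lat = simulate(reqs, n_backends=2, pick=pick)
    print(f"{name}: mean latency {sum(lat) / len(lat):.2f} s, max {max(lat):.0f} s")
```

With round robin, the third request lands behind the 10-second generation and waits 8 seconds; the queue-depth picker routes it to the idle backend instead. Real deployments would read a metric reported by each model server rather than tracking finish times centrally as this toy does.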

Figure 5: Google provides advanced load balancing capabilities for AI

5. Operationalize with AI-powered Gemini Cloud Assist

Increase operational efficiency with AI-powered Gemini Cloud Assist. From design through build, Gemini helps optimize operations with best-practice guidance for reducing costs and improving security, as well as troubleshooting advice. Run your network with Gemini Cloud Assist to accelerate your cloud migration.

Figure 6: Gemini Cloud Assist helps operationalize AI

Learn more

Check out Google Cloud and NVIDIA

Biography

Muninder Singh Sambi

Muninder Singh Sambi is the Vice President of Network Product Management at Google Cloud. He is responsible for defining the vision, strategy, and investments for Google Cloud's Network and Network Security products. Muninder has over 25 years of leadership experience in product management and engineering in global high-tech environments. He has a proven track record of building next-generation security, networking, and software-as-a-service products that have generated billions of dollars in revenue. He is a technical leader skilled in managing product transitions and leading cross-functional global product management and outbound marketing teams. Muninder holds a degree in Electronics and Communications Engineering and an MBA, and he holds more than 8 US patents in networking and security.

Neelay Shah

Neelay Shah is a principal software architect and AI solutions engineer for the NVIDIA Triton Inference Server. He focuses on enabling developers to smoothly move from prototype to large-scale, high-performance production. Prior to joining NVIDIA, he was a principal engineer at Intel, where he led the Computer Vision Pipeline open source project. He holds a BS in Computer Science from Williams College and an MS in Computer Science from UIUC.
