Akamai launches AI Grid intelligent orchestration for distributed inference across 4,400 edge locations

Akamai Inference Cloud is the industry’s first global implementation of NVIDIA AI Grid, intelligently routing AI workloads across edge, regional, and core footprints to balance latency, cost, and performance.

Akamai Technologies has reached a major milestone in the evolution of artificial intelligence, announcing the first global implementation of the NVIDIA AI Grid reference design. By integrating NVIDIA AI infrastructure into its own network and orchestrating workloads intelligently across it, Akamai intends to move the industry from isolated AI factories to a unified, distributed grid for AI inference.

This move marks an important step in the evolution of Akamai Inference Cloud, which was introduced late last year. As the first to operate an AI grid, Akamai deploys thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, providing a platform that lets enterprises run agentic and physical AI with the responsiveness of local computing and the scale of the global web.

“AI factories are purpose-built for training and frontier model workloads, and our centralized infrastructure continues to deliver the best tokenomics for these use cases,” said Adam Karon, Akamai’s chief operating officer and general manager of its Cloud Technology Group. “But real-time video, physical AI, and highly concurrent personalized experiences require inference at the point of contact, rather than round trips to a centralized cluster. Our AI Grid intelligent orchestration gives AI factories a way to extend inference outward. Leveraging the same distributed architecture that revolutionized content delivery, we route AI workloads across 4,400 locations at the right cost and at the right time.”

The architecture of “tokenomics”

At the heart of the AI Grid is an intelligent orchestrator that acts as a real-time broker for AI requests. Applying Akamai’s expertise in application performance optimization to AI, this workload-aware control plane optimizes “tokenomics” by improving three metrics: cost per token, time to first token, and throughput.
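To make the three metrics concrete, here is a minimal sketch of how they could be computed from the timing of a single request. The `RequestTrace` fields, the function names, and the assumption that a request occupies one GPU for its whole duration are all hypothetical and chosen for illustration; they are not Akamai's or NVIDIA's definitions.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Timing of one inference request (all field names are hypothetical)."""
    sent_at: float          # seconds: when the request was sent
    first_token_at: float   # seconds: when the first output token arrived
    finished_at: float      # seconds: when the last output token arrived
    output_tokens: int      # number of tokens generated


def time_to_first_token(trace: RequestTrace) -> float:
    """Seconds the caller waited before any output appeared."""
    return trace.first_token_at - trace.sent_at


def throughput_tokens_per_s(trace: RequestTrace) -> float:
    """Output tokens per second, measured over the whole request."""
    return trace.output_tokens / (trace.finished_at - trace.sent_at)


def cost_per_token(trace: RequestTrace, gpu_price_per_hour: float) -> float:
    """Cost of one output token.

    Simplifying assumption: the request holds a single GPU exclusively
    from send to finish, so cost is GPU-seconds times the per-second price.
    """
    gpu_seconds = trace.finished_at - trace.sent_at
    return (gpu_price_per_hour / 3600.0) * gpu_seconds / trace.output_tokens
```

For example, a request that returns its first token after 0.25 s and finishes 100 tokens after 2 s has a time to first token of 0.25 s and a throughput of 50 tokens per second. Real deployments batch many requests per GPU, so per-token cost in practice is lower than this single-tenant estimate.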

Akamai’s key differentiator is that customers can run fine-tuned and distilled models across its massive global edge footprint, which provides significant cost and performance benefits for the long tail of AI workloads. For example:

  • Cost efficiency at scale: Enterprises can significantly reduce inference costs by automatically matching workloads to the appropriate compute tier. The orchestrator applies techniques such as semantic caching and intelligent routing to send requests to appropriately sized resources and to reserve premium GPU cycles for the workloads that need them. This is powered by Akamai Cloud, built on open source infrastructure with ample egress capacity to support data-intensive AI operations at scale.
  • Real-time responsiveness: Game studios can offer AI-driven NPC interactions that maintain player immersion down to the millisecond. Financial institutions can run personalized fraud detection and marketing recommendations in the interval between a customer logging in and the first screen appearing. Broadcasters can transcode and dub content in real time for audiences around the world. These capabilities are made possible by Akamai’s globally distributed edge network of more than 4,400 locations, with integrated caching, serverless edge computing, and high-performance connectivity that process requests at the point of user contact and avoid round trips to an origin cloud.
  • Production-grade AI at the core: Large language models, continuous post-training, and multimodal inference workloads require sustained high-density computing that only dedicated infrastructure can provide. Akamai’s clusters of thousands of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs deliver concentrated horsepower for the heaviest AI workloads, complementing the distributed edge with centralized scale.

The computing continuum: from the core to the far edge

The platform is built on NVIDIA AI Enterprise and uses the NVIDIA Blackwell architecture and NVIDIA BlueField DPUs for hardware-accelerated networking and security, which lets Akamai manage complex SLAs across edge and core locations.

  • The edge (4,400+ locations): Delivers rapid response times for physical AI and autonomous agents. It uses semantic caching and serverless features such as Akamai Functions (WebAssembly-based computing) and EdgeWorkers to provide model affinity and consistent performance at the point of user interaction.
  • Akamai Cloud IaaS and dedicated GPU clusters: Public cloud infrastructure at the core enables large-scale workload portability and cost savings, while pods powered by NVIDIA RTX PRO 6000 Blackwell GPUs handle heavy-duty post-training and multimodal inference.

“New AI-native applications require predictable latency and greater cost efficiency on a global scale,” said Chris Penrose, global VP of communications business development at NVIDIA. “By operating the NVIDIA AI Grid, Akamai is building the connective tissue for generative, agentic, and physical AI to move intelligence directly to the data and unlock the next wave of real-time applications.”

Powering the next wave of real-time AI

Akamai is already seeing strong early adoption of Akamai Inference Cloud across compute-intensive and latency-sensitive industries.

  • Gaming: Studios are deploying sub-50ms inference for real-time player interaction with AI-driven NPCs.
  • Financial services: Banks are using the grid to deliver highly personalized marketing and fast recommendations at the critical moment when customers log in.
  • Media and video: Broadcasters are using the distributed network for AI-powered transcoding and real-time dubbing.
  • Retail and commerce: Retailers are adopting the network for in-store AI applications and related productivity tools at the point of sale.

Beyond enterprise demand, the platform has also been validated by leading technology providers, including through a $200 million, four-year service agreement for thousands of GPU clusters in data centers purpose-built for MetroEdge’s enterprise AI infrastructure.

Scaling AI factories from centralized to distributed

The first wave of AI infrastructure was defined by large clusters of GPUs in a few centralized locations, optimized for training. But as inference becomes the dominant workload and companies across all industries focus on building AI agents, that centralized model faces the same scaling constraints that earlier generations of Internet infrastructure faced in media distribution, online gaming, financial transactions, and complex microservices applications.

Akamai solved each of those earlier challenges with the same fundamental approach: distributed networking, intelligent orchestration, and purpose-built systems that bring content and context as close as possible to digital touchpoints. Companies that adopted this model improved user experience and increased ROI. Akamai Inference Cloud applies the same proven architecture to AI factories, distributing dense compute from the core to the edge to enable the next wave of scaling and growth.
