Cisco is developing a new high-end programmable Silicon One processor aimed at powering large-scale artificial intelligence (AI)/machine learning (ML) infrastructure for enterprises and hyperscalers.
The company has added the 5nm, 51.2Tbps Silicon One G200 and the 25.6Tbps G202 to its 13-member Silicon One family. The devices can be customized for routing or switching from a single chipset, eliminating the need for a different silicon architecture for each network function. This is accomplished through a common operating system, P4-programmable forwarding code, and an SDK.
According to Rakesh Chopra, a Cisco Fellow in the vendor’s Common Hardware Group, the new top-of-the-line devices in the Silicon One family provide enhanced functionality and are ideal for demanding AI/ML deployments and other highly distributed applications.
“We are going through a massive shift in the industry, where what seemed massive at the time — building reasonably small, high-performance computing clusters — was nothing compared to the absolutely massive adoption required for AI/ML,” Chopra said. AI/ML models have gone from requiring a few GPUs to requiring tens of thousands of GPUs linked in parallel and in series. “The number of GPUs and the scale of the network are unprecedented.”
New Silicon One enhancements include a P4-programmable parallel packet processor capable of more than 435 billion lookups per second.
“We have a fully shared packet buffer, and every port has full access to the packet buffer, regardless of what is going on,” Chopra said. This is in contrast to architectures that assign buffers to individual input and output ports, where the buffer a packet gets depends on the port it is sent on. “That means less ability to absorb traffic bursts, a higher likelihood of dropped packets, and significantly lower AI/ML performance,” he said.
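The advantage of a fully shared buffer can be sketched with a toy model. This is an illustrative simplification, not Cisco's implementation: it just counts drops when a burst hits one port, comparing a buffer carved into fixed per-port slices against one pool that any port can draw on.

```python
# Toy model: packet drops under a single-port burst, for a buffer that is
# partitioned per port vs. fully shared. All sizes are in buffer cells.

def drops_partitioned(bursts, total_cells, num_ports):
    """Each port owns a fixed slice; idle ports' spare cells go unused."""
    per_port = total_cells // num_ports
    return sum(max(0, burst - per_port) for burst in bursts)

def drops_shared(bursts, total_cells):
    """Every port can draw on the whole pool, so one hot port absorbs the burst."""
    return max(0, sum(bursts) - total_cells)

# 8 ports, 800-cell buffer; one port takes a 500-cell burst, the rest are quiet.
bursts = [500, 10, 10, 10, 10, 10, 10, 10]
print(drops_partitioned(bursts, 800, 8))  # 400 drops: the hot port only owns 100 cells
print(drops_shared(bursts, 800))          # 0 drops: 570 cells total fits in the pool
```

The same total buffer capacity absorbs the burst only when every port can reach all of it, which is the behavior Chopra describes.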
Additionally, each silicon device can support 512 Ethernet ports, allowing customers to build a 32K 400G-GPU AI/ML cluster with 40% fewer switches than other silicon devices would require to support that cluster, Chopra said.
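Why higher radix means fewer switches can be seen with generic two-tier Clos arithmetic. This back-of-the-envelope sketch is not Cisco's exact comparison (the article does not say which alternative devices the 40% figure is measured against); it simply shows switch counts for 32,768 endpoints at radix 512 versus a hypothetical radix-256 alternative.

```python
import math

def two_tier_clos_switches(endpoints, radix):
    """Minimum switch count in a non-blocking two-tier Clos fabric: each leaf
    dedicates half its ports to endpoints and half to uplinks to the spines."""
    down = radix // 2                         # endpoint-facing ports per leaf
    leaves = math.ceil(endpoints / down)
    spines = math.ceil(leaves * down / radix)  # one spine port per leaf uplink
    return leaves + spines

print(two_tier_clos_switches(32768, 512))  # 128 leaves + 64 spines = 192
print(two_tier_clos_switches(32768, 256))  # 256 leaves + 128 spines = 384
```

Doubling the radix halves the switch count in this idealized topology; the exact savings in practice depend on oversubscription, resiliency, and which competing device is the baseline.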
Core to the Silicon One system is support for enhanced Ethernet features such as improved flow control, congestion awareness and avoidance.
The system also includes advanced load balancing features and “packet spraying” that spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link failure recovery also helps ensure the network operates at peak efficiency, the company said.
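The benefit of packet spraying over conventional per-flow (ECMP-style) hashing can be illustrated with a small sketch. This is a hypothetical model, not Cisco's algorithm: flow hashing pins an entire flow to one link, while spraying deals individual packets across all links.

```python
from collections import Counter

LINKS = 4

def ecmp(packets):
    """ECMP-style flow hashing: every packet of a flow lands on the same link."""
    load = Counter()
    for flow, size in packets:
        load[hash(flow) % LINKS] += size
    return load

def spray(packets):
    """Packet spraying: packets are dealt round-robin across all links."""
    load = Counter()
    for i, (_, size) in enumerate(packets):
        load[i % LINKS] += size
    return load

# One 1000-packet "elephant" flow: hashing piles it all onto a single link,
# while spraying spreads it evenly across the four links.
elephant = [("flow-A", 1)] * 1000
print(max(ecmp(elephant).values()))      # 1000 packets on one hot link
print(sorted(spray(elephant).values()))  # [250, 250, 250, 250]
```

Spraying keeps any one link from becoming a hotspot for a large AI/ML flow; the trade-off is that packets of a flow can arrive out of order, which the fabric or endpoints must then handle.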
Taken together, these enhanced Ethernet technologies go a step further: ultimately, they will let customers set up what Cisco calls a fully scheduled fabric.
In a scheduled fabric, physical components (chips, optics, switches) are tied together like one big modular chassis and communicate with each other to provide optimal scheduling behavior, Chopra said. “The end result is that flows like AI/ML, in particular, will have much higher bandwidth throughput and much faster job completion times, which means GPUs will run more efficiently.”
With Silicon One’s devices and software, customers can deploy as many or as few of these features as they want, Chopra said.
Cisco is part of a growing AI networking market — which also includes Broadcom, Marvell, Arista, and others — that is expected to grow from $2 billion today to $10 billion by 2027, according to a recent blog from the 650 Group.
“AI networks have already flourished in the last two years. In fact, we have been tracking AI/ML networking for almost two years, and AI/ML is a huge opportunity for networking; we believe it will be one of the main drivers of growth in data-center networking,” the 650 Group blog stated. “Key to the impact of AI/ML on networking will be the sheer amount of bandwidth required to train AI models, new workloads, and powerful inference solutions coming to market. And many industries will have multiple digitization efforts thanks to AI.”
According to Chopra, the Cisco Silicon One G200 and G202 are currently being tested by unidentified customers and are available on a sample basis.