Cisco Lays Foundation for AI Network Infrastructure

Credit: Cisco

Cisco is developing a new high-end programmable Silicon One processor aimed at powering large-scale artificial intelligence (AI)/machine learning (ML) infrastructure for enterprises and hyperscalers.

The company has added the 5nm, 51.2Tbps Silicon One G200 and 25.6Tbps G202 to its Silicon One family, bringing it to 13 members. Both can be customized for routing or switching from a single chipset, eliminating the need to use different silicon architectures for each network function. This is accomplished through a common operating system, P4 programmable forwarding code, and SDK.

According to Rakesh Chopra, a Cisco Fellow in the vendor’s Common Hardware Group, these new top-of-the-line devices in the Silicon One family provide the enhanced network capabilities that demanding AI/ML deployments and other highly distributed applications require.

“We are going through a massive shift in the industry, where what seemed massive at the time is nothing compared to the scale now required for AI/ML; until recently we were building reasonably small high-performance computing clusters,” Chopra said. AI/ML models have gone from requiring a few GPUs to requiring tens of thousands of GPUs linked in parallel and in series. “The number of GPUs and the scale of the network are unprecedented.”

New Silicon One enhancements include a P4-programmable parallel packet processor capable of more than 435 billion lookups per second.
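To put that lookup rate in perspective, a rough back-of-the-envelope calculation shows how many lookups the processor could spend on each packet at the G200’s full 51.2Tbps line rate. The 64-byte minimum frame and roughly 20 bytes of per-packet Ethernet framing overhead are standard assumptions not given in the article:

```python
# Illustrative math only; packet sizes are assumptions, not Cisco figures.
LINE_RATE_BPS = 51.2e12            # G200 aggregate bandwidth
LOOKUPS_PER_SEC = 435e9            # claimed parallel-lookup rate
WIRE_BITS_PER_PKT = (64 + 20) * 8  # 64B minimum frame + ~20B framing overhead

packets_per_sec = LINE_RATE_BPS / WIRE_BITS_PER_PKT
lookups_per_packet = LOOKUPS_PER_SEC / packets_per_sec

print(f"{packets_per_sec / 1e9:.1f} Gpps")        # ~76.2 Gpps worst case
print(f"{lookups_per_packet:.1f} lookups/packet")  # ~5.7 lookups per packet
```

In other words, even at worst-case minimum packet sizes, the claimed rate leaves several table lookups available per packet.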

“We have a fully shared packet buffer, and every port has full access to the packet buffer, regardless of what is going on,” Chopra said. This contrasts with designs that statically assign buffer space to individual input and output ports, where the buffer a packet gets depends on the port it arrives on. “That means less ability to absorb traffic bursts, a greater likelihood of dropping packets, and significantly lower AI/ML performance,” he said.
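The benefit of a fully shared buffer can be illustrated with a toy model; the port count, buffer size, and static-split policy below are assumptions for illustration, not Cisco’s design. When a burst targets a single egress port, a shared buffer lets that one port use all available memory, while a statically partitioned buffer caps it at its per-port slice:

```python
def burst_drops(burst_pkts: int, ports: int, buffer_pkts: int, shared: bool) -> int:
    """Packets dropped when a burst of `burst_pkts` packets targets one
    egress port. With a shared buffer that port may fill the whole memory;
    with a static split it only gets buffer_pkts // ports."""
    capacity = buffer_pkts if shared else buffer_pkts // ports
    return max(0, burst_pkts - capacity)

# Hypothetical chip: 64 ports, buffer holds 2,048 packets, burst of 1,000.
print(burst_drops(1000, 64, 2048, shared=True))   # shared buffer: 0 dropped
print(burst_drops(1000, 64, 2048, shared=False))  # static split: 968 dropped
```

The same total memory absorbs the burst entirely in the shared case but drops most of it when carved into fixed per-port slices, which is the failure mode Chopra describes.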

Additionally, each silicon device can support 512 Ethernet ports, allowing customers to build 32K 400G GPU AI/ML clusters with 40% fewer switches than would be required with other silicon devices, Chopra said.
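The article’s exact 40% figure can’t be reproduced without Cisco’s topology details, but a simple idealized two-tier leaf-spine calculation (my own model; the radices and the non-blocking assumption are not from the article) shows why doubling switch radix sharply cuts the switch count for a fixed number of endpoints:

```python
import math

def clos_switches(radix: int, hosts: int) -> int:
    """Switch count for an idealized non-blocking two-tier leaf-spine:
    each leaf uses half its ports for hosts and half for spine uplinks."""
    down = radix // 2                  # host-facing ports per leaf
    leaves = math.ceil(hosts / down)
    uplinks = leaves * (radix - down)  # total leaf-to-spine links
    spines = math.ceil(uplinks / radix)
    return leaves + spines

# 32K endpoints built from 512-port vs. 256-port switches (illustrative).
print(clos_switches(512, 32 * 1024))  # 192 switches
print(clos_switches(256, 32 * 1024))  # 384 switches
```

In this toy model the higher-radix build halves the switch count; the article’s 40% figure presumably reflects a less idealized real-world topology, but the direction of the saving is the same.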
