Video is taking over the internet, accounting for nearly 80 percent of all traffic. Over the past few years, data centers have increasingly offloaded the work of transcoding that massive volume of video from CPUs onto GPU accelerators, in hopes of reducing latency, cost, and power consumption.
But the nature of video is changing, and the job is getting harder. The common model to date has been a one-to-many, on-demand environment driven by companies like Netflix and events like live sports. Video feeds originate in one place and run through cloud data centers, content delivery networks (CDNs), and edge servers before landing in corporate offices and consumer homes.
There is always a little bit of delay, whether because of the amount of processing and compute needed in the data center to ensure good quality or because broadcasters want a few seconds of buffer for editing purposes. In those scenarios, such delays are not a big deal.
But video is becoming more and more interactive, not just in consumer applications like the Twitch video game live-streaming service but also in enterprise tools such as video conferencing, whose use expanded with the rise in remote work driven largely by the COVID-19 pandemic. In December 2019, Zoom had 10 million daily meeting participants. By June 2020, when the pandemic engulfed the world, that number had reached 300 million. Other services such as Microsoft's Teams and Cisco Systems' Webex have seen similar growth.
This interactive video environment puts even more pressure on data center resources to drive latency down and, ideally, eliminate it. As of 2021, interactive video made up some 70 percent of the video market.
"The [infrastructure] models don't make much economic sense," Vincent Fung, senior product marketing manager at AMD, tells The Next Platform. "It becomes difficult to maintain a model that accommodates these use cases."
These use cases were on the minds of AMD CEO Lisa Su and other executives when AMD acquired programmable chip maker Xilinx for $35 billion early last year. AMD has staged a remarkable return to the data center over the past few years, helped along by Intel's various stumbles: its Zen microarchitecture and Epyc server CPUs have captured more than 25 percent of the data center CPU market, and its Radeon GPUs have seen growth in the GPU space.
The Xilinx acquisition gives AMD an even bigger presence in the data center and at the edge, not only through field programmable gate arrays (FPGAs) but also through AI engines, adaptive systems-on-chip (SoCs), networking, and software. Xilinx also forms the foundation of the company's Adaptive and Embedded Computing Group, which offers a range of dedicated video encoding cards.
Those cards have included the Alveo U30 media accelerator, introduced by Xilinx in 2020 and aimed at live streaming workloads. It is used for live video transcoding in the cloud via Amazon Web Services EC2 VT1 instances or on-premises in pre-configured appliances. AMD "anticipated the growth of interactive media, so we prepared with the first-generation U30," Fung said. The company is now sampling the Alveo MA35D, the successor to that card: a data center media accelerator and dedicated video encoding card that is a significant improvement over the U30.
With the move to more live video streaming, "traffic increased significantly," says Fung. "When you look at interactive use cases where one-to-many becomes many-to-many, there is a lot more that needs to be done in terms of video. You need high performance because so many people are using it, bandwidth costs must be minimized because of the large ingest, and power consumption is all part of the cost."
Like the Alveo U30, the MA35D is built for real-time interactive video encoding, but it is the first such card to come from AMD since the Xilinx deal closed. It packs two 5 nanometer ASIC video processing units (VPUs) to deliver four times as many simultaneous video streams (up to 32 1080p60 channels) and adds support for 8K resolution and AV1 encoding.
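For a rough sense of what 32 simultaneous 1080p60 channels means in raw pixel throughput, here is a back-of-the-envelope sketch. The stream count and resolution come from the figures above; everything else is simple arithmetic for illustration:

```python
# Back-of-the-envelope pixel throughput for 32 simultaneous 1080p60 streams.
WIDTH, HEIGHT, FPS = 1920, 1080, 60   # one 1080p60 channel
CHANNELS = 32                         # max simultaneous streams quoted for the MA35D

pixels_per_stream = WIDTH * HEIGHT * FPS       # pixels per second, per channel
total_pixels = pixels_per_stream * CHANNELS    # aggregate pixel rate across the card

print(f"Per stream : {pixels_per_stream / 1e6:,.0f} Mpixels/s")
print(f"Whole card : {total_pixels / 1e9:,.2f} Gpixels/s")
# ~124 Mpixels/s per stream, roughly 4 Gpixels/s across all 32 channels.
```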
AV1 is being adopted by big names such as Meta, Microsoft, and Cisco, as well as services such as Google's YouTube, Netflix, and Roku, according to Sean Gardner, AMD's head of video strategy and development.
"It's there, but it's limited," Gardner tells The Next Platform. "The theoretical goal of every new standard is to achieve 50 percent better compression efficiency than the previous standard. If I lock in on visual quality, how many bits does it take to achieve that quality? Each standard strives to hit that quality while using 50 percent less bandwidth. But the cost of each step tends to be paid on the encoding side, which keeps decoding cheap but carries a penalty of roughly 5x to 7x for each new codec implementation."
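Gardner's rule of thumb, roughly 50 percent bitrate savings per codec generation at constant visual quality, paid for with roughly 5x to 7x more encoding compute, can be illustrated with a quick sketch. The generation names, the baseline bitrate, and the 6x midpoint are illustrative assumptions, not AMD figures:

```python
# Illustrative only: how bandwidth and encode cost trade off across codec
# generations, assuming ~50% bitrate savings and a 5x-7x encode-compute
# penalty per step, per the rule of thumb quoted above.
baseline_kbps = 8000          # assumed H.264 bitrate for a given visual quality
generations = ["H.264", "HEVC", "AV1"]

bitrate = baseline_kbps
encode_cost = 1.0             # relative encode compute, H.264 = 1.0
for codec in generations:
    print(f"{codec:>6}: ~{bitrate:,.0f} kbps, encode cost ~{encode_cost:.0f}x")
    bitrate *= 0.5            # each new standard targets ~50% less bandwidth
    encode_cost *= 6          # midpoint of the quoted 5x-7x penalty per step
```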
And latency matters, he says.
"Netflix has no latency [issue]," says Gardner. "They say it can take ten hours to process an hour of video, and they can do that processing offline, outside of peak capacity. If you can't do that, you have to keep up with real time at 60 frames per second. Think about a scenario with Zoom, Teams, or Webex, where potentially billions of people are using it at the same time, or someone like Twitch with hundreds of thousands of ingested [streams]. You can't use caching, CDN-like architectures, because these use cases can't tolerate the delays. This is why you need acceleration."
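The distinction Gardner draws between offline and interactive encoding comes down to the per-frame time budget. A minimal sketch of that arithmetic (the ten-hours-per-hour figure is from his quote; the rest is simple calculation):

```python
# Real-time vs. offline encoding budgets, per the numbers quoted above.
FPS = 60
realtime_budget_ms = 1000 / FPS      # time available per frame in a live stream
print(f"Live 60 fps budget : {realtime_budget_ms:.1f} ms per frame")

# Netflix-style offline processing: ~10 hours of compute per 1 hour of video,
# i.e. roughly 10x slower than real time -- fine for video on demand,
# impossible for a Zoom call or a Twitch ingest.
offline_ratio = 10
print(f"Offline budget     : {realtime_budget_ms * offline_ratio:.0f} ms per frame "
      f"({offline_ratio}x real time)")
```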
Beyond the 4x channel density, AMD says the MA35D, which enters production in the third quarter with a list price of $1,595, delivers 2x lower cost per channel, 1.8x better compression, and 4x lower latency. It scales from 32 streams on a single card to 256 streams in a server with eight cards, and from there to the rack or data center level. It also saves bandwidth, with up to a 52 percent bitrate reduction.
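Those density and bandwidth claims roll up in a straightforward way. A quick sketch of the scaling math (the 32-stream, 8-card, and 52 percent figures are from the paragraph above; the rack size and per-stream bitrate are assumptions for illustration):

```python
# Stream density and bandwidth savings, using the figures quoted above.
STREAMS_PER_CARD = 32
CARDS_PER_SERVER = 8
SERVERS_PER_RACK = 20          # assumption for illustration only
BITRATE_REDUCTION = 0.52       # up to 52% bitrate savings claimed for the card

server_streams = STREAMS_PER_CARD * CARDS_PER_SERVER   # 256 streams per server
rack_streams = server_streams * SERVERS_PER_RACK

print(f"Per server: {server_streams} x 1080p60 streams")
print(f"Per rack  : {rack_streams} streams (assuming {SERVERS_PER_RACK} servers)")

# Bandwidth saved at an assumed 5 Mbps per delivered stream:
egress_mbps = rack_streams * 5
print(f"Egress    : {egress_mbps:,} Mbps before, "
      f"{egress_mbps * (1 - BITRATE_REDUCTION):,.0f} Mbps after a 52% reduction")
```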
Alongside the VPUs, the accelerator includes encoders and decoders, an adaptive bitrate scaler, a compositor engine for immersive computing, a video quality engine, a "lookahead" engine that analyzes motion and content for more efficient compression, and an AI processor that optimizes visual quality.
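The engines listed above map onto a conventional live-transcode flow. The sketch below is purely illustrative: the function names are hypothetical stand-ins, not an AMD SDK, and each stub just tags the frame so the ordering of the stages is visible when run.

```python
# Illustrative only: the order in which engines like those described above
# would typically touch a frame in a live transcode pipeline.

def decode(frame):            return frame + ["decoded"]       # decoder block
def composite(frame):         return frame + ["composited"]    # compositor engine
def lookahead(frame):         return {"motion": "low"}         # motion/content analysis
def ai_optimize(frame, s):    return {"qp_hint": 28}           # AI visual-quality tuning
def scale(frame, height):     return frame + [f"scaled_{height}p"]
def encode(frame, s, hints):  return frame + ["encoded_av1"]   # encoder block

def transcode_frame(compressed_frame, ladder=(1080, 720, 480)):
    frame = decode(compressed_frame)
    frame = composite(frame)
    stats = lookahead(frame)              # stats guide more efficient compression
    hints = ai_optimize(frame, stats)     # per-frame hints to optimize visual quality
    # Adaptive bitrate scaler: one encoded rendition per rung of the ABR ladder.
    return [encode(scale(frame, h), stats, hints) for h in ladder]

print(transcode_frame(["input_frame"]))
```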
Communication with the host CPU runs over a PCI-Express 5.0 bus that is backward compatible with Gen4.
"The accelerator is the entire video pipeline," says Fung. "The goal here is to not have to go off the chip for these tasks, so we can maintain consistent performance levels. It's not tied to particular use cases. It's all there, and it's solid. There's an AI block, there's the typical encoding and decoding, but there's also optimization at the base."
In the video world, AMD is taking a different tack than Nvidia, whose T4 with its Tensor Cores is aimed primarily at AI inference and whose L4 is aimed at graphics, and than Intel, whose GPU Flex Series targets data center media streaming. Gardner says that when the volume of streaming video first started to climb, the only real game in town was Nvidia's GPUs.
The two main applications for this kind of accelerator today are video and AI. Video is huge now, and AI is on the rise; vendors' strategies are shaped around those two use cases.
"Things started to open up," he says. "Nvidia continues to push it from the GPU with big AI and a little video, and Intel is trying to do something like medium video, medium AI. We came at it from 99 percent video and added a little bit of AI, but we're not trying to get into smart cities or surveillance. Our AI is targeted specifically at inline, pixel-level processing."