Fundamentals of AI-based codec development



While many companies are leveraging AI to enhance the performance of existing codecs such as H.264, HEVC, and AV1, UK-based Deep Render is developing a codec that is entirely AI-based. To promote understanding of AI codec development, Arsalan Zafar, Deep Render's CTO and co-founder, has launched an educational resource titled “Fundamentals of AI-based Codec Development: An Introductory Course.” In a recent interview with Streaming Media Magazine, Zafar provided insight into the course content, target audience, and expected outcomes.

Zafar began by explaining that Deep Render's mission is “pioneering the next generation of image and video codecs powered by machine learning.” He then shared that, in the five years since its inception, Deep Render has developed the world's first AI codec, delivering up to 80% bandwidth savings over H.264.

Why did Zafar create this course? In his words: “Have you ever wondered how AI-based codecs work?” Deep Render is dedicated to this pursuit, and Zafar wants to demystify the subject for the broader codec community. To that end, the course explores the historical evolution of AI and machine learning, highlighting key milestones and recent advances.

The course begins with basic machine learning principles (architecture, loss functions, and training) that are essential for understanding the concepts underlying AI-based codecs. Next, it explores the architecture of an AI-based codec and shows how neural network layers and machine learning primitives are configured to optimize compression efficiency. A key aspect of AI-based compression is formulating an objective function that defines the rate-distortion tradeoff. Zafar also describes how rate and distortion are defined in a differentiable manner, allowing efficient optimization through backpropagation.
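To make the rate-distortion objective concrete, here is a minimal Python sketch. It is not Deep Render's formulation. It assumes a form common in published learned-compression work, in which the loss is an estimated rate plus a weighted distortion term. The function names, the zero-mean Gaussian entropy model, the mean-squared-error distortion, and the weight `lam` are all illustrative assumptions.

```python
import math


def gaussian_cdf(x, scale):
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def estimated_bits(latents, scale=1.0):
    """Rate proxy: -log2 of the probability mass that the (assumed) entropy
    model assigns to the unit-width bin around each latent value.
    The expression is smooth in `latents`, which is what makes it usable
    with gradient-based training."""
    total = 0.0
    for y in latents:
        mass = gaussian_cdf(y + 0.5, scale) - gaussian_cdf(y - 0.5, scale)
        total += -math.log2(max(mass, 1e-12))  # guard against log(0)
    return total


def mse(original, reconstruction):
    """Distortion proxy: mean squared error between input and output."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)


def rate_distortion_loss(original, reconstruction, latents, lam=0.01, scale=1.0):
    """Hypothetical objective: bits per sample plus lam times distortion."""
    rate = estimated_bits(latents, scale) / len(original)
    distortion = mse(original, reconstruction)
    return rate + lam * distortion


if __name__ == "__main__":
    x = [10.0, 12.0, 11.0, 9.0]        # toy "pixels"
    x_hat = [10.5, 11.5, 11.0, 9.5]    # toy reconstruction
    y = [0.0, 1.0, -1.0, 2.0]          # toy latent values
    print(rate_distortion_loss(x, x_hat, y, lam=0.1))
```

In a real training setup these quantities would be computed inside an automatic-differentiation framework so that gradients can flow back to the network weights; plain Python is used here only to show the arithmetic. Raising `lam` penalizes distortion more heavily and so favors quality over bitrate, while lowering it does the opposite.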

AI-Based Codec Course Objectives

Zafar then guides learners through the encoding and decoding process, highlighting the transition from traditional methods to machine learning-driven approaches. This hands-on approach helps learners apply the knowledge gained from the course to real-world scenarios.

The course then provides an overview of the production side of AI-based codecs, detailing the transition from training to inference regimes and the use of neural processing units for efficient execution. It then explores the unique features and benefits of AI-based codecs, including adaptability to domain-specific content and scalability as hardware advances. Zafar highlights the potential for rapid innovation and updates in the AI-based codec ecosystem, facilitating faster deployment and adaptation to evolving industry needs.
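One commonly described difference between training and inference in learned compression is how quantization is handled. The sketch below assumes the widely published approach of adding uniform noise during training, because hard rounding has zero gradient almost everywhere, and switching to true rounding at inference. The article does not say whether Deep Render's codec works this way, and both function names are hypothetical.

```python
import math
import random


def quantize_for_training(latents, rng):
    """Training-time stand-in for rounding: add uniform noise in [-0.5, 0.5]
    so the operation stays differentiable with respect to the latents."""
    return [y + rng.uniform(-0.5, 0.5) for y in latents]


def quantize_for_inference(latents):
    """Inference-time quantization: round each latent to the nearest integer
    (halves round up) so it can be passed to an entropy coder."""
    return [math.floor(y + 0.5) for y in latents]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    latents = [0.2, -1.7, 3.0]
    print(quantize_for_training(latents, rng))
    print(quantize_for_inference(latents))
```

The noisy values stay within half a quantization step of the originals, so the network sees roughly the same error during training that real rounding introduces at inference.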

In addition to theoretical discussions, the course provides practical demonstrations and examples of AI-based compression, showing both successful results and potential challenges. Zafar encourages learners to engage with Deep Render demos and resources to further explore the capabilities of the AI-based codec.

The course is primarily designed for a highly technical audience, such as video codec engineers looking to integrate AI-based codecs into their pipelines, but it will also be useful for other engineers who are simply considering applying AI to video codec development. Zafar describes it as an introduction to AI-based compression, exploring detailed algorithms while addressing real-world production considerations such as the playback platform required for distribution. Despite its concise 16-minute duration, the course provides a comprehensive overview of AI video codec development, making it a valuable resource for anyone interested in this field.

The course is available for free on YouTube (https://bit.ly/AI_codec_course).









