Once every pixel can be disguised, it becomes increasingly difficult to trust what we see on screen. Videos generated and manipulated using artificial intelligence (AI) are raising new questions about authenticity and creative control.
That's why standards are important. Nokia is leading the way with the Versatile Supplemental Enhancement Information (VSEI) standard, designed to help everyone from content creators to viewers validate, protect and enhance video in the age of AI. The VSEI standard complements video coding standards such as H.264 (Advanced Video Coding, AVC), H.265 (High Efficiency Video Coding, HEVC), and H.266 (Versatile Video Coding, VVC). Version 4 of the VSEI standard was recently completed and addresses new use cases in AI, machine vision, and preservation of creative intent. This blog summarizes the new features and enhancements provided by the new VSEI version.
Versatile Supplemental Enhancement Information (VSEI)
The VSEI standard plays an important role in complementing video coding standards such as H.264, H.265, and H.266, referred to collectively in this blog as H.26x. Although the core decoding algorithms of the H.26x standards have remained unchanged for years, the continuously developed VSEI standard allows H.26x codec implementations to keep evolving to better address specific use cases.
VSEI version 4 is a major update that provides a variety of new features as well as enhancements to functionality specified in previous versions of the standard. The VSEI version 4 standardization was a two-year collaboration between more than 10 companies, with Nokia being one of the most active contributors.
The VSEI standard specifies Supplemental Enhancement Information (SEI) messages that can be included in an encoded video bitstream. This additional information helps your device better understand and process your video. The metadata included in SEI messages is synchronized with the encoded video and helps improve image quality and provide more details about the video itself. Thanks to the VSEI standard, decoders on different devices and applications can read and use this information in the same way, making the video experience more reliable and consistent.
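To make the idea concrete, here is a minimal sketch of how a tool might locate SEI NAL units carrying such messages in a bitstream. It assumes an H.264 Annex B elementary stream and, for brevity, ignores emulation-prevention bytes and the different NAL header layouts of H.265 and H.266:

```python
def find_sei_nal_units(bitstream: bytes):
    """Scan an H.264 Annex B elementary stream and return the offsets
    of SEI NAL units (nal_unit_type == 6). Simplified sketch: skips
    emulation-prevention handling and H.265/H.266 header layouts."""
    offsets = []
    i = 0
    while i < len(bitstream) - 3:
        # Look for a 3-byte start code prefix 0x000001.
        if bitstream[i:i + 3] == b"\x00\x00\x01":
            header = bitstream[i + 3]
            nal_unit_type = header & 0x1F  # low 5 bits in H.264
            if nal_unit_type == 6:         # 6 = SEI in H.264
                offsets.append(i + 3)
            i += 3
        else:
            i += 1
    return offsets
```

A real parser would then read the SEI payload type and size from the bytes that follow to decode the individual messages.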
Managing how video content interacts with AI
AI-based video generation and manipulation is becoming increasingly accessible and sophisticated, making it more difficult to detect AI-generated or manipulated content. Therefore, it has become essential to verify the authenticity of video content.
VSEI version 4 allows video creators to attach a digital signature to coded videos to prove that the content is authentic and has not been modified since its creation. For example, a news agency can use digital signatures to mark its videos, allowing viewers to verify that a video truly comes from that agency and has not been altered.
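The sign-then-verify flow can be sketched as follows. Note that this uses HMAC from the Python standard library purely as a stand-in: an actual digital signature scheme uses public-key cryptography, so anyone holding the creator's public key can verify the content without sharing a secret.

```python
import hashlib
import hmac

def sign_bitstream(coded_video: bytes, key: bytes) -> bytes:
    """Produce an integrity tag over the coded video. Stand-in only:
    a real digital signature would use a private key here, with the
    matching public key used for verification."""
    return hmac.new(key, coded_video, hashlib.sha256).digest()

def verify_bitstream(coded_video: bytes, tag: bytes, key: bytes) -> bool:
    """Return True only if the video is byte-identical to what was signed."""
    return hmac.compare_digest(sign_bitstream(coded_video, key), tag)
```

Any change to the coded bytes, however small, invalidates the tag, which is exactly the tamper-evidence property the signature provides.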
New regulations require clear labeling, called AI marking, if content is created or modified by AI. This is particularly important, for example, when generative AI is used to change the appearance of public figures, such as politicians during election campaigns. VSEI version 4 allows these AI marking labels to be added to videos so viewers know when AI has been involved.
Additionally, VSEI version 4 allows content owners to signal restrictions on the use of AI, that is, rules for how AI may use the video. For example, an owner can indicate that a video must not be used to train AI models, protecting privacy and the rights of content owners.
Generative AI for video enhancement and compression
Previous versions of the VSEI standard introduced support for neural-network post-filtering (NNPF), marking perhaps the first time that AI was integrated into a video standard. Since then, Nokia has researched various aspects of NNPF technology. NNPF can hide common video coding artifacts, such as contouring (uneven color areas) and blocking (visible squares), that result from coding at limited bitrates. NNPF gives content creators control over the post-processing of their videos, ensuring that their creative intent remains intact.
In VSEI version 4, NNPF becomes even smarter with the addition of generative AI capabilities. For example, text prompts can be attached to a video to guide generative filtering. Beyond traditional filtering purposes, such as making a video look sharper, generative NNPF can be used to spatially extend a picture or to generate future pictures.
Generative face video coding is another new feature. It allows video of human faces to be coded at bitrates as low as a few kilobits per second. The technology works by coding one main or base picture together with additional facial details, and the AI uses those inputs to generate the rest of the video. The VSEI standard includes signaling that tells the decoder which neural network model and facial parameters to use to reconstruct the video correctly.
Creator-driven post-processing
VSEI version 4 allows video creators to specify the order of post-processing operations, such as converting colors, adding film grain, and rotating pictures for display. Creators can also set up different processing chains for different display resolutions. In addition, film grain support, which has been present in the H.26x codecs for decades, has been enhanced to allow signaling a different film grain model per display resolution. This means a video will look its best whether it is viewed on a phone or on a big screen.
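The idea of per-resolution processing chains can be sketched as follows. The operation names, the display classes, and the selection rule are all invented for illustration and do not mirror the actual SEI message syntax:

```python
# Hypothetical creator-specified chains: each display class (keyed by a
# maximum width and height) gets its own ordered list of operations.
PROCESSING_CHAINS = {
    (1920, 1080): ["convert_colors", "add_film_grain", "rotate_for_display"],
    (640, 360): ["convert_colors", "rotate_for_display"],  # skip grain on small screens
}

def chain_for_display(width: int, height: int):
    """Pick the first (smallest) chain whose display class fits the
    target resolution; operations are then applied in the given order."""
    for (max_w, max_h), ops in sorted(PROCESSING_CHAINS.items()):
        if width <= max_w and height <= max_h:
            return ops
    return []
```

The key point is that the creator, not the receiving device, decides both which operations run and in what order, per display class.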
These additions to VSEI version 4 give content creators greater control over how their videos appear on receiving devices, allowing them to maintain their creative intent.
Video for computer vision
Video is increasingly consumed by machine analysis tasks rather than viewed by humans; machine-to-machine video is reported to amount to tens of zettabytes per year. It is therefore becoming more important to optimize video compression without compromising the accuracy of machine tasks. VSEI version 4 adds several new features for machine-to-machine video.
Videos optimized for computer vision may not provide the best viewing experience for humans. Safeguards against showing machine-targeted video to human viewers are embedded in the signaling that describes encoder behavior, the post-processing chain, and neural-network post-filtering. More broadly, the type of encoder optimization can be detailed in the Encoder Optimization Information (EOI) SEI message, allowing the receiving system to route the video appropriately to post-processing and analysis tasks.
Many machine tasks, such as human identification, work best when important parts, called regions of interest (ROIs), are presented in the best possible quality, while the background matters less. Video encoding systems can use ROI detection together with preprocessing or encoding optimization to ensure that these critical regions are presented at their best, even if the quality and bitrate of the remaining regions are degraded. For example, the encoder may use a finer quantization step size for the ROI, which can be reported in the EOI SEI message. Alternatively, the encoder can pack the foreground region at a higher spatial resolution and the background region at a lower resolution into the source picture used for encoding. This packing can be described in the Packed Region Information (PRI) SEI message, so the receiving system knows how to restore each region to its original location.
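The packing idea can be illustrated with a small sketch that maps a pixel in the packed (coded) picture back to its position in the original source picture. The field names here are invented for illustration; the actual PRI SEI message syntax differs:

```python
from dataclasses import dataclass

@dataclass
class PackedRegion:
    """One region mapping, loosely inspired by the idea behind the
    Packed Region Information SEI message (field names are invented)."""
    packed_x: int; packed_y: int; packed_w: int; packed_h: int
    source_x: int; source_y: int; source_w: int; source_h: int

def restore_position(region: PackedRegion, px: int, py: int):
    """Map a pixel in the packed picture back to its location in the
    original source picture, rescaling if the region was resampled."""
    sx = region.source_x + (px - region.packed_x) * region.source_w // region.packed_w
    sy = region.source_y + (py - region.packed_y) * region.source_h // region.packed_h
    return sx, sy
```

For instance, a background region packed at half resolution maps each packed pixel back to a 2x2 neighborhood in the source picture, while a foreground ROI packed at full resolution maps one-to-one.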
By using semantic segmentation or instance segmentation to split a video into different objects, each object can be displayed in its own solid color within an object mask picture. VSEI version 4 makes it possible to describe these object masks, so they can be included in the same coded video bitstream as the original source video. This feature makes H.26x a well-suited output format for segmented video.
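A simplified sketch of rendering an object mask picture is shown below. Real segmentation masks follow object contours rather than rectangles; the bounding-box rendering and the data layout here are purely illustrative:

```python
def build_mask_picture(width: int, height: int, instances: dict):
    """Render an object mask picture: each instance id is drawn as a
    solid value over its bounding box. `instances` maps id -> (x, y, w, h);
    value 0 marks the background."""
    picture = [[0] * width for _ in range(height)]
    for obj_id, (x, y, w, h) in instances.items():
        for row in range(y, min(y + h, height)):
            for col in range(x, min(x + w, width)):
                picture[row][col] = obj_id
    return picture
```

Because every pixel of an object carries a single flat value, such mask pictures compress very efficiently with a standard H.26x encoder.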
Image metadata extensions
Video may be recorded at a different rate than it is shown. For example, video can be captured at a high picture rate such as 240 Hz and played back in slow motion, or vice versa. The Source Picture Timing Information SEI message includes metadata about when each source picture was captured, which makes it possible to track the capture time of every picture.
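The relationship between capture rate and display rate is simple arithmetic, sketched below for illustration (this is not the SEI message syntax, just the underlying timing math):

```python
def slow_motion_factor(capture_hz: float, display_hz: float) -> float:
    """How many times slower the action appears when every captured
    picture is shown one-for-one at the display rate."""
    return capture_hz / display_hz

def capture_time(picture_index: int, capture_hz: float) -> float:
    """Seconds after the first picture at which this picture was captured."""
    return picture_index / capture_hz

def presentation_time(picture_index: int, display_hz: float) -> float:
    """Seconds after the first picture at which this picture is displayed."""
    return picture_index / display_hz
```

For example, one second of action captured at 240 Hz and displayed at 30 Hz plays out over eight seconds; the capture timestamps let the receiver recover the original real-world timing of each picture.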
Some image sensors can capture wavelengths beyond visible light. The Modality Information SEI message indicates whether the pictures in a video represent visible, infrared, or ultraviolet light, and can also include details about the exact wavelength.
Just as digital photos can store additional details (metadata), VSEI version 4 allows videos to include image format metadata, so important information about when and how the video was created can be sent along with the file.
Nokia advances new use cases for coded video
As AI continues to reshape how videos are created and experienced, the need for trust and authenticity has never been greater. VSEI version 4 sets new benchmarks for transparency, creative control, and intelligent machine vision. These latest enhancements enable content creators, device manufacturers, and viewers to verify, protect, and enhance their videos, ensuring that innovation and trust work together in the digital world.
New and enhanced SEI messages specified in VSEI version 4 enable the H.26x coding standard to better support the most important emerging use cases for coded video, such as AI and machine vision, while protecting creative intent.
Nokia is proud to have led the development of the VSEI standard. Our team has contributed key technology to the standard and played a key editorial role in shaping its direction. Today, we continue to pioneer secure, intelligent, and engaging video experiences for the digital world.
