From reality to fantasy: Live2Diff AI instantly stylizes your videos

An international team of researchers has developed an AI system that can transform live video streams into stylized content in near real time. The new technology, called Live2Diff, processes live video at 16 frames per second on high-performance consumer hardware, and has the potential to reshape applications from entertainment to augmented reality experiences.

Created by scientists from the Shanghai AI Lab, the Max Planck Institute for Informatics, and Nanyang Technological University, Live2Diff is the first successful implementation of unidirectional attention modeling in a video diffusion model for live stream processing.

Live2Diff is the first attempt to enable one-way attention modeling in a video diffusion model for live video stream processing.
Achieving 16FPS with an RTX 4090 GPU
Link ⬇️ pic.twitter.com/L2HP4QOK8j
— Dreaming Tulpa (@dreamingtulpa) July 17, 2024

“We present Live2Diff, the first attempt to design a video diffusion model with one-way temporal attention specifically targeted at live streaming video translation,” the researchers explain in their paper published on arXiv.

This new approach overcomes a major hurdle in video AI: current state-of-the-art models rely on two-way temporal attention, which requires access to future frames and makes real-time processing impossible. Live2Diff's one-way method maintains temporal consistency by correlating each frame only with the frames that precede it and a few initial warm-up frames, eliminating the need for future frame data.
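To make the idea concrete, here is a minimal sketch of what a one-way temporal attention mask could look like. It is an illustration only, not the researchers' implementation: the function name, the `num_warmup` and `window` parameters, and the choice of a fixed-size window of recent frames are all assumptions made for this example.

```python
import numpy as np

def one_way_attention_mask(num_frames, num_warmup=2, window=4):
    """Return a boolean mask where mask[i, j] is True if frame i may attend to frame j.

    Hypothetical sketch: the parameter names and the windowing scheme are
    assumptions for illustration, not taken from the Live2Diff code.
    """
    idx = np.arange(num_frames)
    query = idx[:, None]   # frame doing the attending (rows)
    key = idx[None, :]     # frame being attended to (columns)

    causal = key <= query            # never look at future frames
    recent = (query - key) < window  # the most recent `window` frames, including itself
    warmup = key < num_warmup        # the initial warm-up frames stay visible

    return causal & (recent | warmup)

if __name__ == "__main__":
    # Rows are query frames, columns are key frames; 1 means "may attend".
    print(one_way_attention_mask(6, num_warmup=2, window=3).astype(int))
```

Everything above the diagonal of the mask is False, so no frame depends on a frame that has not arrived yet. That property is what allows a stream to be processed as it comes in.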

Real-time Video Style Transfer: The New Frontier of Digital Content Creation

Dr. Kai Chen, corresponding author of the project from Shanghai AI Lab, explains in the paper: “Our approach ensures temporal consistency and smoothness without future frames, which opens up new possibilities for live video translation and processing.”

The team demonstrated the capabilities of Live2Diff by converting a human face captured by a live webcam into a cartoon-like character in real time. Extensive experiments showed that the system outperforms existing methods in temporal smoothness and efficiency, which was confirmed by both quantitative metrics and user studies.

Schematic of Live2Diff's innovative approach: (a) the training stage incorporates depth estimation and a novel attention mask, and (b) the streaming inference stage employs multi-timestep caching for real-time video processing. This technology represents a major advancement in AI-powered live video translation. (Credit: live2diff.github.io)

The impact of Live2Diff is far-reaching and multifaceted. In the entertainment industry, the technology has the potential to redefine live streaming and virtual events. Imagine a concert where performers instantly transform into animated characters, or a live sports broadcast where athletes transform into superheroes in real time. For content creators and influencers, it will be a new tool for creative expression, allowing them to present a unique, stylized version of themselves during live streams and video calls.

In augmented reality (AR) and virtual reality (VR), Live2Diff could make experiences more immersive. By enabling real-time style transfer on live video feeds, it could bridge the gap between the real world and virtual environments more seamlessly than before. This could be applied in gaming, virtual tourism and even in professional fields such as architecture and design, where real-time visualization of stylized environments can aid decision-making.

But like any powerful AI tool, Live2Diff raises important ethical and societal questions. The ability to alter live video streams in real time could be misused to create misleading content and deepfakes. It could also blur the line between reality and fiction in digital media, requiring new forms of media literacy. As this technology matures, it will be important for developers, policymakers, and ethicists to work together to establish guidelines for responsible use and implementation.

The Future of Video AI: Open Source Innovation and Industry Applications

The full code for Live2Diff has not yet been released (it is due next week), but the research team has published its paper and plans to open-source the implementation, a move it hopes will spur further innovation in real-time video AI.

As artificial intelligence continues to advance in media processing, Live2Diff represents a significant step forward. The technology's ability to process live video streams at interactive speeds could have immediate applications in live event broadcasting and next-generation video conferencing systems, pushing the boundaries of real-time AI-driven video manipulation.
