New tool detects AI-generated videos with 93.7% accuracy




Figure: First column: video frames taken from YouTube (real) and fake videos generated by OpenAI's Sora. Second column: frames reconstructed by a diffusion model. Third column: the difference between the first and second columns. As the figures show, real-world video frames differ more from their diffusion-reconstructed counterparts than diffusion-generated frames do. This is the key insight DIVID uses to detect diffusion-generated videos. DIRE (DIffusion Reconstruction Error) measures the difference between an input image and the corresponding output image reconstructed by a pre-trained diffusion model. Credit: Software Systems Laboratory/Columbia Engineering

Earlier this year, an employee of a multinational company transferred $25 million to fraudsters, believing the instructions to wire the money came directly from the company's CFO. In reality, however, the criminals had used an AI program to generate realistic videos of the CFO and several other colleagues in an elaborate scheme.

AI-created videos have become so realistic that humans (and existing detection systems) struggle to distinguish real from fake. To address this problem, researchers at Columbia Engineering, led by computer science professor Junfeng Yang, have developed a new tool for detecting AI-generated videos: DIVID (short for DIffusion-generated VIdeo Detector). DIVID builds on Raidar, a tool the team released earlier this year that detects AI-generated text by analyzing the text itself, without access to the inner workings of large language models.

A paper describing the new tool is available on the arXiv preprint server.

DIVID detects a new generation of generative AI videos

DIVID improves on existing methods for detecting generated video, which are effective at identifying videos produced by older AI models such as generative adversarial networks (GANs). A GAN is an AI system with two neural networks: one creates fake data and the other evaluates it to distinguish fake from real. Through continuous feedback, both networks improve, yielding highly realistic synthetic videos. Current AI detection tools typically look for telltale signs such as unusual pixel arrangements, unnatural movement, and mismatches between frames that don't occur in real videos.
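For readers who want the mechanics, here is a minimal PyTorch sketch of that two-network feedback loop: one training step for a toy generator and discriminator. The tiny linear networks and the stand-in "real" data are illustrative assumptions, not the architecture of any production video GAN.

```python
import torch
import torch.nn as nn

# Toy illustration of the GAN feedback loop described above. Sizes and
# data are assumptions for demonstration only.
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))  # generator
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real = torch.randn(8, 32)      # stand-in for a batch of real samples
fake = G(torch.randn(8, 16))   # generator turns random noise into fakes

# Step 1: the discriminator learns to label real as 1 and fake as 0.
d_loss = loss_fn(D(real), torch.ones(8, 1)) + loss_fn(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Step 2: the generator learns to make the discriminator output 1 for fakes.
g_loss = loss_fn(D(fake), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Repeating these two steps over many batches is the "continuous feedback" through which both networks improve.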

A new generation of generative AI video tools, such as OpenAI's Sora, Runway Gen-2, and Pika, create videos using diffusion models, an AI technique that gradually transforms random noise into clear, realistic images. For video, the model denoises each frame individually while keeping transitions between frames smooth, producing high-quality, realistic results. As AI-generated videos become more sophisticated, distinguishing them from authentic footage becomes increasingly difficult.
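The denoising loop at the heart of a diffusion model can be sketched compactly. Below is a standard DDPM-style reverse process in PyTorch; `predict_noise` is a hypothetical stand-in for a trained denoising network, and systems like Sora layer far more machinery (latent spaces, temporal modeling) on top of this basic idea.

```python
import torch

def ddpm_sample(predict_noise, shape=(1, 3, 64, 64), T=1000):
    """Generate a sample by running the DDPM reverse (denoising) process.

    predict_noise(x, t) is a hypothetical trained network that estimates
    the noise present in x at timestep t.
    """
    betas = torch.linspace(1e-4, 0.02, T)   # standard linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)           # predicted noise at this step
        # Remove the predicted noise (DDPM posterior mean).
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:                           # re-inject a little noise except at the end
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Demo with a dummy predictor; a real model would be a trained U-Net.
frame = ddpm_sample(lambda x, t: torch.zeros_like(x), T=10)
```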

To detect diffusion-generated images, Yang’s group used a technique called DIRE (DIffusion Reconstruction Error), a method that measures the difference between an input image and the corresponding output image reconstructed by a pre-trained diffusion model.
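In code, the DIRE computation itself is simple once a reconstruction exists. A minimal sketch, assuming a hypothetical `reconstruct` callable that wraps a pre-trained diffusion model (in practice, inverting the frame into the diffusion process and then denoising it back):

```python
import torch

def dire(frame: torch.Tensor, reconstruct) -> torch.Tensor:
    """DIffusion Reconstruction Error: per-pixel absolute difference
    between a frame and its diffusion-model reconstruction.

    reconstruct is a hypothetical callable wrapping a pre-trained
    diffusion model; only the error computation is shown here.
    """
    with torch.no_grad():
        recon = reconstruct(frame)
    return (frame - recon).abs()

# Real frames tend to yield larger DIRE values than diffusion-generated ones.
frame = torch.rand(3, 256, 256)
error_map = dire(frame, reconstruct=lambda x: x)  # identity stand-in for demo
print(error_map.mean().item())
```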

Extending Raidar's AI-generated text detection to video

Yang, co-director of the Software Systems Lab, has been researching ways to detect AI-generated text and video. Earlier this year, Yang and his collaborators released Raidar, a method for detecting AI-generated text by analyzing the text itself, without access to the inner workings of large language models such as GPT-4, Gemini, or Llama. Raidar uses a language model to rephrase or modify a given piece of text and then measures how many edits the system makes. More edits mean the text is more likely to have been written by a human, while fewer changes mean it is more likely to have been machine-generated.
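A simplified stand-in for that edit-based measurement, using Python's standard difflib; the `rewritten` text is assumed to come from asking an LLM to rephrase the original, and Raidar's actual metrics are more elaborate than this single ratio:

```python
import difflib

def rewrite_similarity(original: str, rewritten: str) -> float:
    """Fraction of the original preserved in the LLM's rewrite (0 to 1).

    High similarity (few edits) suggests the text was machine-generated;
    low similarity (many edits) suggests a human wrote it.
    """
    return difflib.SequenceMatcher(None, original, rewritten).ratio()

# `rewritten` would come from asking an LLM to rephrase `original`.
original = "The quick brown fox jumps over the lazy dog."
rewritten = "A quick brown fox leaps over a lazy dog."
print(rewrite_similarity(original, rewritten))
```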

“Raidar's insights — that output from AI is often judged to be of high quality by other AIs and therefore less edited — are very powerful, and go beyond just text,” Yang says. “AI-generated videos are becoming more and more realistic, so we wanted to take Raidar's insights and build a tool that can accurately detect AI-generated videos.”

The researchers used the same concept to develop DIVID, a new generative video detection method that can identify videos generated by a diffusion model. The research paper, which includes open-source code and a dataset, was presented at the Computer Vision and Pattern Recognition Conference (CVPR) in Seattle on June 18, 2024.

How DIVID works

DIVID works by reconstructing a video with a diffusion model and then comparing the reconstruction against the original. The method rests on the hypothesis that frames of a diffusion-generated video should closely resemble their reconstructions, since both are sampled from the diffusion model's distribution, so DIVID uses DIRE values to detect diffusion-generated videos. If the reconstruction differs significantly from the original, the video is likely human-made; if there is little difference, it is likely AI-generated.
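As a minimal sketch of that decision rule, one can average per-frame DIRE values across a clip and apply a cutoff. The threshold below is purely illustrative, and `reconstruct` is again a hypothetical wrapper around a pre-trained diffusion model; the actual DIVID system trains a learned classifier on DIRE features rather than hand-picking a cutoff.

```python
import torch

def classify_video(frames: torch.Tensor, reconstruct, threshold: float = 0.05) -> str:
    """Threshold the mean per-frame DIRE value for a whole clip.

    frames: tensor of shape (T, C, H, W) with values in [0, 1].
    threshold: illustrative value only, not taken from the paper.
    """
    with torch.no_grad():
        errors = torch.stack([(f - reconstruct(f)).abs().mean() for f in frames])
    # Small reconstruction error -> frames sit close to the diffusion
    # model's distribution -> likely diffusion-generated.
    return "AI-generated" if errors.mean() < threshold else "likely human-made"

clip = torch.rand(16, 3, 256, 256)
print(classify_video(clip, reconstruct=lambda x: x))  # identity demo
```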

The framework builds on the idea that generative AI tools create content from the statistical distributions of large training datasets, producing output that is more "statistically averaged." This shows up in pixel-intensity distributions within video frames, texture patterns, and noise characteristics, as well as in subtle inconsistencies, artifacts that change unnaturally between frames, and unusual patterns that are more likely in diffusion-generated video than in real video.

In contrast, human-made videos show individuality and deviate from the statistical norm. DIVID achieved a detection accuracy of up to 93.7% on a benchmark dataset of videos generated by Stable Video Diffusion, Sora, Pika, and Gen-2.

For now, DIVID is a command-line tool that analyzes videos and outputs whether they were made by an AI or a human, and is only available to developers. The researchers note that the technology could be integrated as a plugin into Zoom to detect deepfake calls in real time. The team is also considering developing a website or browser plugin to make DIVID available to the public.

“Our framework is a major step forward in detecting AI-generated content,” said Yun-Yun Tsai, one of the paper's authors and a doctoral student advised by Yang. “There are so many fraudsters using AI-generated videos that it's important to stop them and protect society.”

What's next?

The researchers are currently improving the DIVID framework so that it can handle different kinds of synthetic video from open-source generation tools, and they are also using DIVID to gather more videos for its dataset.

More information:
Qingyuan Liu et al., "Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos," arXiv (2024). DOI: 10.48550/arXiv.2406.09601

Journal Information:
arXiv

Courtesy of Columbia University School of Engineering and Applied Sciences

Citation: New tool detects AI-generated videos with 93.7% accuracy (June 26, 2024). Retrieved June 26, 2024 from https://techxplore.com/news/2024-06-tool-ai-generated-videos-accuracy.html

This document is subject to copyright. It may not be reproduced without written permission, except for fair dealing for the purposes of personal study or research. The content is provided for informational purposes only.




