On the trail of deepfakes, researchers identify 'fingerprints' in AI-generated videos


In February, OpenAI released a video created by its generative artificial intelligence program Sora. Stunningly realistic content generated by simple text prompts is the latest breakthrough for companies demonstrating the power of AI technology. It also raised concerns about the potential for generative AI to create misleading or deceptive content at scale.

Current methods for detecting altered digital media are ineffective against videos generated by AI, according to a new study from Drexel University. However, machine learning approaches may hold the key to uncovering the true nature of these synthetic creations.

In a paper accepted for presentation at the IEEE Computer Vision and Pattern Recognition Conference in June, researchers from the Multimedia and Information Security Lab (MISL) in Drexel University's College of Engineering explain that existing synthetic image detection techniques have so far failed to detect AI-generated video. The team has, however, had success with a machine learning algorithm that can be trained to extract and recognize the digital “fingerprints” of various video generators, including Stable Video Diffusion, VideoCrafter, and CogVideo.

They also showed that the algorithm can learn to detect new AI generators after studying just a few examples of their videos.

“It's a little worrying that this video technology could be released before we have a good system in place to detect fakes created by malicious parties,” said Dr. Matthew Stamm, an associate professor in Drexel University's College of Engineering and director of MISL.

“Responsible companies will do their best to embed identifiers and watermarks, but once this technology is released to the public, people who want to use it for deception will find a way to do so. We are working to stay ahead of the curve by developing technology that identifies synthetic videos based on patterns and characteristics specific to the media.”

Deepfake detectives

For more than a decade, Stamm's lab has been active in efforts to flag digitally altered images and videos. The past year was especially busy, as editing techniques were used to spread political misinformation.

Until recently, these manipulations were the product of photo and video editing programs that add, remove, or shift pixels, or that slow down, speed up, or clip out video frames. Each of these edits leaves a unique digital breadcrumb trail, and Stamm's lab has developed a suite of tools tailored to find and follow them.

The lab's tools use an advanced machine learning program called a constrained neural network. Rather than searching for specific, predetermined identifiers of manipulation from the start, the algorithm can learn, in a way similar to the human brain, what is “normal” and what is “abnormal” at the sub-pixel level of an image or video. This makes the program adept at identifying both deepfakes from known sources and deepfakes created by previously unknown programs.

Neural networks are typically trained on hundreds or thousands of examples to get a very good understanding of the differences between unedited and manipulated media. These differences can include anything from variation between adjacent pixels, to the order and spacing of frames in a video, to the size and compression of the files themselves.
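
As a rough illustration of what looking at relationships between adjacent pixels can mean, the Python sketch below follows one published formulation of a constrained convolutional filter: the center weight is fixed at -1 and the surrounding weights are rescaled to sum to 1, so the filter outputs the difference between each pixel and a prediction made from its neighbors. The article does not describe the lab's implementation, so this formulation and the names `constrain_filter` and `prediction_residual` are assumptions for illustration only.

```python
from typing import List

Grid = List[List[float]]


def constrain_filter(kernel: Grid) -> Grid:
    """Rescale off-center weights to sum to 1 and set the center weight to -1.

    Assumed formulation of a constrained filter; not taken from the article.
    """
    size = len(kernel)
    center = size // 2
    off_center_total = sum(
        kernel[r][c]
        for r in range(size)
        for c in range(size)
        if (r, c) != (center, center)
    )
    if off_center_total == 0:
        raise ValueError("off-center weights must not sum to zero")
    constrained = [[value / off_center_total for value in row] for row in kernel]
    constrained[center][center] = -1.0
    return constrained


def prediction_residual(image: Grid, kernel: Grid) -> Grid:
    """Slide the filter over the image (no padding) and return the residual map."""
    k = len(kernel)
    residual = []
    for top in range(len(image) - k + 1):
        row = []
        for left in range(len(image[0]) - k + 1):
            row.append(
                sum(
                    kernel[i][j] * image[top + i][left + j]
                    for i in range(k)
                    for j in range(k)
                )
            )
        residual.append(row)
    return residual


if __name__ == "__main__":
    # A perfectly flat image is fully predictable from its neighbors,
    # so the residual should vanish everywhere.
    flat_image = [[5.0] * 4 for _ in range(4)]
    kernel = constrain_filter([[1.0] * 3 for _ in range(3)])
    print(prediction_residual(flat_image, kernel))
```

Because the constrained weights sum to zero, smooth image content largely cancels out and what remains is the fine-grained structure in which forensic traces are thought to live. In a real network these filter weights would be learned, with the constraint reapplied after each training step.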

A new challenge

“When you create an image, the physical and algorithmic processing of the camera introduces relationships between different pixel values that are very different from those that would occur if you generated the image in Photoshop or AI,” Stamm said.

“But recently we've seen text-to-video generators like Sora that can create very impressive videos. And because they aren't created with a camera or Photoshop, they pose a whole new set of challenges.”

A campaign ad that circulated last year in support of Florida Governor Ron DeSantis, which appeared to show former President Donald Trump hugging and kissing Anthony Fauci, was the first to use generative AI technology. This means the video was created entirely by an AI program, rather than being edited or spliced together from other videos.

Stamm also points out that when no edits are made, the standard clues do not exist, which creates a unique detection problem.

“Until now, forensic detection programs have been effective against edited videos simply by treating them as a series of images and applying the same detection process,” Stamm said.

“However, in the case of AI-generated video, there is no evidence of frame-by-frame image manipulation, so for a detection program to be effective, it needs to be able to identify new traces left behind by the way generative AI programs construct their videos.”

In the study, the team tested 11 publicly available synthetic image detectors. Each of these programs was very effective at identifying manipulated images, with at least 90% accuracy. However, when they were asked to identify videos created by the publicly available AI generators Luma, VideoCrafter-v1, CogVideo, and Stable Video Diffusion, their performance dropped by 20-30%.

“These results clearly demonstrate that synthetic image detectors have considerable difficulty detecting synthetic videos,” the researchers wrote. “This finding is consistent across multiple different detector architectures and when the detector is pre-trained by others and re-trained using our dataset.”

A reliable approach

The researchers speculated that convolutional neural network (CNN)-based detectors, like the lab's MISLnet algorithm, could be successful on synthetic videos. This is because such a program is designed to keep adjusting what it has learned as it encounters new examples, which makes it possible to recognize new and evolving forensic traces. Over the past few years, the team has demonstrated MISLnet's acuity at spotting images manipulated with new editing programs, including AI tools, so testing it against synthetic videos was a natural step.

“We used CNN algorithms to detect deepfakes in manipulated images, videos, and audio with reliable success,” said Tai D. Nguyen, a PhD student at MISL and co-author of the paper. “Due to its ability to adapt to small amounts of new information, we thought it could also be an effective solution for identifying AI-generated synthetic videos.”

For this test, the group trained eight CNN detectors, including MISLnet, using the same test dataset used to train the image detectors, which includes real videos and AI-generated videos produced by the four publicly available programs. They then tested the detectors against a set of videos that included some created by generative AI programs that are not yet publicly available: Sora, Pika, and VideoCrafter-v2.

By analyzing small portions (patches) of single frames in each video, the CNN detectors were able to learn what a synthetic video looks like at a detailed level and apply that knowledge to a new set of videos. Each program was more than 93% effective at identifying synthetic videos, with MISLnet performing best at 98.3%.

The programs were slightly more effective when analyzing a whole video, which they did by randomly extracting a few dozen patches from different frames and using them as a mini training set to learn the characteristics of the new video. Using a set of 80 patches, the programs were between 95% and 98% accurate.
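
To make the whole-video step more concrete, here is a minimal Python sketch. It is not the researchers' code. It draws 80 random patches from across a video's frames, scores each patch, and averages the scores into a single decision. The names `sample_patches`, `classify_video`, and `toy_score`, the 8-pixel patch size, and the 0.5 threshold are all assumptions, and plain score averaging stands in for however the detectors actually combine patch information, which the article does not specify.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # one grayscale frame as rows of pixel values
Patch = List[List[float]]


def sample_patches(
    frames: Sequence[Frame], num_patches: int, patch_size: int, rng: random.Random
) -> List[Patch]:
    """Crop square patches at random positions from randomly chosen frames.

    Every frame must be at least patch_size pixels in each dimension.
    """
    patches = []
    for _ in range(num_patches):
        frame = rng.choice(frames)
        top = rng.randrange(len(frame) - patch_size + 1)
        left = rng.randrange(len(frame[0]) - patch_size + 1)
        patches.append(
            [list(row[left:left + patch_size]) for row in frame[top:top + patch_size]]
        )
    return patches


def classify_video(
    frames: Sequence[Frame],
    score_patch: Callable[[Patch], float],
    num_patches: int = 80,
    patch_size: int = 8,
    threshold: float = 0.5,
    seed: int = 0,
) -> Tuple[float, bool]:
    """Average per-patch scores into one video-level score and decision."""
    rng = random.Random(seed)
    patches = sample_patches(frames, num_patches, patch_size, rng)
    video_score = mean(score_patch(patch) for patch in patches)
    return video_score, video_score >= threshold


def toy_score(patch: Patch) -> float:
    """Hypothetical stand-in for a trained CNN: fraction of pixels above 0.5."""
    pixels = [value for row in patch for value in row]
    return sum(value > 0.5 for value in pixels) / len(pixels)


if __name__ == "__main__":
    video = [[[1.0] * 16 for _ in range(16)] for _ in range(3)]
    print(classify_video(video, toy_score))
```

In practice `score_patch` would be a trained detector returning the probability that a patch is synthetic. Pooling many patches smooths out the occasional misjudged patch, which is one plausible reason a whole-video decision can beat a single-patch one.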

With a little extra training, the programs were also able to identify which generator had created a video with over 90% accuracy. The team suggests this is because each generator uses its own unique approach to create videos.

“Videos are generated using a variety of strategies and generator architectures,” the researchers wrote. “Each technique imparts significant traces, making it much easier for the network to accurately distinguish between each generator.”

A quick study

The programs struggled when asked to detect a completely new generator without first being exposed to at least a small amount of video from it. With a small amount of fine-tuning, however, MISLnet learned quickly and was able to identify the new generator's videos with 98% accuracy. This strategy, called “few-shot learning,” is an important capability: new AI technologies are created every day, so detection programs need to be agile enough to adapt with minimal training.

“We're already seeing AI-generated videos being used to generate misinformation,” Stamm said. “As these programs become more popular and easier to use, we can reasonably expect to see a flood of synthetic videos. Detection programs should not be the only line of defense against misinformation, and information literacy efforts are key, but having the technical capacity to verify the credibility of digital media is certainly an important step.”
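
The idea behind few-shot fine-tuning can be sketched in a few lines of Python. This is a toy under stated assumptions, not MISLnet: each video is assumed to have already been reduced to a short feature vector by a frozen network, and only a small logistic-regression "head" (`weights`, `bias`) is updated using a handful of labeled examples from the new generator. The feature values, learning rate, and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label: 1 = synthetic, 0 = real)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Probability that a feature vector comes from a synthetic video."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def fine_tune(
    weights: Sequence[float],
    bias: float,
    examples: Sequence[Example],
    learning_rate: float = 0.5,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Update the head with stochastic gradient descent on the log loss."""
    weights = list(weights)
    for _ in range(epochs):
        for features, label in examples:
            # Gradient of the log loss with respect to the pre-sigmoid score.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Hypothetical pretrained head that only responds to feature 0,
    # standing in for traces left by generators seen during training.
    pretrained_weights, pretrained_bias = [2.0, 0.0], -1.0
    # A few labeled examples: the new generator's trace shows up in feature 1.
    few_shots = [([0.0, 1.0], 1), ([0.0, 0.9], 1), ([0.0, 0.0], 0), ([0.1, 0.0], 0)]
    new_weights, new_bias = fine_tune(pretrained_weights, pretrained_bias, few_shots)
    print(predict(pretrained_weights, pretrained_bias, [0.0, 1.0]))
    print(predict(new_weights, new_bias, [0.0, 1.0]))
```

The pretrained head scores the new generator's features below 0.5, and the fine-tuned head scores them above it. A real system would adapt far more parameters, but the principle is the same: start from what the detector already knows and nudge it with a few new examples.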

For more information:
Paper: Beyond Deepfake Images: Detecting AI-Generated Videos
