Deepfake traces allow Drexel researchers to identify 'fingerprints' in AI-generated videos

Machine learning detection of AI-generated videos

A study from Drexel University's College of Engineering suggests that current techniques for detecting digitally manipulated images are not effective at identifying new videos created by generative AI techniques. These video frames (top) produce different forensic signatures (bottom) than what current detectors are tuned to detect.

In February, OpenAI released videos created by its generative artificial intelligence program Sora. The strikingly realistic content, produced from simple text prompts, is the latest breakthrough from companies demonstrating the power of AI technology. It also raised concerns about generative AI's potential to create misleading and deceptive content at scale. According to a new study from Drexel University, current methods for detecting manipulated digital media are ineffective against AI-generated videos, but a machine learning approach may hold the key to unmasking these synthetic creations.

In a paper accepted for presentation at the IEEE Computer Vision and Pattern Recognition Conference in June, researchers from the Multimedia and Information Security Lab (MISL) in Drexel's College of Engineering explained that while existing synthetic image detection techniques have so far failed to spot AI-generated videos, they have had success with a machine learning algorithm that can be trained to extract and recognize the digital “fingerprints” of many different video generators, such as Stable Video Diffusion, Video-Crafter and Cog-Video. They also showed that the algorithm can learn to detect new AI generators after studying just a few examples of their videos.

“It is of no small concern that this video technology could be released before we have good systems in place to detect fakes created by bad actors,” said Dr. Matthew Stamm, an associate professor in Drexel's College of Engineering and director of MISL. “Responsible companies will do their best to embed identifiers and watermarks, but once this technology is publicly available, people who want to use it for deception will find a way. We're working to stay ahead of them by developing technology that identifies synthetic videos from the patterns and traits they carry.”

deepfake detective

Stamm's lab has been active in efforts to flag digitally manipulated images and videos for more than a decade, but the group has been particularly busy in the last year, as editing technology is being used to spread political misinformation.

Until recently, these manipulations were the product of photo and video editing programs that add, remove or shift pixels, or that slow down, speed up or clip out video frames. Each of these edits leaves a unique digital breadcrumb trail, and Stamm's lab has developed a suite of tools tailored to find and follow them.

The lab's tools use a sophisticated machine learning program called a constrained neural network. Much like the human brain, this algorithm can learn what is “normal” and what is “unusual” at the sub-pixel level of images and videos, rather than searching from the outset for specific, predetermined identifiers of manipulation. This makes the program adept both at identifying deepfakes from known sources and at spotting those created by previously unknown programs.

The neural network is typically trained on hundreds or thousands of examples to get a very good feel for the difference between unedited and manipulated media. That difference can show up anywhere from the variation between adjacent pixels, to the order and spacing of frames in a video, to the size and compression of the file itself.
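The article does not spell out what the constraint is. One formulation from earlier published constrained-CNN forensics work forces each first-layer filter to act as a prediction-error filter: the center weight is fixed at -1 and the surrounding weights are normalized to sum to 1, so the layer suppresses image content and passes along the pixel-to-pixel residuals where editing traces tend to live. The sketch below assumes that formulation; the function names are hypothetical, and in training the projection would be reapplied after every weight update rather than once, as shown here.

```python
import numpy as np

def constrain_filters(w: np.ndarray) -> np.ndarray:
    """Project first-layer filters of shape (n, k, k) onto an assumed
    prediction-error constraint: center tap = -1, other taps sum to 1."""
    w = np.array(w, dtype=float)               # copy so the caller's weights are untouched
    c = w.shape[-1] // 2
    w[:, c, c] = 0.0                           # exclude the center from the normalization
    w /= w.sum(axis=(1, 2), keepdims=True)     # assumes each filter's sum is nonzero
    w[:, c, c] = -1.0
    return w

def prediction_residual(img: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Valid-mode 2-D correlation of a grayscale image with one filter."""
    windows = np.lib.stride_tricks.sliding_window_view(img, f.shape)
    return np.einsum("ijkl,kl->ij", windows, f)

rng = np.random.default_rng(0)
filters = constrain_filters(rng.uniform(0.0, 1.0, size=(3, 5, 5)))

# Flat content is cancelled (all weights sum to 0); only local deviations survive.
flat = np.full((16, 16), 128.0)
noisy = flat + rng.normal(0.0, 2.0, size=flat.shape)
print(np.abs(prediction_residual(flat, filters[0])).max())
print(np.abs(prediction_residual(noisy, filters[0])).max())
```

Because every constrained filter sums to zero, a perfectly smooth region produces (numerically) zero output, and the network's later layers see only the fine-grained pixel relationships rather than the scene itself.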

new challenge

“When you create an image, the physical and algorithmic processing in the camera introduces relationships between pixel values that are very different from those that occur if you generate the image in Photoshop or with AI,” Stamm said. “But these days, we're seeing text-to-video generators like Sora that can create really impressive videos. These aren't created with a camera or Photoshop, so they bring a whole new set of challenges.”

Last year, a campaign ad that went viral in support of Florida Governor Ron DeSantis appeared to show former President Donald Trump hugging and kissing Anthony Fauci. The ad was the first to use generative AI technology, meaning the video was created entirely by an AI program rather than edited or spliced together from other videos.

And without editing, Stamm points out, the standard clues do not exist, which creates a unique detection problem.

“Until now, forensic detection programs have simply treated edited videos as a series of images and applied the same detection process,” Stamm said. “But with AI-generated video, there is no evidence of frame-by-frame image manipulation, so for a detection program to be effective, it needs to be able to identify new traces.”

In the study, the team tested 11 publicly available synthetic image detectors. Each of these programs was highly effective at identifying manipulated images, with at least 90% accuracy. However, when tasked with discerning videos created by the publicly available AI generators Luma, VideoCrafter-v1, CogVideo and Stable Video Diffusion, their performance dropped by 20-30%.

“These results clearly demonstrate that synthetic image detectors have considerable difficulty detecting synthetic videos,” the researchers wrote. “This finding holds consistently across multiple different detector architectures and even when the detector is pre-trained by others and re-trained using our dataset.”

reliable approach

The researchers surmised that convolutional neural network (CNN)-based detectors, like their MISLnet algorithm, could succeed against synthetic videos because the program is designed to continually adjust its learning as it encounters new examples, which makes it possible to recognize new and evolving forensic traces. Over the past few years, the team has demonstrated MISLnet's sharpness at spotting images manipulated with new editing programs, including AI tools, so testing it against synthetic video was a natural next step.

“We've used CNN algorithms to detect manipulations in images and video, as well as audio deepfakes, with reliable success,” said Tai D. Nguyen, a doctoral student in MISL and a co-author of the paper. “Because of their ability to adapt with small amounts of new information, we thought they could also be an effective solution for identifying AI-generated synthetic videos.”

For this test, the group trained eight CNN detectors, including MISLnet, on the same dataset used to train the image detectors, which includes real videos and AI-generated videos produced by the four publicly available programs. They then tested the detectors against a set of videos that included some created by generative AI programs that are not yet publicly available: Sora, Pika and VideoCrafter-v2.

By analyzing small portions (patches) of single frames from each video, the CNN detectors were able to learn what a synthetic video looks like at a granular level and apply that knowledge to the new set of videos. Each detector was more than 93% effective at identifying synthetic videos, with MISLnet performing best, at 98.3%.
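As a minimal sketch of the patch idea, assuming a fixed square patch size (64 pixels here is an arbitrary choice, not a value from the paper), a single frame can be tiled into non-overlapping patches that a CNN detector would then score one at a time. The function name is hypothetical.

```python
import numpy as np

def frame_to_patches(frame: np.ndarray, patch: int = 64) -> np.ndarray:
    """Split one H x W x C frame into non-overlapping patch x patch tiles.

    Partial tiles at the right and bottom edges are dropped.
    """
    h, w = frame.shape[:2]
    tiles = [
        frame[r:r + patch, c:c + patch]
        for r in range(0, h - patch + 1, patch)
        for c in range(0, w - patch + 1, patch)
    ]
    return np.stack(tiles)

# Example: a 256 x 320 RGB frame yields a 4 x 5 grid of 64 x 64 patches.
frame = np.zeros((256, 320, 3), dtype=np.uint8)
patches = frame_to_patches(frame)
print(patches.shape)  # (20, 64, 64, 3)
```

Working on small patches keeps the detector focused on local pixel statistics instead of scene content, and turns one frame into many training or test samples.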

The detectors were slightly more effective when analyzing the whole video: they randomly extracted a few dozen patches from different frames and used them as a mini training set to learn the characteristics of the new video. Using a set of 80 patches, they were between 95% and 98% accurate.
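The article does not say exactly how the sampled patches are combined into a decision. The sketch below only illustrates the sampling step and then assumes simple averaging of per-patch scores, one common choice that is not necessarily the researchers' method. The patch size, the stand-in scorer and all function names are hypothetical; only the count of 80 patches comes from the text.

```python
import numpy as np

def sample_patches(video, n_patches=80, patch=64, rng=None):
    """Draw n_patches random crops from a (T, H, W, C) video; each crop comes
    from a randomly chosen frame and position."""
    rng = np.random.default_rng() if rng is None else rng
    t, h, w = video.shape[:3]
    out = np.empty((n_patches, patch, patch) + video.shape[3:], dtype=video.dtype)
    for i in range(n_patches):
        f = rng.integers(t)               # random frame index
        y = rng.integers(h - patch + 1)   # random top-left corner
        x = rng.integers(w - patch + 1)
        out[i] = video[f, y:y + patch, x:x + patch]
    return out

def video_score(patches, patch_scorer):
    """Average per-patch 'synthetic' probabilities into one video-level score."""
    return float(np.mean([patch_scorer(p) for p in patches]))

def toy_scorer(p):
    # Hypothetical stand-in for a trained CNN: any function returning a value in [0, 1].
    return float(1.0 / (1.0 + np.exp(-(p.mean() - 0.5))))

rng = np.random.default_rng(0)
video = rng.random((16, 128, 128, 3), dtype=np.float32)
patches = sample_patches(video, rng=rng)
score = video_score(patches, toy_scorer)
print(patches.shape, round(score, 3))
```

Sampling across many frames, rather than from a single frame, lets the evidence from the whole clip contribute to one decision.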

With a little extra training, the detectors were also able to identify which program was used to create a video with more than 90% accuracy. The team suggests this is because each generator uses its own distinctive approach to producing video.

“Videos are generated using different strategies and generator architectures,” the researchers wrote. “Each technique leaves significant traces, making it much easier for the network to accurately distinguish between each generator.”
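Identifying the source generator is a multi-class version of the same detection problem. As a hedged sketch, assuming a detector whose final layer emits one logit per known generator (the label list below reuses the four public generators named earlier, and the logit values are made up for illustration), the decision step is a softmax followed by an argmax. The function names are hypothetical.

```python
import numpy as np

# Hypothetical label set: the four publicly available generators named above.
GENERATORS = ["Luma", "VideoCrafter-v1", "CogVideo", "Stable Video Diffusion"]

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attribute(logits) -> tuple[str, float]:
    """Map one video's class logits to (most likely generator, its probability)."""
    probs = softmax(np.asarray(logits, dtype=float))
    idx = int(np.argmax(probs))
    return GENERATORS[idx], float(probs[idx])

label, prob = attribute([0.1, 2.5, 0.3, 0.0])
print(label)  # VideoCrafter-v1
```

If each generator leaves distinct traces, as the researchers suggest, its logit separates cleanly from the others and the top probability is high.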

quick study

The detectors struggled when challenged to spot a completely new generator without first seeing at least a small amount of video from it. But with a small amount of fine-tuning, MISLnet learned quickly and could make the identification with 98% accuracy. This strategy, called “few-shot learning,” is an important capability: new AI technology is created every day, so detection programs need to be agile enough to adapt with minimal training.
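The article describes MISLnet's adaptation only as “a small amount of fine-tuning.” A minimal sketch of the general few-shot idea, assuming a frozen feature extractor (hypothetical, replaced here by synthetic 8-dimensional features) and retraining only a logistic-regression output head on a handful of labeled examples, looks like this. It is not the researchers' training procedure; the learning rate, step count, data and function names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fine_tune_head(w, b, feats, labels, lr=0.5, steps=200):
    """Gradient descent on binary cross-entropy, updating only the linear head.

    feats:  (n, d) features from a frozen, hypothetical backbone
    labels: (n,) with 1 = synthetic (new generator), 0 = real
    """
    w = np.array(w, dtype=float)
    b = float(b)
    for _ in range(steps):
        p = sigmoid(feats @ w + b)
        err = p - labels                      # gradient of BCE w.r.t. the logit
        w -= lr * feats.T @ err / len(labels)
        b -= lr * err.mean()
    return w, b

# Toy few-shot set: 5 real and 5 new-generator examples in an 8-D feature space.
rng = np.random.default_rng(0)
real = rng.normal(-1.0, 0.5, size=(5, 8))
fake = rng.normal(1.0, 0.5, size=(5, 8))
feats = np.vstack([real, fake])
labels = np.array([0] * 5 + [1] * 5, dtype=float)

w, b = fine_tune_head(np.zeros(8), 0.0, feats, labels)
preds = sigmoid(feats @ w + b) > 0.5
print(preds.astype(int))
```

Only the small head is updated here, which is why ten examples can suffice; the heavy lifting is assumed to be done by features the network already learned from other generators.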

“We've already seen AI-generated videos being used to create misinformation,” Stamm said. “As these programs become more widespread and easier to use, we can expect to be flooded with synthetic videos. Detection programs shouldn't be the only line of defense against misinformation, and information literacy efforts are key, but having the technical capacity to verify the authenticity of digital media is certainly an important step.”

This research was funded by DARPA, the Air Force Research Laboratory (AFRL), and the National Science Foundation (NSF). In addition to Stamm and Nguyen, engineering doctoral students Danial Samadi Vahdati and Aref Azizpour contributed to the paper.

Read the full paper here: https://ductai199x.github.io/beyond-deepfake-images/
