AI detects deepfake video fingerprints

Summary: New research highlights both the challenges and the advances in detecting AI-generated video. The researchers found that traditional methods for detecting manipulated digital media do not work well on videos created by AI, such as those produced by OpenAI's Sora generator.

By employing machine learning algorithms, the team was able to identify the unique digital “fingerprints” left behind by different AI video generators. This development is critical because AI-generated content can be used for misinformation, requiring robust detection techniques to maintain media integrity.

Important facts:

Traditional synthetic image detectors struggle with AI-generated videos and are significantly less effective on them than on manipulated images.
Drexel's team has developed a machine learning approach that can be adapted to recognize the digital traces of a variety of AI video generators, including some that are not yet publicly available.
Machine learning models can achieve up to 98% accuracy in identifying synthetic videos after minimal exposure to new AI generators.

Source: Drexel University

In February, OpenAI released videos created by its generative artificial intelligence program Sora. The strikingly realistic content, generated from simple text prompts, is the latest breakthrough for companies demonstrating the capabilities of AI technology.

It also raised concerns about the potential for generative AI to create misleading or deceptive content at scale.

Image: a computer-generated face. Credit: Neuroscience News

Current methods for detecting altered digital media are ineffective against videos generated by AI, according to a new study from Drexel University. However, machine learning approaches may hold the key to uncovering the true nature of these synthetic creations.

In a paper accepted for presentation at the IEEE Computer Vision and Pattern Recognition Conference in June, researchers from the Multimedia and Information Security Lab (MISL) in Drexel's College of Engineering explained that while existing synthetic image detection techniques have so far failed to spot AI-generated video, they have had success with a machine learning algorithm that can be trained to extract and recognize the digital “fingerprints” of various video generators, including Stable Video Diffusion, Video-Crafter, and Cog-Video.

They also showed that the algorithm can learn to detect new AI generators after studying just a few examples of their videos.

“It is more than a little worrying that this video technology could be released before we have a good system for detecting fakes created by malicious parties,” said Dr. Matthew Stamm, associate professor of engineering at Drexel University and director of MISL.

“Responsible companies will do their best to embed identifiers and watermarks, but once this technology is released to the public, people who want to use it for deception will find a way to do so. That is why we are working to stay ahead of them by developing technology that identifies synthetic videos from the patterns and features inherent to the media.”

Deepfake detectives

For more than a decade, Stamm's lab has been active in efforts to flag digitally altered images and videos, but the last year was especially busy, as editing technology has been used to spread political misinformation.

Until recently, these manipulations were the product of photo and video editing programs that add, remove, or shift pixels, or that slow down, speed up, or clip out video frames. Each of these edits leaves a unique digital breadcrumb trail, and Stamm's lab has developed a suite of tools tailored to find and follow them.

The lab's tools use a sophisticated machine learning program called a constrained neural network. In a way similar to the human brain, this algorithm can learn what is “normal” and what is “abnormal” at the sub-pixel level of images and videos, rather than searching for specific, predetermined identifiers of manipulation from the outset.
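As a rough illustration of what “constrained” can mean, one published formulation of a constrained convolutional layer forces each first-layer filter to act as a prediction-error filter: the center weight is fixed at -1 and the remaining weights are scaled to sum to 1, so the layer responds to deviations from local pixel relationships rather than to image content. The sketch below assumes that formulation; whether the lab's tools use exactly this constraint is an assumption, and the names `constrain_filter` and `prediction_residual` are hypothetical.

```python
import numpy as np


def constrain_filter(w: np.ndarray) -> np.ndarray:
    """Project a square, odd-sized filter onto the prediction-error constraint.

    Assumed constraint: center weight = -1, all other weights sum to 1.
    Assumes the non-center weights do not sum to zero.
    """
    c = w.shape[0] // 2
    out = w.astype(float).copy()
    out[c, c] = 0.0          # exclude the center from the normalization
    out /= out.sum()         # remaining weights now sum to 1
    out[c, c] = -1.0         # center subtracts the pixel being predicted
    return out


def prediction_residual(image: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply the filter to every full window of the image (no padding)."""
    k = w.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.einsum("ijkl,kl->ij", windows, w)


rng = np.random.default_rng(0)
w = constrain_filter(rng.uniform(0.1, 1.0, size=(5, 5)))

# On a perfectly flat image the neighbors predict the center exactly,
# so the residual is (numerically) zero everywhere.
flat = np.full((8, 8), 7.0)
residual = prediction_residual(flat, w)
```

In a real detector this projection would be reapplied to the first-layer filters after each training update, so that the network keeps learning from residuals instead of scene content.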

This makes the program adept at both identifying deepfakes from known sources and identifying deepfakes created by previously unknown programs.

The neural network is trained on hundreds or thousands of examples to develop a very good sense of the differences between unedited and manipulated media. These differences can be anything from variation between adjacent pixels, to the order and spacing of frames within a video, to the size and compression of the files themselves.

New challenge

“When you create an image, the physical and algorithmic processing in the camera introduces relationships between pixel values that are very different from those that arise when you generate the image in Photoshop or with AI,” Stamm said.

“But these days, we're seeing text-to-video generators like Sora that can create really impressive videos. These aren't created with a camera or Photoshop, so they bring a whole new set of challenges.”

A campaign ad circulated last year in support of Florida Governor Ron DeSantis, which appeared to show former President Donald Trump hugging and kissing Anthony Fauci, was the first to use generative AI technology.

This means the video was created entirely by an AI program, rather than being edited or spliced together from other videos.

Stamm also points out that when no editing has taken place, the standard clues do not exist, which creates a unique detection problem.

“Until now, forensic detection programs have simply treated the edited video as a series of images and applied the same detection process,” Stamm said.

“However, in the case of AI-generated video, there is no evidence of frame-by-frame image manipulation, so for a detection program to be effective, it needs to be able to identify new traces.”

In the study, the team tested 11 publicly available synthetic image detectors. Each of these programs was highly effective at identifying manipulated images, with at least 90% accuracy. However, when faced with videos created by the publicly available AI generators Luma, VideoCrafter-v1, CogVideo, and Stable Video Diffusion, their performance dropped by 20-30%.

“These results clearly demonstrate that synthetic image detectors have considerable difficulty detecting synthetic videos,” the researchers wrote. “This finding is consistent across multiple different detector architectures, and it holds whether the detectors were pre-trained by others or re-trained using our dataset.”

Reliable approach

The researchers speculated that convolutional neural network (CNN) based detectors, like their MISLnet algorithm, could be successful on synthetic videos, because the program is designed to keep adjusting what it has learned as it encounters new examples. This makes it possible to recognize new forensic traces as they evolve.

Over the past few years, the team has demonstrated MISLnet's acuity in spotting images that were manipulated with new editing programs, including AI tools. So it was a natural step to test it against synthetic video.

“We have used CNN algorithms to detect manipulated images, as well as video and audio deepfakes, with reliable success,” said Tai D. Nguyen, a PhD student at MISL and co-author of the paper.

“Due to its ability to adapt to small amounts of new information, we thought it could also be an effective solution for identifying AI-generated synthetic videos.”

For this test, the group trained eight CNN detectors, including MISLnet, using the same test dataset used to train the image detectors, which includes real videos and AI-generated videos produced by the four publicly available programs.

They then tested the programs against a set of videos that included ones created by generative AI programs that are not yet publicly available: Sora, Pika, and VideoCrafter-v2.

By analyzing small portions (patches) of a single frame from each video, the CNN detectors were able to learn what a synthetic video looks like at a detailed level and apply that knowledge to the new set of videos. Each program was more than 93% effective at identifying the synthetic videos, with MISLnet performing best at 98.3%.

The programs were slightly more effective when analyzing the whole video, which they did by randomly extracting a few dozen patches from different frames and using them as a mini training set to learn the characteristics of the new video. Using a set of 80 patches, the programs were between 95% and 98% accurate.
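The patch-based, whole-video analysis described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not say how patch results are combined, so averaging per-patch scores is an assumption, and `placeholder_patch_score` is a hypothetical stand-in with no forensic meaning that marks where a trained CNN such as MISLnet would be called.

```python
import numpy as np


def sample_patches(video: np.ndarray, num_patches: int, patch_size: int,
                   rng: np.random.Generator) -> np.ndarray:
    """Draw square patches at random positions from random frames.

    `video` is a grayscale array of shape (frames, height, width).
    """
    n_frames, height, width = video.shape
    patches = np.empty((num_patches, patch_size, patch_size), dtype=video.dtype)
    for i in range(num_patches):
        f = rng.integers(n_frames)
        y = rng.integers(height - patch_size + 1)
        x = rng.integers(width - patch_size + 1)
        patches[i] = video[f, y:y + patch_size, x:x + patch_size]
    return patches


def placeholder_patch_score(patch: np.ndarray) -> float:
    """Hypothetical stand-in for a trained detector; returns a value in (0, 1]."""
    roughness = np.abs(np.diff(patch, axis=1)).mean()
    return float(1.0 / (1.0 + roughness))


def video_score(patches: np.ndarray) -> float:
    """Fuse per-patch scores into one video-level score (assumed: plain mean)."""
    return float(np.mean([placeholder_patch_score(p) for p in patches]))


rng = np.random.default_rng(0)
video = rng.random((12, 64, 64))            # synthetic stand-in for a decoded clip
patches = sample_patches(video, num_patches=80, patch_size=32, rng=rng)
score = video_score(patches)
```

The count of 80 patches comes from the article; the 32-pixel patch size and the 12-frame clip are arbitrary choices for the example.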

With a little extra training, the programs were also able to identify which program had been used to create a video with over 90% accuracy. The team suggests this is because each program uses its own distinct approach to create videos.

“Videos are generated using different strategies and generator architectures,” the researchers wrote. “Each technique leaves significant traces, making it much easier for the network to accurately distinguish between each generator.”

Simple survey

The programs struggled when challenged to detect a completely new generator without first being exposed to at least a small amount of video from it. With a small amount of fine-tuning, however, MISLnet learned quickly and was able to make the identification with 98% accuracy.

This strategy, called “few-shot learning,” is an important feature. New AI technologies are created every day, so detection programs need to be agile enough to adapt with minimal training.

“We've already seen AI-generated videos being used to generate misinformation,” Stamm says.

“As these programs become more popular and easier to use, we can expect to see a flood of synthetic videos. Detection programs should not be the only line of defense against misinformation, and information literacy efforts are key, but having the technical capacity to verify the authenticity of digital media is certainly an important step.”

About this AI and deepfake detection research news

Author: Britt Faulstick
Source: Drexel University
Contact: Britt Faulstick – Drexel University
Image: The image is credited to Neuroscience News

Original research: The results of this research will be presented at the IEEE Computer Vision and Pattern Recognition Conference.