This as-told-to essay is based on a conversation with Hany Farid, a digital forensics expert and former professor at the University of California, Berkeley. Farid is also the co-founder of GetReal, a digital forensics and cybersecurity startup. It has been edited for length and clarity.
The average person on the internet today cannot tell whether an image, video, or audio recording in their feed is real.
We did a perceptual study on this. The human visual and auditory systems are not sufficient to perform this task reliably.
That doesn’t mean you can’t tell. We have computational and mathematical tools. If you give me a piece of content and some time, yes, I’ll figure it out.
But there’s a big difference between what we do at GetReal, the digital forensics startup I co-founded, and what the average person doomscrolling on social media can do.
I started studying digital forensics at a time when the field was almost non-existent.
I began my academic career at Dartmouth College in 1999.
It’s hard to remember 1999. We lived in an almost entirely analog world. We were still taking pictures on film. Digital cameras were just beginning to appear. The internet was emerging, but there was very little on it. Social media didn’t really exist.
No one knew where this was going. I started thinking about digital evidence in court, which is inherently malleable. At the time, no one thought this was a problem, and they were right. I thought it might become one, because the digital revolution didn’t look like it was going to stop. So we started this very bespoke, niche, small, weird field called digital forensics. The papers were written by just me and a few talented graduate students at Dartmouth. Everyone was like, “This is cool, but what does this have to do with anything?”
Then digital went mainstream. Citizen journalists appeared. We started hearing from the Associated Press and Reuters, asking, “How do we know this photo that someone sent us is real?”
Over the years, the requests from media outlets, courts, and national security agencies went from once a year, to once a month, to daily.
Suddenly, our entire world was turned upside down.
In the early days, fake images often left clues
When I first started in the field, I was mainly thinking about photographs. Video was very difficult to manipulate: it runs at 24 to 30 frames per second and has an audio track. Tools like Photoshop had made it easy to manipulate images.
Fortunately, doing it well still required skill. You’d make mistakes. You’d leave artifacts. There were misplaced shadows, wrong geometry, and wrong sizes. There might be metadata indicating that the photo had been edited in Photoshop.
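The metadata clue can be illustrated with a short sketch. This is a minimal example, not the method used by Farid or GetReal: it scans a file’s raw bytes for the names of editing software, which editors commonly write into an image’s metadata. The function names and the marker list are assumptions made for this illustration.

```python
# Hypothetical helpers for illustration only; not a real forensic tool.
# Assumption: editing software often leaves its name in the file's metadata.
EDITOR_MARKERS = (b"Adobe Photoshop", b"GIMP")


def find_editor_markers(data: bytes) -> list[str]:
    """Return the editing-software names that appear in the raw file bytes."""
    return [marker.decode("ascii") for marker in EDITOR_MARKERS if marker in data]


def scan_file(path: str) -> list[str]:
    """Read a file from disk and report any editing-software names found."""
    with open(path, "rb") as f:
        return find_editor_markers(f.read())
```

A match suggests the file passed through an editor. An empty result means very little, because metadata is easy to strip.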
Today, no skill is required. No time is required. You don’t need anything except a keyboard and an internet connection. You can type, “Do this to this image, audio clip, or video,” and the AI takes over and does amazing things, things that were unimaginable five to 10 years ago.
AI-generated content is becoming visually indistinguishable
With any technology, you don’t look at where the puck is. You look at where the puck is going.
We knew we would reach a point where the content would become visually indistinguishable. Not necessarily computationally indistinguishable, but visually indistinguishable.
Images were probably the first to cross the uncanny valley. Then came voice, with inflections, laughter, and pauses.
Video is now passing through the uncanny valley. If someone gives me 30 minutes of HD video, it’s probably not AI. But when a clip is 15 to 30 seconds long, which is typical of video seen online, it’s hard to tell from visual cues alone. For now.
AI-generated videos used to be around 4 seconds long. Now some of them can be strung together to reach 30 or 40 seconds.
The content will continue to improve. It will be cheaper and easier to use. And it will become ubiquitous.
Fakes can be investigated, but the internet moves faster
Generative AI doesn’t know anything about the 3D world. It doesn’t know about physics or shadows. I say “know” in air quotes.
AI can produce things that look very convincing to the human brain, but the physics is slightly off.
As long as the content shows something physically impossible, we have a signal that we can detect.
In some cases, spotting a fake can be very quick and relatively easy. If you find something wrong, that’s it.
Conversely, authenticating something is much more difficult. Say I run every test I have and can’t find anything wrong. Does that mean it’s real? Not necessarily. It means nothing was found.
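The asymmetry between exposing a fake and authenticating something can be written as a short sketch. This is an illustration only, assuming each forensic test is a hypothetical function that returns True when it finds an inconsistency; it does not describe GetReal’s actual pipeline.

```python
from typing import Callable, Iterable

# Assumption: a check takes raw content and returns True if it finds
# something physically or digitally inconsistent.
Check = Callable[[bytes], bool]


def assess(content: bytes, checks: Iterable[Check]) -> str:
    """Return "fake" on the first failed check, otherwise "nothing found"."""
    for check in checks:
        if check(content):
            # One solid inconsistency is enough to stop.
            return "fake"
    # Passing every check is not proof of authenticity.
    return "nothing found"
```

Note that the function never returns “real.” Adding more checks makes “nothing found” more reassuring, but it does not turn it into a guarantee.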
On average, the task may take about an hour. But an hour is a long time on the internet. It’s an eternity.
Usually, by the time I get contacted about something, it already has a million views. We work on it, we talk to a reporter, and they write a story. By then it has 10 million views.
We’re somewhat post hoc in that respect. Fact-checking comes after the fact.
What scares me most is that we’re losing our shared sense of reality
The risks and consequences of making mistakes are increasing. You’re putting people in jail. You’re making geopolitical decisions. You’re reporting on what’s going on in the world and trying to inform 8 billion people. You have to get it right.
My biggest fear is that we, as a society, are losing our shared sense of reality.
We are not arguing about what tax rates should be, what the role of government is, what our foreign policy should be, or anything else people can reasonably disagree on.
We are debating whether 2 plus 2 is 4. I say, “Two plus two is four,” and the other person says, “No, it’s not. It’s applesauce.”
That’s the kind of conversation we’re having.
I don’t see how we can maintain a stable democracy without a shared understanding of reality. We can disagree. That’s OK. Differences of opinion are good. But we can’t have one person say, “This happened,” and the other say, “No, it didn’t.”
That can’t be the way our society works.
