“AI Accent” may be the easiest way to spot an AI video. Here’s what it sounds like



Do you know what artificial intelligence sounds like? Studies show that when asked to guess, most people can’t tell the difference between AI-generated speech and real human conversation.

This confusion can have dire consequences for how we see the world. When we are confused about what is real and what is not on our screens, we can begin to believe false information, or worse, racist stereotypes about the people depicted in AI-generated videos.

But there may be one reliable way to figure out what is AI, especially in video: listen to how people speak.

AI experts shared the telltale signs by which voices and sounds in AI videos can reveal their synthetic origins. Here’s how:


The AI voices in Sora videos often sound like they’ve downed five cups of coffee. Illustration: Huffington Post; Photo: Getty

Listen to the sounds of overcaffeination.

Real humans have a natural rhythm in their speech, and some words are spoken more slowly than others. However, AI voices often sound unnatural and rushed.

Jeremy Carrasco, a video expert who debunks AI videos on social media, said he noticed that videos from Sora, an artificial intelligence video app owned by OpenAI, often had an “overly energetic” nature. “They’re saying a lot, but they’re not saying much at all, they’re just stuffing words,” he said.

Even OpenAI recognizes this telltale sign. An overabundance of em dashes is a well-known giveaway of ChatGPT’s text answers, and can reveal when someone’s cover letter or first date message was generated by AI.

In October, the hosts of the video streaming show TBPN asked Sora head Bill Peebles what the AI-video equivalent of the em dash was. His immediate reaction in the interview was telling.

“I think the ‘em dash’ at this point is this slightly wired speech pattern of Sora, which likes to say a lot of words quickly,” Peebles said.

Watch for garbled or unclear audio.

What we call the rhythm of someone’s speech is what linguists call “articulation,” or how our voice physically moves from one sound to another as air passes through the nose and out of the mouth. Many AI-generated voices still struggle with this, producing garbled audio in which natural sounds seem to have a flattened pitch.

“Humans would never produce this same kind of disturbed quality [as an AI-generated voice], because we literally can’t do that. Our vocal tracts can’t go from one note to another without some blurring of the information between the two notes,” said Melissa Baese-Berk, a professor of linguistics at the University of Chicago.

Baese-Berk gave the example of an AI-generated subway dating video in which a woman meets a man she immediately calls “husband.” The video fooled many people into believing it was real. But when the woman says “husband,” the “band” part of the word “sounds very strange,” Baese-Berk said. It “lacks the natural articulatory information that occurs as it moves from the tip of the tongue to the lips.”

“Only a robot can transmit those sounds from the tongue to the lips without any mashup,” Baese-Berk says.

This inhuman way of joining words is a product of how the models are built.

“Text-to-speech models are trained to predict the most likely pronunciation of words in sequence, but they often struggle to smoothly blend the sounds that connect words,” said Miguel Jette, vice president of AI at Rev, a speech-to-text service. “For example, while humans naturally say ‘didja’ instead of ‘did you,’ AI tends to overpronounce each word or mix them together too abruptly.”

Watch for incorrect pronunciation.

Jette says if there’s a word that’s obviously mispronounced, that could also be a sign. “AI voices can struggle with unusual or unique words that don’t appear in the training data.”

For example, Carrasco said he observed that Google’s text-to-video Veo model “may not be packed with as many words, but the words will be out of order or the wrong person will say something.”

Be wary if the emotional response doesn’t match the video’s story.

In a 2025 study that asked participants to judge which voices were AI, voices created by a text-to-speech model were correctly identified only 55% of the time. Participants made the most mistakes with angry-sounding AI voices.

Camilla Bruder, a co-author of the study and a researcher at the Max Planck Institute for Empirical Aesthetics, said this may be because participants expected an AI voice to sound like a robot.

In practice, AI voices are often too emotional for what the scene requires. If a voice is “too stereotypical, a happy, ‘Wow!’ kind of voice,” those characteristics could indicate the video is AI, Bruder said.

Carrasco said people should also be wary if what is being said is a strange emotional response to the scene. Take one viral AI video of fish falling from the sky. “It’s a fish, it’s a fish!” the woman in the video exclaims.

“They’re just narrating what’s happening on the screen. They wouldn’t do that in real life,” Carrasco said of the video. “If there were a lot of fish raining [down], you’d probably say, ‘Oh my god.’”

Compare the AI’s inappropriate emotions to the real-life fear recently experienced by a truck driver in Kentucky who witnessed a plane crash in front of his eyes. In that video, the driver doesn’t narrate what he is seeing; his mouth simply hangs open. “He just can’t believe it,” Carrasco said. “This kind of thing happens a lot in the real world.”

Another option is simply to watch people’s mouth movements for clues. “The visual cues of these videos are just as revealing as the audio,” Jette said. “If the speaker’s lips are not perfectly in sync with the voice, that’s a strong indicator.”

These clues are helpful, but not always guaranteed.

Of course, these clues aren’t always a surefire way to uncover AI-generated audio. ElevenLabs, an AI company that creates clones of real voices, is good at adding vocal fry and humanlike pauses, so listening for a voice that never stops to breathe will not always work: “you can’t always tell” it’s AI, Bruder said.

But overall, these telltale signs are strong indicators that the video you’re watching was probably created by a machine. And that’s a useful start. As AI continues to evolve at breathtaking speed, we need all the help we can get to understand what’s fake and what’s not.

“If something feels wrong, it probably is,” Jette says. “A healthy skepticism and a good eye and ear for detail go a long way.”


Read the original article on HuffPost




