When Synthesia launched in 2017, its main purpose was to match AI versions of real people's faces, such as the former soccer player David Beckham, with dubbed voices speaking different languages. A few years later, in 2020, it began giving companies that signed up for its services the opportunity to create professional-level presentation videos starring either AI versions of their staff members or consenting actors. But the technology wasn't perfect. The avatars' body movements could be jerky and unnatural, their accents sometimes slipped, and the emotions conveyed by their voices didn't always match their facial expressions.
Synthesia's avatars have now been updated with more natural mannerisms and movements, as well as expressive voices that better preserve the speaker's accent. For Synthesia's corporate clients, these avatars will make slicker presenters of financial results, internal communications, or staff training videos.
I found the video of my avatar as unsettling as it is technically impressive. It's slick enough to pass as a high-definition recording of a chirpy corporate speech, and if you didn't know me, you'd probably assume that's exactly what it is. This demonstration shows how hard it is becoming to distinguish the artificial from the real. And soon, these avatars will even be able to talk back to us. But how much better can they get? And what does interacting with AI clones do to us?
Creation Process
When my former colleague Melissa visited Synthesia's London studio last year to create her own avatar, she had to go through a lengthy process: calibrating the system, reading out a script in various emotional states, and mouthing the sounds her avatar would need to form vowels and consonants. Standing in the same brightly lit room 15 months later, I'm relieved to hear that the creation process has been greatly streamlined. Josh Baker-Mendoza, Synthesia's technical supervisor, encourages me to gesture and move my hands as I would in natural conversation, while warning me not to move too much. I dutifully repeat an overly glowing script designed to coax emotional, enthusiastic delivery. The result is a bit as if Steve Jobs had been resurrected as a blonde British woman with a low, monotonous voice.
It also has the unfortunate effect of making me sound like a Synthesia employee: "I'm so excited to be with you today to show off what we've been working on. We're on the edge of innovation, and the possibilities are endless. So get ready to be part of something that will make you go, 'Amazing!' This opportunity isn't just a big one; it's monumental."
Just an hour later, the team has all the footage it needs. A few weeks later, I receive two avatars: one powered by the previous Express-1 model, the other made with the latest Express-2 technology. Synthesia claims the latter makes its synthetic humans more lifelike and more faithful to the people they're modeled on, with more expressive hand gestures, facial movements, and speech. You can see the results below.
Composition of courtesy
Last year, Melissa found that her Express-1-powered avatar failed to match her transatlantic accent. Its range of emotions was also limited: when she asked the avatar to read a script angrily, it sounded more petulant than furious. In the months since, Synthesia has improved Express-1, but the version of my avatar made with that same technology blinks wildly and still struggles to synchronize its body movements with its speech.
By contrast, I was struck by how much my new Express-2 avatar looks like me. Its facial features mirror my own perfectly. Its voice is eerily accurate, and although it gestures more than I do, its hand movements generally match what I'm saying.
