Watch and cry (or laugh): Synthesia's AI video avatars now include emotions

Generative AI has captured the public's imagination with its leap into creating elaborate, plausible, and authentic-seeming text and images from verbal prompts. But the catch, and it is a common one, is that the results are often far from perfect when you look closely.

People point out strange fingers and floor tiles that slip away, and math problems are just that: problems, in that the numbers sometimes don't add up.

Now Synthesia, one of the more ambitious AI startups working on video, specifically custom avatars that business users can use to create promotional, training, and other enterprise video content, is releasing an update that it hopes will help it get past some of the challenges in its specific field. Its latest version features avatars based on real humans shot in a studio, with more emotion, better lip tracking, and what the company says are more natural, expressive, humanlike movements when they are fed text to generate video.

This release comes on the heels of some impressive progress for the company to date. OpenAI and other generative AI players have built a two-pronged strategy, raising huge public awareness with consumer-facing tools like ChatGPT while also building B2B services whose APIs are used by independent developers and large corporations. Synthesia, by contrast, is leaning into the approach taken by some other prominent AI startups.

Just as Perplexity is focused on getting generative AI search really right, Synthesia is focused on building the most humanlike generative video avatars possible. More specifically, it is aiming to do this only for the business market and for use cases such as training and marketing.

This focus has helped Synthesia stand out in a very crowded AI market that risks becoming commoditized once the hype settles into longer-term concerns such as ARR, unit economics, and the operational costs associated with AI implementation.

Synthesia describes the new Expressive Avatar, the version released Thursday, as the first of its kind, calling it “the world's first avatar completely generated by AI.” The company says it is built on large pre-trained models, and that its breakthrough lies in the way those models are combined to achieve multimodal distributions that more closely mimic how real humans speak.

According to Synthesia, these avatars are generated on the fly and are intended to approximate how we speak and react in real life. This is in contrast to how many avatar-based AI video tools work today: typically, they quickly stitch together a number of short video clips to create facial reactions that more or less match the script fed into them. The goal is to look less robotic and more lifelike.

Previous version:

New version:

CEO Victor Riparbelli himself acknowledges that there's still a long way to go, as the two examples here (an older version of Synthesia and the version released Thursday) show.

“Of course, we're not 100% there yet, but we'll be there very soon, by the end of the year. That would be pretty amazing,” he told TechCrunch. “You can also see that the AI part of this problem is very subtle. In humans, there is so much information in very small details, such as facial muscle movements. I don't think you'll ever be able to sit down and explain, ‘Sure, I smile like this when I'm happy, but it's fake, right?’ It's very complicated for humans to describe, but it can be [captured in] deep learning networks. They can actually see patterns and reproduce them in a predictable way.” The next thing the company is working on is hands, he added.

“Hands are really hard,” he said.

The B2B focus also helps Synthesia anchor its messaging and products more firmly in the use of “secure” AI. This is essential, especially today, when there are significant concerns about deepfakes and the use of AI for malicious purposes such as misinformation and fraud. Still, Synthesia has not completely avoided controversy on that front. Its technology has previously been misused to produce propaganda in Venezuela and to create false news reports promoted by pro-China social media accounts.

The company noted that it has taken further steps to limit such misuse. Last month, it said, it updated its policies to restrict the types of content people can create, invest in early detection of malicious actors, expand the team working on AI safety, and experiment with content authentication technologies such as C2PA.

Despite these challenges, the company has continued to grow.

Synthesia was last valued at $1 billion, when it raised $90 million. Notably, that funding round took place almost a year ago, in June 2023.

Riparbelli said in an interview earlier this month that there are no plans to raise more money at this time, but that does not really answer the question of whether investors are actively approaching Synthesia. (Note: We're really looking forward to having the real-life Riparbelli speak at our event in London in May, where we'll definitely be asking about this again. Please come along if you have the opportunity.)

What we know for sure is that building and running AI costs a lot of money, and Synthesia has done a lot of the building and running.

The company said about 200,000 people had created more than 18 million video presentations in about 130 languages using Synthesia's 225 legacy avatars ahead of Thursday's release. (It does not say how many users are on its paid tiers, but it has a number of high-profile customers, including Zoom, the BBC, and DuPont, and companies do pay.) The startup's hope, of course, is that those numbers will grow further as new versions are released.
