
What is the story?
Character.AI, an AI chatbot startup, has launched a cutting-edge AI model called TalkingMachines. The technology lets users generate interactive videos simply by providing an image and an audio input. The new feature is built on a diffusion transformer (DiT) that uses asymmetric knowledge distillation to convert a high-quality, bi-directional video model into a high-speed, real-time generator.
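The core idea of that distillation step can be sketched in a toy form: a "teacher" model that sees the whole clip (bi-directional attention) supervises a "student" that may only look at past frames (causal), which is what makes frame-by-frame real-time generation possible. This is an illustrative sketch under assumed simplifications, not Character.AI's actual code; all function names and shapes here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 4                       # frames, feature dimension
frames = rng.normal(size=(T, D))  # toy latent video

def teacher_denoise(x):
    # Bi-directional: every frame attends to all frames (here, a global mean).
    return x + 0.1 * x.mean(axis=0)

def student_denoise(x, w):
    # Causal: frame t may only use frames 0..t (a running mean of the past).
    past_mean = np.cumsum(x, axis=0) / np.arange(1, len(x) + 1)[:, None]
    return x + w * past_mean

def distill_loss(w):
    # Distillation: match the causal student's output to the teacher's target.
    return float(np.mean((student_denoise(frames, w) - teacher_denoise(frames)) ** 2))

# Crude line search over the student's single free parameter.
best_w = min(np.linspace(0.0, 0.3, 61), key=distill_loss)
print(f"best w = {best_w:.3f}, loss = {distill_loss(best_w):.5f}")
```

The asymmetry is in the attention pattern: the student never sees future frames, so at inference time it can emit each frame as soon as the matching slice of audio arrives.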
Technological advances
How the TalkingMachines model works
The TalkingMachines model listens to audio and animates the character's face, including the mouth, head, and eyes, in sync with every word, pause, and intonation. Character.AI uses a custom 1.2B-parameter audio module that can capture both speech and silence. The company claims the new model produces high-quality video without compromising temporal consistency or image quality, and that it supports a wide range of styles, from photorealistic humans to anime characters and 3D avatars.
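To give a feel for why the audio module must capture silence as well as speech: the face should only animate while the character is actually talking. In the toy sketch below, a simple RMS-energy gate stands in for the audio module; the function name, frame length, and threshold are all illustrative assumptions, not details of Character.AI's pipeline.

```python
import numpy as np

def mouth_openness(audio, frame_len=160, threshold=0.05):
    """Per-frame mouth openness in [0, 1], driven by audio energy.

    Frames whose RMS energy falls below `threshold` are treated as
    silence and mapped to a fully closed mouth (0.0).
    """
    n_frames = len(audio) // frame_len
    chunks = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((chunks ** 2).mean(axis=1))
    gated = np.where(rms < threshold, 0.0, rms)  # silence -> closed mouth
    peak = gated.max()
    return gated / peak if peak > 0 else gated

rng = np.random.default_rng(1)
speech = 0.5 * rng.normal(size=1600)     # a "speaking" segment
silence = 0.01 * rng.normal(size=1600)   # a "pause" segment
curve = mouth_openness(np.concatenate([speech, silence]))
print(curve.round(2))
```

A real audio-conditioned model learns a far richer mapping (visemes, head motion, gaze), but the gating behaviour is the same: during the silent half of the clip the openness curve stays at zero, so pauses read as pauses on screen.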
User Protection
Calling feature for voice conversations
In response to past criticism over user safety, Character.AI has introduced new supervision tools to protect users under the age of 18. The company is steadily adding features such as AvatarFX, Scenes, Streams, and more. Following OpenAI's Advanced Voice Mode, the startup introduced a calling feature that lets users hold audio conversations with selected characters, deepening engagement.
