OpenAI's delay in releasing an improved voice mode for ChatGPT upset many of the AI chatbot's fans, but they may now have an alternative: French artificial intelligence developer Kyutai has announced a real-time voice AI assistant called Moshi.
Moshi is designed to hold lifelike voice conversations with users, much like Alexa or Google Assistant, but it is powered by a large language model (in this case the Helium 7B model) of the same kind that underpins ChatGPT and its competitors. Kyutai says Moshi can speak in a variety of accents and in 70 different emotional and speaking styles. The AI can also process two audio streams simultaneously, allowing Moshi to listen and speak at the same time.
Kyutai's development of Moshi included fine-tuning the model on more than 100,000 synthetic lines of dialogue created using text-to-speech (TTS) technology. The goal was to teach Moshi the nuances and tones of human communication. The company worked with professional voice actors to improve Moshi's voice quality.
The AI assistant combines text and voice training and is optimized for multiple backends, allowing it to run locally on devices like laptops without connecting to the cloud. The company pitches this as a way to maintain privacy and security by keeping sensitive data from being sent over the internet.
Open Talk
Kyutai has declared that Moshi, including the model's code and framework, will be an open-source project, providing a foundation for further innovation. An open-source approach could also help address complaints about the safety and ethics of the closed models offered by major AI companies. Kyutai's backers, including French billionaire Xavier Niel, support the open-source approach.
Kyutai is also working on AI audio identification, watermarking, and signature tracking systems to incorporate into Moshi. These features will help identify AI-generated audio and enable monitoring and verification of AI-generated content while promoting accountability and traceability.
Moshi is still in development, but the voice mode shown in the demonstration is impressive. Its approach could inspire voice-enabled versions of ChatGPT's competitors, and if Moshi becomes widespread and popular, it could accelerate the addition of LLMs to Alexa and other voice assistants.
If you want to try Moshi out, a demo is available online and you can also sign up for early access to the full chatbot.