Kyutai Labs launches Moshi AI chatbot with real-time voice capabilities as a rival to GPT-4o

Kyutai Labs on Wednesday released Moshi AI, an artificial intelligence (AI) chatbot that responds with voice in real time. The French AI company said it developed Moshi's entire voice language model in-house. It can also adjust its voice to express emotions and respond in different speaking styles. The AI model is free for the public to access, although conversations are currently limited to five minutes. Interestingly, OpenAI announced a similar voice feature alongside GPT-4o, but that feature has not yet been released.

Moshi AI Features

The company said the AI model was developed over six months by a team of eight people. Kyutai Labs, which unveiled the AI model at an event in Paris, said Moshi is not an AI assistant, but a prototype that can be used to develop tools for various use cases. The company has also made the chatbot publicly available online. Users can join the queue by entering their email address, but Gadgets 360 staff were able to access the platform immediately with no wait time.

Yesterday we introduced Moshi, the lowest latency conversational AI ever. Moshi can chat, explain different concepts, and role-play with different emotions and speaking styles. Talk to Moshi at https://t.co/a4EbAQiih7 and learn more about how to: pic.twitter.com/NkJRybTRLQ

— Kyutai (@kyutai_labs) July 4, 2024

The platform's interface is very simple. It has a minimal design with an indicator that lets users see how loud their voice is when they speak. There's a text box that displays only the AI's response. Another box near the top displays technical details such as audio length, delay, and any missing audio.

There's a hang-up button at the top. Currently, calls last up to five minutes. The description page emphasizes that Moshi can think, speak, and listen simultaneously to maximize conversation flow.

Gadgets 360 found that latency was very low, with the AI often responding instantly. There were a few cases where the response delay exceeded 10-15 seconds, which could be attributed to heavy server load. There were also occasions when voice prompts were not recognized at all, even after the volume meter reached three-quarters full.

Moshi AI Interface
Photo credit: Kyutai Labs

Gadgets 360 also found that the AI model can respond with an emotive voice and speak in different styles with different voice modulations. The AI model is also connected to the internet and can retrieve responses to queries that require web searches. Of note, the chatbot does not accept text prompts; voice is the only means of interacting with it.

Kyutai Labs announced that the AI model will be open-sourced. However, the AI company has not yet hosted the model weights and code on its portal. Once they are available, users will be able to download and install the model locally and run it on devices without an internet connection.
