Meta claims to have achieved breakthroughs in speech generation AI using Voicebox.
Generative AI is making strides in many areas of content creation. Now, technology giant Meta has introduced a generative AI model for speech-related tasks. The company announced Voicebox, a tool for editing, sampling, and stylizing audio. Voicebox is meant to help content creators with various tasks, help visually impaired people hear written messages, and let people speak in foreign languages.
The company claims to have achieved a breakthrough in speech generation AI. In its blog, the company wrote that Voicebox is the first model that can generalize, with state-of-the-art performance, to speech generation tasks it was not specifically trained for.
Voicebox can produce outputs in a variety of styles and can create audio from scratch. Where image generators produce pictures from text prompts, Voicebox produces high-quality audio clips. Currently, the model can process audio in six languages and perform tasks such as noise removal, content editing, diverse sample generation, and style conversion.
Meta also said that multi-purpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters (NPCs) in the metaverse. The model supports in-context text-to-speech synthesis: given an audio sample as short as two seconds, Voicebox can match the sample's audio style and use it to generate speech from text.
The model can recreate portions of speech interrupted by noise and replace mispronounced words without the speech having to be rerecorded. Given a sample of a person's voice, Voicebox can also generate speech from text in French, Spanish, English, German, Polish, and Portuguese. This feature is known as cross-language style transfer. “This feature could be used in the future to help people communicate in a natural and authentic way, even if they don’t speak the same language,” the company said.
Additionally, with its diverse audio sampling, the tool can generate audio that reflects how people speak in the real world.
© IE Online Media Services Pvt Ltd
Date first published: Jun 17, 2023, 13:42 IST