Meta Announces Voicebox, a Generative AI Model for Speech Generation

Meta claims to have achieved breakthroughs in speech generation AI using Voicebox.

While many generative AI models produce images from text prompts, Voicebox generates high-quality audio clips. (Image: Meta)

Generative AI is making strides in many areas of content creation. Now, technology giant Meta has introduced a generative AI model for speech-related tasks. The company announced Voicebox, a versatile tool that can edit, sample, and stylize audio. According to Meta, the technology could help content creators with a range of tasks, let visually impaired people hear written messages read aloud, and allow people to speak in foreign languages.

The company claims to have achieved a breakthrough in generative AI for speech. "We've developed Voicebox, the first model that can generalize to speech-generation tasks it was not specifically trained to accomplish with state-of-the-art performance," the company wrote in its blog.

Voicebox can produce outputs in different styles and can create audio from scratch. While many generative AI models produce images from text prompts, Voicebox generates high-quality audio clips. Currently, the model works with audio in six languages and can perform tasks such as noise removal, content editing, diverse sample generation, and style conversion.

Meta also said that multipurpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters (NPCs) in the metaverse. The model supports in-context text-to-speech synthesis: given an audio sample as short as two seconds, Voicebox can match that speaker's audio style when converting text to speech.

The model can also regenerate portions of speech corrupted by noise and replace mispronounced words without the speech having to be rerecorded. Given a sample of a person's voice, Voicebox can generate speech from text in English, French, German, Spanish, Polish, and Portuguese, a capability known as cross-language style transfer. According to Meta, "this feature could be used in the future to help people communicate in a natural and authentic way, even if they don't speak the same language."

Additionally, by sampling diverse outputs, the tool can generate audio that better reflects how people speak in the real world.

© IE Online Media Services Pvt Ltd

Date first published: Jun 17, 2023, 13:42 IST
