Meta Announces Voicebox, a Generative AI Model for Speech Generation

Meta claims to have achieved breakthroughs in speech generation AI using Voicebox.

While many generative AI models generate images from text prompts, Voicebox generates high-quality audio clips. (Image: Meta)

Generative AI is making strides in many aspects of content creation. Now, technology giant Meta has introduced a generative AI model for speech-related tasks. The company announced Voicebox, a tool for editing, sampling, and stylizing audio. According to Meta, Voicebox could help content creators with a variety of tasks, let visually impaired people hear written messages, and allow people to speak in a foreign language.

The company claims to have achieved a breakthrough in speech generation AI. In its blog, the company wrote that Voicebox is the first model that can generalize to speech generation tasks it was not specifically trained on, and that it does so with state-of-the-art performance.

Voicebox can produce output in a range of styles, and it can create audio from scratch as well as modify an existing sample. While many generative AI models generate images from text prompts, Voicebox generates high-quality audio clips. Currently, the model works with speech in six languages and can perform tasks such as noise removal, content editing, diverse sample generation, and style conversion.

Meta also said that multi-purpose generative AI models like Voicebox could give natural-sounding voices to virtual assistants and non-player characters (NPCs) in the metaverse. The model supports in-context text-to-speech synthesis: given an audio sample as short as two seconds, Voicebox can match the sample's audio style and use it to generate speech from text.

The model can also recreate portions of speech interrupted by noise and replace misspoken words without the speaker having to rerecord the whole passage. Given a sample of a person's voice and a passage of text, Voicebox can generate a reading of that text in English, French, German, Spanish, Polish, or Portuguese. This feature is known as cross-language style transfer. Meta said the feature could be used in the future to help people communicate in a natural and authentic way, even if they don't speak the same language.

Additionally, with its diverse audio sampling, the tool can generate audio that reflects how people speak in the real world.

Date first published: Jun 17, 2023, 13:42 IST
