New open source tool from ElevenLabs lets you add sound effects to any video

AI Video & Visuals

It's time to celebrate the incredible women leading the way in AI. Nominate your incredible leader for VentureBeat's Women in AI Awards by June 18. Learn more here


A few weeks after AI voice startup ElevenLabs released its text-to-speech AI service, Sound Effects, the company has released an open-source tool that shows off the possibilities. The application lets creators generate sound effect samples for their videos in “about 15 seconds,” analyzing imported clips and offering multiple options.

Developers can access the app's code on GitHub, but ElevenLabs has also launched a website where the public can try out the Sound Effects API.

Once you upload a video, the so-called “Video to Sound Effects” app extracts four frames, spaced one second apart, on the client side. It then sends those frames and a prompt to OpenAI's GPT-4o to create a custom text-to-sound effects prompt. It then uses that prompt to generate a sound effect through ElevenLabs' Sound Effects API. Finally, the video and audio are combined on the client side to create a single downloadable file up to 22 seconds long.

“We see this as a proof of concept of what our SFX API can do,” Ammaar Reshi, design lead at ElevenLabs, told VentureBeat. “AI video creators are often searching for the perfect sound effect, so we felt we could intelligently speed up their workflow by understanding the frames of the video and suggesting the best output.” He said the company is excited about the variety of dynamic experiences people can build with the API, citing immersive video games as an example where sound could be generated based on player interactions.


VB Transform 2024 registration opens

Join enterprise leaders at our flagship AI event in San Francisco July 9-11. Network with your peers, explore the opportunities and challenges of generative AI, and learn how to integrate AI applications in your industry. Register now


The aforementioned API allows developers to build fully customized AI sound effects with short descriptions, and ElevenLabs charges 100 characters per generation with automatic duration, or 25 characters per second with a set duration.

In a quick test, the app for creating sound effects from video seemed simple: I imported a soundless movie of a vehicle driving through an all-terrain environment, and ElevenLabs' AI generated four options that sounded like a vehicle driving on a gravel road. Applying sound effects to clips is fun, but where the feature could really become useful is in integrating it into a larger system.

As the field of AI video generation heats up, ElevenLabs may be looking to stay one step ahead of the pack by developing new audio solutions that it knows will be in high demand from developers, filmmakers, and creatives.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *