Eleven Labs has done it again. The pioneers of the highest quality AI-generated voice and SFX audio have announced a new text-to-sound effects API.
To mark the occasion, the company has also released a very cool open source demo called “Video to Sound Effects” that shows off the power of this technology. It is available online at GitHubthat's pretty awesome.
All you have to do is upload the generated video to ElevenLabs’ demo webpage and wait while the platform analyzes the video and returns a choice of four different sound effect audio tracks.
Choose the version you like, press the download button and get your video clip with your new audio. It's very easy, after uploading a 5-second clip, the whole process takes about 5 minutes.
This is a new field of AI known as video-to-audio (V2A), and Google recently announced a research project promising similar technology, though it's not available to try just yet.
Test ElevenLabs
I tested using Luna Dream Machine (LDM) as a video generation tool. I tried five different video prompts with mixed results. Well, it's still early days. Anyway, I eventually managed to get a clip of a gorilla riding a Harley Davidson motorcycle and uploaded it to the ElevenLabs demo page.
The company is targeting the technology not only for sound effects, but also for on-demand samples for music production and dynamic sounds for video games.
I had about 20 seconds to listen to the four audio samples, then selected one to begin the download process. It took some trial and error, but the end result was pretty amazing – the video was entertaining and the audio added a whole new dimension.
The technology samples four frames, spaced one second apart, from an uploaded video and sends them to ChatGPT-4o to create a custom text-to-sound effect prompt.
The prompts are then sent back to the ElevenLabs API to create the final SFX, which is crude but surprisingly effective. The result will never win an Oscar or a Golden Reel award, but it works well as a quick way to liven up boring AI-generated video clips.
We're excited to introduce our text-to-sound effects API. To showcase it, we've built the first video-to-sound effects app, which is freely available online and fully open source. pic.twitter.com/8aalo8GCSoJune 17, 2024
While the demo is clearly aimed at casual users, the new API is aimed at serious business use.
The company is targeting the technology not only for sound effects, but also for on-demand samples for music production and dynamic sounds for video games.
To deploy the API, customers need an ElevenLabs account with an API key, and it costs 100 characters per generation, or 25 characters per second, per set period.
