Google announced its latest AI media creation models, Veo and Imagen 3, at Google I/O 2024. Veo is designed to create high-quality 1080p videos, and Imagen 3 is the company's latest text-to-image model. These models are meant to compete with OpenAI's Sora video model and DALL-E 3, both of which have become well known for AI-generated media.
Veo advanced features
According to Google, Veo has a sophisticated understanding of natural language and visual semantics, allowing it to create videos longer than one minute. The model can understand cinematic techniques such as time-lapse and simulate real-world physics. Veo can generate videos from text, image, and video prompts, allowing for a wide range of creative outputs. Google DeepMind CEO Demis Hassabis said additional prompts can be used to refine video results and enhance the creative process.
To demonstrate Veo's capabilities, Google partnered with Donald Glover and his creative studio Gilga. In the promotional videos, Glover and his team use text prompts to generate scenes such as a convertible arriving at a house in Europe or a yacht gliding across the ocean. Glover emphasizes that storytelling is at the core of these tools, suggesting that with such technology, anyone can become a director.
The future of Veo in content creation
Google is considering additional features that would allow Veo to create storyboards and longer scenes. The company is inviting selected filmmakers and creators to experiment with the model and determine how it can best support creatives. Some Veo features will be available to selected creators in private previews within VideoFX. Google plans to add some of Veo's features to YouTube Shorts in the future.
Imagen 3 enhances text-to-image generation
Imagen 3, Google's latest text-to-image model, promises higher-quality, more detailed, photorealistic images with fewer artifacts. Google claims that Imagen 3 renders text better than previous versions and can capture fine details from longer prompts. The model is expected to be a strong competitor to OpenAI's DALL-E 3, which is well known for its AI-generated image capabilities.
Music AI Sandbox for recording artists
In addition to Veo and Imagen 3, Google introduced Music AI Sandbox, a set of tools aimed at helping recording artists create songs and beats. Artists like Wyclef Jean and Bjorn are working with Google to test these tools. Google has already shown some interesting demonstrations of the Music AI Sandbox, but specific details are limited.
Google's new AI tools reflect the company's significant investments in AI technology as it aims to lead the next big advances in computing. Veo is currently available to some creators within Google's VideoFX tool and will soon be integrated into YouTube Shorts and other products. Google has developed several video generation models over the past few years, including Phenaki, Imagen Video, and Lumiere.
Competition with OpenAI
OpenAI has already pitched its proprietary AI video generator, Sora, to Hollywood and plans to make it publicly available later this year. Sora could also become available directly within video editing applications such as Adobe Premiere Pro. This competitive environment highlights the rapid evolution and growing importance of AI in media production.