At Google I/O 2024, Google announced two new generative media models: Veo and Imagen 3. Veo is built to produce high-resolution video, while Imagen 3 is a text-to-image model.
Google has been steadily updating Imagen 2 through the first half of 2024. The model recently gained the ability to create “live photos” and is considered one of the best AI image generators.
Veo is Google's latest generative media model, specifically aimed at producing 1080p video. Google says Veo can create videos longer than a minute, but hasn't said what the maximum length is.
Veo is said to understand film terms such as “time-lapse” and “aerial landscape photography.” The tech giant showed off some of these capabilities in a video made in collaboration with Donald Glover.
Veo is available to some users within VideoFX, and there is a waiting list for anyone interested.
According to Google, the latest Imagen 3 has been upgraded to better understand “natural language, the intent behind prompts, and the finer details of long prompts.”
The company also claims that this is the best model so far for rendering text, which is an ongoing problem with most image-generating AI models. If true, gone are the days of weird misspellings or Lorem Ipsum-style “words” appearing in images.
Imagen 3 is available to some creators as a preview within ImageFX. For those interested, Google has a waiting list. Although Google hasn't said when, it looks like Imagen 3 will be available on Vertex AI soon.
Google briefly mentioned in its press release that it will be introducing a suite of AI tools called Music AI Sandbox. These tools are intended to let users create new instrumental sections and transform sounds. Google didn't elaborate on the specifics of the tools, other than mentioning its partnerships with Wyclef Jean, Marc Rebillet, and Justin Tranter.