Google announced Gemini Omni, a new multimodal AI model designed for creating and editing videos using a combination of text, image, video, and audio inputs.
Gemini Omni Flash, the first model in the Omni family, is being rolled out to Gemini apps, Google Flow, and YouTube Shorts.
The company says this model allows users to edit videos through conversational prompts, with each instruction building on previous edits while maintaining continuity between scenes, characters, and visual elements.
The tool also supports video generation from reference inputs in multiple formats, including images, drawings, video clips, and audio. Google said support for additional voice input formats will be added in the future.
The company says Gemini Omni combines Gemini’s reasoning capabilities with video generation, creating scenes informed by concepts such as physics, historical context, and visual consistency.
Google also introduced an avatar feature that allows users to create videos using digital versions of themselves and their own voices.
The company says all videos generated using Gemini Omni will include a SynthID watermark. Google said the generated content can also be verified through the Gemini app, Gemini in Chrome, and Google Search.
This release expands on Google’s broader Gemini ecosystem, which previously included image generation and editing tools such as Nano Banana.
