Google has started rolling out Gemini Omni Flash, a new multimodal AI model that can generate and edit videos from text, image, audio, and video inputs. The rollout follows the model's announcement at Google I/O 2026 and makes the system available to users in the Gemini app, Google Flow, and YouTube Shorts.
The company says the model is designed to combine reasoning and creative generation in a single system, allowing users to build and modify video content through natural conversation.
Gemini Omni Flash lets users instruct the model to create videos from scratch or make incremental changes to existing clips. Each instruction builds on the previous one, so users can keep refining a scene without breaking continuity. Google says this keeps characters, objects, and environments consistent between edits, even after many iterations.
The model also supports multi-input workflows, in which users combine different types of input, such as text prompts, images, video clips, and audio references, to shape a single output video rather than relying on one prompt alone. Google says the system is built to understand how these inputs relate to each other and to produce a consistent final scene.
The rollout is part of Google's broader effort to integrate generative AI into its consumer ecosystem, particularly its short-form video platforms. YouTube Shorts and the YouTube Create app are among the first platforms to offer Omni Flash features, signaling closer integration between AI generation tools and content creation pipelines.
The company also says that all output generated through the system will include a SynthID watermark to identify AI-generated content.
Conversational video editing
Gemini Omni Flash allows users to edit videos with natural language commands instead of traditional editing tools. Users can describe changes such as swapping the environment, adding objects, or altering actions in a scene, and the model updates the video accordingly while preserving its overall structure.
The system is designed to maintain visual continuity throughout the editing process, keeping characters and objects consistent even when changes are made across multiple steps. Google says this makes editing more iterative and flexible than it is with traditional video production tools.
The model also leverages Gemini’s broader world knowledge to improve the realism of the generated content. Google says it uses this understanding to more accurately simulate physical interactions such as movement, lighting, and environmental effects.
From prompt to production
Google is positioning Gemini Omni Flash as part of a broader shift toward multimodal AI systems that handle creation and reasoning together. The model is designed to accept multiple input formats and produce output videos that reflect the combined instructions rather than any single prompt.
The company says the goal is to narrow the gap between idea and execution, letting users go from concept to finished video within a single conversational interface. Google also plans to expand output beyond video, with image and audio generation slated for future updates.
Access to Gemini Omni Flash is currently limited to certain Gemini app subscription tiers, with broader availability expected as the rollout expands.
