Gemini Omni Flash adds multimodal AI video creation to Google ecosystem

4 minute read | Updated: May 20, 2026 06:11 PM (IST)

Google announced Gemini Omni, a new multimodal AI model designed to generate and edit videos using a combination of text, images, audio, and video prompts. The announcement was made during Google I/O 2026, and the company described Omni as a major step toward turning Gemini into a fully creative AI system that can understand and generate multiple forms of media.

The first version of this model, called Gemini Omni Flash, is currently live through the Gemini app, Google Flow, and YouTube Shorts. Google says the model combines Gemini’s reasoning power with AI-powered content generation, allowing users to create cinematic-quality videos using natural language prompts.

AI video editing using conversation

One of Gemini Omni’s biggest features is conversational video editing. Rather than working with traditional editing tools and timelines, users describe the changes they want in plain language.

Google showed examples in which users turned sculptures into bubbles and mirrors into liquid, applied animations, and changed environments while preserving the characters and realistic physics in the video clip. The company says each prompt builds on previous edits, allowing users to adjust videos across multiple prompts without losing continuity.

According to Google, this model has a deeper understanding of motion, lighting, gravity, fluid dynamics, and object interactions, which helps produce scenes that look more realistic and physically accurate.

Gemini Omni combines text, images, video and audio

Google says Gemini Omni can process multiple types of input simultaneously. Users can upload photos, existing videos, drawings, audio references, and text prompts to create a single, cohesive output.

For example, users can apply the visual style of a single image to a video, sync visuals to music, or generate cinematic clips based on rough sketches or written instructions. The system can also create educational explanations and animation sequences from short prompts.

The company says Omni is designed to bridge the gap between AI-generated visuals and meaningful storytelling by combining creative generation with Gemini’s extensive knowledge of science, history, and culture.

Create AI avatars and personalized content

Google is also introducing AI avatars as part of Gemini Omni. Users can create a digital version of themselves using their appearance and voice to generate personalized videos.

The company says it is approaching these features with caution due to concerns about deepfakes and abuse. For now, voice-based avatar generation will roll out first, while additional editing features, including voice manipulation, are still being tested.

All videos generated through Gemini Omni include Google’s invisible SynthID watermarking technology, which allows the content to be identified as AI-generated.

Deployed on Gemini and YouTube

Gemini Omni Flash will be released globally to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow. Google is also bringing this technology to its YouTube Shorts and YouTube Create apps at no additional cost to creators.

The company says it will provide developer and enterprise API access in the coming weeks, allowing businesses and creators to integrate Gemini Omni into their own tools and workflows.
