4-second image generation, conversational video editing

Google on Tuesday released two generative media models: Nano Banana 2 Lite for fast image generation and Gemini Omni Flash for conversational video editing. For the first time, developers have immediate access to a combined image-to-video pipeline for high-volume creative production at a commercially viable price. Nano Banana 2 Lite costs $0.034 per image and produces text-to-image output in about 4 seconds. Gemini Omni Flash generates and edits video through natural language conversations at $0.10 per second of output. As of June 30, 2026, both are available in Google AI Studio and the Gemini API.
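To make the pricing concrete, the arithmetic can be sketched in a few lines of Python. This is a rough estimate that uses only the two list prices quoted above. The function name and the example batch sizes are illustrative assumptions, and the sketch does not model how re-rendered edits are billed, which the announcement does not spell out.

```python
from decimal import Decimal

# Launch prices quoted in the article.
PRICE_PER_IMAGE = Decimal("0.034")        # Nano Banana 2 Lite, USD per image
PRICE_PER_VIDEO_SECOND = Decimal("0.10")  # Gemini Omni Flash, USD per second of output


def estimate_cost(num_images: int, video_seconds: int) -> Decimal:
    """Return the estimated USD cost for a batch of images plus video output."""
    if num_images < 0 or video_seconds < 0:
        raise ValueError("counts must be non-negative")
    return num_images * PRICE_PER_IMAGE + video_seconds * PRICE_PER_VIDEO_SECOND


# Hypothetical batch: 1,000 prompt variations, 20 of them animated as 10-second clips.
batch_cost = estimate_cost(num_images=1000, video_seconds=20 * 10)
print(f"${batch_cost:.2f}")
```

At these prices the 1,000 images come to $34 and the 200 seconds of video to $20, so the video side dominates quickly as more clips are animated.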

The pipeline is the most important part of the dual launch. Developers can pass images generated by Nano Banana 2 Lite directly to Gemini Omni Flash for animation, and then use Google’s Interactions API to keep refining the results through plain language commands, such as adjusting camera angles, swapping characters, and relighting scenes, for up to three consecutive edits within a single session. This chain, a fast image generator and a stateful conversational video editor combined into one workflow, is unlike anything previous AI media stacks have offered at this price point.

Nano Banana 2 Lite: Built for volume, not craft.

Nano Banana 2 Lite, model identifier gemini-3.1-flash-lite-image, is the fastest and lowest-cost model in Google’s four-tier Nano Banana image family. Google is positioning it as a direct upgrade from the original Nano Banana (gemini-2.5-flash-image), which is now the legacy tier of the family. Google says the new model is built for “rapid idea generation and fast developer pipelines, where speed and cost are the primary constraints.”

The 4-second latency changes the calculus for the category. Previous image generators operated on timescales that fell outside the interactive loop: to test large numbers of prompt variations, developers had to submit batches, wait for results, and then adjust. At 4 seconds, image generation is fast enough to embed in live design tools, e-commerce configurators, or consumer-facing features where users wait for results. Logan Kilpatrick, head of Google AI Studio and Gemini API, said the effect feels like “magic.” When generation keeps pace with ideation, creators stay in the process rather than breaking their flow to watch a progress bar.

Google says that despite its focus on speed, Nano Banana 2 Lite maintains reliable prompt adherence, consistent character rendering across multiple generations, and legible text in images, the three features that matter most for advertising and marketing use cases. Idan Yonas, director of AI content and innovation at Artlist, explained that this model enables a creative process where “thoughts move almost instantly to visuals.” Itay Schiff, co-founder and creative director of Figma, says Nano Banana 2 Lite is “ideal for rapid iteration while maintaining creative flow.”

The model ranks fifth on Arena’s public image generation leaderboard. OpenAI’s gpt-image-2 leads the ranking. Microsoft’s MAI-Image-2.5, announced in May, is in fourth place. The current Nano Banana lineup includes Nano Banana 2 Lite, the speed-optimized option; Nano Banana 2, the general-purpose option; and Nano Banana Pro, designed for complex professional use cases where accuracy trumps speed.

Gemini Omni Flash: Why conversational editing is changing your workflow

All major AI video tools released before Gemini Omni Flash worked on a generate-and-export paradigm. A user submits a prompt, the model renders the clip, and if the clip needs changes, the user either re-prompts from scratch or moves to a separate editing application. This paradigm makes it expensive to actually iterate on AI-generated video, regardless of the price per second.

Gemini Omni Flash (gemini-omni-flash-preview) breaks that pattern with a combination of architecture and API design. The model is built on Gemini’s multimodal inference engine. Instead of stitching together separate image, audio, and video pipelines, it processes all input types simultaneously and produces a unified output. Google DeepMind Product Management Director Nicole Brichtova described it as “the next step in the advancement of combining the intelligence of Gemini with the rendering capabilities of media models.” This is explicitly not an update to Veo, but a new model that combines inference and rendering into one system.

The practical result is an Interactions API that maintains session history across consecutive edits. Developers can generate a 10-second video clip from an image reference, ask the model to adjust the lighting and re-render, and then ask it to replace background elements. All of this happens within one session, and the model retains the context of each previous turn. The current implementation is limited to three consecutive edits per session.
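The three-edit limit is easy to account for on the client side. The following is a minimal sketch of local bookkeeping only: the `EditSession` class and its method names are hypothetical, it makes no network calls, and it does not reproduce the Interactions API itself, whose request format is not described here.

```python
from dataclasses import dataclass, field

MAX_EDITS_PER_SESSION = 3  # limit stated in the article


@dataclass
class EditSession:
    """Hypothetical client-side record of one conversational editing session."""

    initial_prompt: str
    edits: list[str] = field(default_factory=list)

    @property
    def edits_remaining(self) -> int:
        """Number of edit turns still available in this session."""
        return MAX_EDITS_PER_SESSION - len(self.edits)

    def add_edit(self, instruction: str) -> None:
        """Record an edit instruction, refusing once the session limit is reached."""
        if self.edits_remaining <= 0:
            raise RuntimeError("edit limit reached; start a new session")
        self.edits.append(instruction)


session = EditSession("Animate the product image as a 10-second clip")
session.add_edit("Adjust the lighting to golden hour")
session.add_edit("Replace the background with a studio backdrop")
print(session.edits_remaining)
```

Tracking the turn count locally lets an application warn the user before the last edit instead of discovering the limit from a failed request.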

Gemini also brings knowledge of the world into the rendering process. The model leverages Gemini’s training in history, biology, narrative logic, and physics, including approximate behavior of gravity and fluid mechanics, to construct scenes that match real-world expectations rather than producing seemingly plausible but physically inconsistent motion.

Gemini Omni Flash costs $0.10 per second of video output, comparable to Google’s Veo 3.1 Fast. Google makes a clear distinction between the two products. Veo 3.1 excels at producing high-quality one-shot clips. Gemini Omni Flash is designed for iterative, conversational workflows that combine multiple asset types.

The competitive landscape is noteworthy. Announced on June 23, 2026, ByteDance’s Seedance 2.5 supports clips of up to 30 seconds, 4K output, and up to 50 simultaneous reference inputs. Gemini Omni Flash currently has a 10-second clip limit. Google describes the limit as a deployment decision rather than a model constraint, a way to broaden access while compute demand is high, and says longer clips will follow. A more capable Gemini Omni Pro model is planned, but no release date has been set.

What you can’t do yet with Gemini Omni Flash

Google has been transparent about Gemini Omni Flash’s current limitations in its launch documents. Uploading audio references is not yet supported in the Gemini API. Video references up to 3 seconds long are accepted by the API schema, but are currently not handled correctly by the model. Character consistency across scene changes and panning movements is a documented gap. Google recommends treating the current release as a prototyping tool for developers rather than a production-ready service.
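An application can check a planned request against these limitations before sending anything. The sketch below is illustrative only: `VideoRequest` and `preflight_issues` are hypothetical names, the fields are not the Gemini API schema, and the 10-second cap is the clip limit reported for the current release.

```python
from dataclasses import dataclass

MAX_CLIP_SECONDS = 10  # clip limit reported for the current release


@dataclass
class VideoRequest:
    """Hypothetical description of a planned request; not the Gemini API schema."""

    clip_seconds: int
    has_audio_reference: bool = False
    has_video_reference: bool = False


def preflight_issues(request: VideoRequest) -> list[str]:
    """Return human-readable warnings for known limitations of the preview."""
    issues = []
    if request.clip_seconds > MAX_CLIP_SECONDS:
        issues.append(f"clip length exceeds the {MAX_CLIP_SECONDS}-second limit")
    if request.has_audio_reference:
        issues.append("audio references are not yet supported in the API")
    if request.has_video_reference:
        issues.append("video references may not be handled correctly by the model")
    return issues


for issue in preflight_issues(VideoRequest(clip_seconds=12, has_audio_reference=True)):
    print("warning:", issue)
```

Returning warnings rather than raising errors suits a preview release, since the documented limits may change and the caller may still want to attempt the request.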

The model also refuses to generate or edit videos that include the name or likeness of a real person. When such a request is sent, the model returns an input blocked message. This filter is consistent with Google’s Responsible AI Principles and limits the risk of deepfakes, but it also excludes certain legitimate creative applications, such as historical reconstruction involving named individuals.

WPP, Adobe and Invideo integrate on launch day

Enterprise adoption has already begun. WPP is integrating Gemini Omni Flash into its WPP Open agent platform to deliver more controlled AI content creation at scale for clients, with teams testing asset localization, product swapping, and dynamic style transfer. Adobe has announced plans to incorporate both Nano Banana 2 Lite and Gemini Omni Flash into Adobe Firefly. Matt Chotin, Adobe’s senior director of products, said the two models “build on Adobe’s strategy of delivering pro-grade tools and industry-leading creative AI models in connected workflows that give creators flexibility and control over how they bring their creative ideas to life.”

Invideo, an AI video platform, reports that Gemini Omni Flash’s visual effects capabilities open up the possibility of combining traditional filmmaking techniques and AI-generated effects on the same production.

Both models feature SynthID watermarking and support for C2PA content credentials, allowing you to authenticate AI-generated media and track its origins through the Gemini app, Gemini in Chrome, or Google Search.

Wider context: AI slop and enterprise pivots

The announcement comes as the generative AI image and video market faces a growing backlash over quality. A June 2026 study found that approximately 60 percent of TikTok videos are classified as AI-generated content. “AI slop” has become a household term for the machine-made media that floods social platforms. Google has consistently framed Nano Banana 2 Lite and related tools as aimed at advertising and corporate use rather than consumer creativity. This strategic positioning sidesteps some, though not all, of the backlash.

Separately, Google’s recent $75 million partnership with indie studio A24 has drawn criticism from the creative community concerned about AI encroaching on professional filmmaking. The deal sparked a huge backlash from fans online.

For developers evaluating which models belong in a production pipeline, the clearest guide is Google’s own distinction. Nano Banana 2 Lite is a mass ideation tool built for speed over craft. Gemini Omni Flash is a conversational iteration tool that is still in public preview. Neither requires a waiting list for standard developer access, and both are available immediately at the listed prices.

FAQ

What is Nano Banana 2 Lite and how fast is it?

Nano Banana 2 Lite (gemini-3.1-flash-lite-image) is Google’s fastest and lowest-cost image generation model, capable of producing text-to-image output in approximately 4 seconds for $0.034 per image. It’s part of Google’s four-tier Nano Banana family and is designed for high-volume, latency-sensitive developer pipelines. It is available in Google AI Studio, the Gemini API, and the Gemini Enterprise Agent Platform, and on consumer surfaces such as AI Mode in Search, Gemini apps, NotebookLM, Google Photos, and Google Ads.

How is Gemini Omni Flash different from other AI video generators?

Most AI video tools generate clips that require you to re-prompt from the beginning if you want to make changes. Gemini Omni Flash uses Gemini’s multimodal inference engine and the Interactions API to support stateful, conversational, multi-turn editing. Users can describe their changes in plain language, and the model applies them while preserving the context of previous edits. This moves AI video from a one-shot generation tool to an iterative creative workflow. Current limitations include a 10-second clip limit, no audio reference uploads in the API, and ongoing character consistency issues across scene changes.

Can Nano Banana 2 Lite and Gemini Omni Flash be used together?

Yes, this is Google’s intended use case. Developers can generate images with Nano Banana 2 Lite and pass them as references directly to Gemini Omni Flash to animate and create videos. The Interactions API supports up to three consecutive conversational edits within one session. Google has released three demo applications that demonstrate the combined pipeline: Anywhere (places and animates user photos in landmark locations), Space Lift (redesigns of room interiors previewed as cinematic videos), and Omni Product Studio (static product images converted into e-commerce videos).

What are the actual engineering trade-offs behind the 4 second image generation speed?

Nano Banana 2 Lite achieves its 4-second latency by optimizing for throughput over fidelity. It is explicitly a speed-first model, not a quality-first model. Google says the model maintains reliable prompt compliance, character consistency, and readable text in images despite the optimizations, but Nano Banana 2 and Nano Banana Pro remain the recommended options for use cases where visual quality or complex expert reasoning is a priority. The speed gains reflect an intentional trade-off between quality and speed, not a free improvement.
