Nano Banana’s Ultimate Prompting Guide

Machine Learning


Creating accurate, high-quality images often requires endless trial and error. You need a model that actually understands what you’re looking for.

Built on the Gemini 3 family of models, Nano Banana applies deep reasoning to fully understand a prompt before generating the image. We spent several weeks pushing Nano Banana 2 and Nano Banana Pro through every use case imaginable to find their limits.

We put this guide together to share what we’ve learned and exactly how to get the best results.

What you’ll learn in this guide:

  1. Model overview

  2. Complete breakdown of technical specifications

  3. Best practices for effective prompts

  4. Prompt frameworks

  5. How Nano Banana works with other creative models, Veo and Lyria.

Model overview

The Nano Banana model is an advanced image generation and editing model that uses real-world knowledge and deep inference capabilities to deliver accurate, rich visual results. Most recently, we announced Nano Banana 2, which shines in three ways.

  1. More accurate visuals: Nano Banana 2 uses real-time information and images from web searches. This means better educational tools, localized marketing, travel apps, and more.

  2. Fast, pro-level features: We’ve unlocked premium features, from text rendering and translation to 2K/4K upscaling. Creative teams can now create cohesive narratives, storyboards, and product mockups.

  3. Precision control: Generate or edit images to suit any project requirement, with native support for 16:9, 9:16, 2:1, and more. Whether you’re producing posters, marketing mockups, or advertisements, you can expect vibrant lighting and richer textures.

Breakdown of Nano Banana 2 and Nano Banana Pro technical specifications

Before we get into the prompts, here’s a breakdown of what the models can handle via the API and Vertex AI (always check the official Gemini 3 Pro Image and Gemini 3.1 Flash Image documentation for the latest details), with a minimal API sketch after the list:

  • Context window: Gemini 3.1 Flash Image (Nano Banana 2) supports up to 131,072 input tokens, and Gemini 3 Pro Image (Nano Banana Pro) supports up to 65,536 input tokens. Both models support up to 32,768 output tokens.

  • Resolution: Built-in ability to generate 1K, 2K, and 4K visuals. Gemini 3.1 Flash Image adds a smaller 512-pixel (0.5K) resolution.

  • Aspect ratio: Both models support 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9. Gemini 3.1 Flash Image Preview also adds 1:4, 4:1, 1:8, and 8:1 aspect ratios.

  • Image input: You can mix up to 14 reference object images in one prompt. Supported MIME types include image/png, image/jpeg, image/webp, image/heic, and image/heif.

  • Document input: You can input text files and PDF files. Maximum file size per file is 50 MB for API and Cloud Storage imports and 7 MB for direct uploads through the Google Cloud console.

  • Output: Both models output text and images.

  • Model knowledge base: The knowledge cutoff date for both models is January 2025.

  • Live data: Both models utilize real-time information from web searches.

  • Trust and safety: All generated images include C2PA Content Credentials and a SynthID watermark.
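
If you work with the API directly, the request below is a minimal sketch of how these options come together using the google-genai Python SDK. The model ID is a placeholder, and the image_config fields are only exposed in recent SDK versions, so treat the exact names as assumptions and verify them against the official documentation.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the GEMINI_API_KEY environment variable

response = client.models.generate_content(
    # Placeholder model ID -- substitute the Nano Banana model you have access to.
    model="gemini-3-pro-image-preview",
    contents=(
        "A low-angle photo of a red vintage bicycle leaning against a brick wall "
        "on an empty cobblestone street at dawn, warm side lighting."
    ),
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        # Aspect ratio must be one of the supported values listed above; field
        # availability depends on your google-genai version.
        image_config=types.ImageConfig(aspect_ratio="16:9"),
    ),
)

# Write the first returned image part to disk.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("bicycle.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```

The same request shape works on Vertex AI; only the client construction changes (a project and location instead of an API key).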

To see examples of key features, check this blog.

Best practices for effective prompts

When it comes to effective prompts, there are several ways to ensure that the visuals you get are what you asked for. Here are some guidelines:

  1. Be specific: provide concrete details about the subject, lighting, and composition.

  2. Use positive framing: describe what you want, not what you don’t want (e.g. “empty streets” instead of “no cars”).

  3. Control the camera: use photography and film terms such as “low angle” and “aerial photography.”

  4. Iterate: refine images with conversational follow-up prompts (see the sketch below).

The key is to start the prompt with a strong verb that tells the model the main operation you want to perform.
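
Guideline 4, iterating with conversational follow-ups, maps directly onto a multi-turn request: pass the previously generated image back in alongside a short instruction. The sketch below continues the earlier example and reuses the same placeholder model ID.

```python
from google import genai
from google.genai import types

client = genai.Client()

with open("bicycle.png", "rb") as f:
    previous_image = f.read()

# A conversational follow-up: keep the scene, change one attribute.
followup = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # placeholder model ID
    contents=[
        types.Part.from_bytes(data=previous_image, mime_type="image/png"),
        "Keep the composition exactly the same, but shift the lighting to golden hour "
        "and make the cobblestones wet from recent rain.",
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
```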

Five prompt frameworks

1. Image generation

When generating images, the structure of the prompt depends entirely on whether you use a reference image or rely solely on text.

Text-to-image generation without references

When you start with a blank canvas, you are the director. A simple list of keywords is not enough. You need to describe the scene narratively.

Formula: [Subject] + [Action] + [Location/context] + [Composition] + [Style]

Example prompt: [Subject] A striking fashion model in a brown tailored dress, sophisticated boots, carrying a structured handbag. [Action] Posing with a confident, dignified posture and a slight turn. [Location/context] Seamless deep cherry-red studio background. [Composition] Medium full shot, centered in the frame. [Style] Fashion-magazine editorial, shot on medium-format analog film, pronounced grain, high color saturation, cinematic lighting.
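
A convenient way to apply this formula in practice is to keep the five slots separate and join them only at request time, so you can swap any one slot across a batch of variations. The helper below is purely illustrative and not part of any SDK.

```python
def build_prompt(subject: str, action: str, context: str, composition: str, style: str) -> str:
    """Join the five formula slots into a single narrative prompt."""
    return " ".join([subject, action, context, composition, style])

prompt = build_prompt(
    subject="Striking fashion model in a brown tailored dress, sophisticated boots, carrying a structured handbag.",
    action="Posing with a confident, dignified posture and a slight turn.",
    context="Seamless deep cherry-red studio background.",
    composition="Medium full shot, centered in the frame.",
    style=(
        "Fashion-magazine editorial, shot on medium-format analog film, "
        "pronounced grain, high color saturation, cinematic lighting."
    ),
)
```

Swapping only the style slot is an easy way to produce matched variations of the same scene for comparison.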


