
Nano Banana Pro: how to write prompts the model actually understands

Google · Updated:

Nano Banana Pro (Gemini 3 Pro Image) is Google's flagship image model: up to 4K output, up to 14 reference images (6 at high fidelity), full thinking mode, and SOTA text rendering. As a thinking model, it understands intent, physics, and composition, and reads a prompt as a creative director's brief. English is the primary prompt language, and JSON-structured prompts work well.

What Nano Banana Pro does

Pro combines 10 headline capabilities: SOTA text rendering for infographics and multilingual typography, Identity Locking across up to 14 references (6 high fidelity), Google Search grounding, powerful mask-free editing, 2D→3D translation, native 4K generation, thinking with intermediate images, 9-10 frame storyboards, structural control via sketches and wireframes, and live data through search.

This is a different class of model, built for hero brand assets, posters, packaging, complex scenes with 4+ characters, and production-ready infographics. Use it when you need maximum control and high resolution, not fast iteration.

  • Up to 4K, up to 14 references (6 high fidelity)
  • SOTA text rendering: posters, packaging, infographics, multilingual typography
  • Full thinking mode with intermediate «reasoning» images
  • Google Search grounding for live data
  • Structural control: sketches, wireframes, grids as input images

Prompt structure

Optimal order: [Subject with details] + [Scene/Setting] + [Lighting] + [Camera/Lens] + [Textures/Materials] + [Style/Mood] + [Purpose context] + [Format].
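A minimal sketch of the ordering above as a Python helper. `PromptParts` and `build_prompt` are hypothetical names, not part of any Google SDK; the helper only joins the non-empty parts in the recommended order and turns each one into a sentence, which keeps the result closer to a brief than to tag soup.

```python
from dataclasses import dataclass, fields


@dataclass
class PromptParts:
    # Field order mirrors the recommended prompt order; names are illustrative.
    subject: str
    scene: str = ""
    lighting: str = ""
    camera: str = ""
    textures: str = ""
    style: str = ""
    purpose: str = ""
    output_format: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join non-empty parts, in field order, as full sentences."""
    sentences = []
    for f in fields(parts):  # fields() returns dataclass fields in declaration order
        value = getattr(parts, f.name).strip()
        if not value:
            continue  # skip parts this brief doesn't need
        value = value[0].upper() + value[1:]
        if not value.endswith((".", "!", "?")):
            value += "."
        sentences.append(value)
    return " ".join(sentences)


prompt = build_prompt(PromptParts(
    subject="A weathered fisherman repairing a torn net",
    scene="wooden pier, late autumn morning",
    lighting="overcast, dim ambient light",
    camera="shot on 35mm with mild film grain",
    output_format="3:2 landscape",
))
```

Empty fields are skipped rather than padded, so the same dataclass can serve both short and fully specified briefs.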

Golden rule: write like a creative director, not tag soup. Pro is a thinking model that understands intent, so the prompt should read like a brief for an artist. Concrete camera parameters significantly affect the output: «Shot on Sony A7III with 85mm f/1.4 lens, classic three-point lighting setup, natural skin texture with visible pores, catchlights in eyes.» For complex scenes you can use a JSON structure; Pro parses it well.
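A sketch of such a JSON brief, assuming the subject / photography / background layout this guide's FAQ describes. The key names are a convention, not an official schema, and `json_prompt` is a hypothetical helper; it only produces a string, and how you send that string (AI Studio, the Gemini app, or an API client) depends on your pipeline.

```python
import json


def json_prompt(description, expression, clothing,
                camera_style, lighting, lens,
                setting, elements, atmosphere) -> str:
    """Serialize a scene brief as a JSON prompt string."""
    brief = {
        "subject": {
            "description": description,
            "expression": expression,
            "clothing": clothing,
        },
        "photography": {
            "camera_style": camera_style,
            "lighting": lighting,
            "lens": lens,
        },
        "background": {
            "setting": setting,
            "elements": elements,  # list of props or scene details
            "atmosphere": atmosphere,
        },
    }
    # ensure_ascii=False keeps non-Latin text (Cyrillic, CJK, ...) readable
    return json.dumps(brief, indent=2, ensure_ascii=False)
```

Because the structure is fixed, the same function doubles as a template for batch or production use.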

Brightness bias and how to fight it

Pro has a built-in bias toward bright, saturated, «polished» imagery. The model «fixes» overcast scenes, adds saturation, and pulls toward a warm glow. Symptoms: you ask for an overcast mood and get morning light; you ask for desaturated noir and get saturated editorial; you ask for realistic candid and get polished commercial.

Stack 2-3 of these counter-moves at once:

  • Explicit: «overcast, muted desaturated palette, cool color temperature, no auto-brightening»
  • Anti-glamour: «no polished glamour, no commercial polish, raw documentary aesthetic»
  • Color grade: «dim ambient lighting, low contrast, faded vintage color grade, neutral exposure»
  • Genre anchor: «the look of news photojournalism» or «the look of a real police evidence photo»
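The counter-moves above can be kept as reusable snippets and appended programmatically. This is a minimal sketch: the phrases are copied from this guide, while the dictionary keys and the `add_counter_stack` name are illustrative assumptions. It enforces the 2-3 rule so a single weak counter-move is never sent alone.

```python
COUNTER_MOVES = {
    "explicit": "overcast, muted desaturated palette, cool color temperature, no auto-brightening",
    "anti_glamour": "no polished glamour, no commercial polish, raw documentary aesthetic",
    "color_grade": "dim ambient lighting, low contrast, faded vintage color grade, neutral exposure",
    "genre_anchor": "the look of news photojournalism",
}


def _sentence(text: str) -> str:
    """Capitalize the first letter and end with a period."""
    return text[0].upper() + text[1:] + "."


def add_counter_stack(prompt: str, moves: list[str]) -> str:
    """Append 2-3 named counter-moves to a prompt, in the order given."""
    if not 2 <= len(moves) <= 3:
        raise ValueError("stack 2-3 counter-moves at once")
    unknown = [m for m in moves if m not in COUNTER_MOVES]
    if unknown:
        raise KeyError(f"unknown counter-move(s): {unknown}")
    stack = " ".join(_sentence(COUNTER_MOVES[m]) for m in moves)
    # Strip a trailing period/space so the joined prompt has exactly one separator
    return f"{prompt.rstrip('. ')}. {stack}"
```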

Multi-reference mode (10+ images)

With 8 or more references loaded, the usual rules stop working: the prompt turns into an overloaded list and the model loses track. Different rules apply.

Reference, don't redescribe: instead of «khaki jacket with leather collar and four pockets», write «jacket from @ref2.» Assign each image a role: «camera in the right hand (@ref3), bag over the shoulder (@ref4), compass on the belt (@ref5).» Each reference gets exactly one place in the scene. Drop references without a clear placement; 4 precisely placed references beat 14 vague ones. At 12+ references, expect some details to be dropped, so pre-select your 2-3 most important.
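The role-assignment rules above can be sketched as a small helper. It assumes the `@refN` labelling used in this guide; how images are actually attached and numbered depends on the interface you use. `reference_clause` is hypothetical: it drops references without a placement, refuses more than 14, and warns at 12+.

```python
import warnings

MAX_REFERENCES = 14       # Pro's stated limit (6 of them at high fidelity)
DROP_RISK_THRESHOLD = 12  # per this guide: at 12+ some details get dropped


def reference_clause(placements: dict[int, str]) -> str:
    """Build 'placement (@refN)' phrases from {ref_number: placement}.

    References with an empty placement are dropped. Dict insertion order is
    kept, so list your 2-3 most important references first.
    """
    kept = [(n, p.strip()) for n, p in placements.items() if p.strip()]
    if len(kept) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references are supported")
    if len(kept) >= DROP_RISK_THRESHOLD:
        warnings.warn("12+ references: expect some details to be dropped")
    return ", ".join(f"{p} (@ref{n})" for n, p in kept)
```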

Common mistakes

  1. Tag soup on a thinking model

    «woman, paris, cafe, golden, 4k, realistic» wastes 60% of Pro's potential. The model is trained on full sentences, and its thinking mechanism reads connected descriptions, not tags. Write like a creative director briefing a photographer: long, grammatical, descriptive sentences can double the quality from the same set of words.

  2. Ignoring brightness bias

    Pro defaults to bright, saturated, polished images. For noir, documentary, candid, or horror, this is a critical problem. Without an anti-glamour stack («overcast, muted, no polished glamour, raw documentary») the model will deliver «pretty» even when you asked for «truthful.» Stack 2-3 counter-moves at once.

  3. Re-describing every reference with 10+ loaded

    At 8+ references, detailed descriptions stop working — the prompt turns into an overloaded list. The right move: «jacket from @ref2», «camera in the right hand (@ref3).» Assign each image a role in the scene. Without clear placement a reference gets ignored or wrecks the composition.

  4. Forgetting Identity Locking across a series

    Without an explicit «keep facial features 100% identical to Image 1», the model drifts facial features from frame to frame, even for the same character. For a 9-10 frame storyboard, repeat the Identity Locking instruction in every prompt, and always add clothing stabilization («clothing and appearance remain stable across all frames»).

  5. Regenerating instead of conversational editing

    Pro remembers session context and supports in-dialog edits. If the image is 80% there, don't rewrite the prompt to change lighting or color. Say: «great, now change the light to sunset, keep the rest.» This preserves character identity and is faster than regenerating from a new seed.
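The storyboard advice in mistake 4 is easy to forget by hand, so it can be templated. This is a minimal sketch: the two lock sentences are quoted from this guide, while `storyboard_prompts` and the «Frame i of n» prefix are illustrative assumptions. The 10-frame cap simply matches the 9-10 frame storyboards described above.

```python
IDENTITY_LOCK = "Keep facial features 100% identical to Image 1."
CLOTHING_LOCK = "Clothing and appearance remain stable across all frames."


def storyboard_prompts(frames: list[str]) -> list[str]:
    """Return one prompt per frame, each repeating both locks."""
    if not 1 <= len(frames) <= 10:
        raise ValueError("this sketch targets storyboards of up to 10 frames")
    return [
        # Every frame carries both locks; dropping them is what causes drift
        f"Frame {i} of {len(frames)}: {action.rstrip('.')}. {IDENTITY_LOCK} {CLOTHING_LOCK}"
        for i, action in enumerate(frames, start=1)
    ]
```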

Before / after examples

Example 1

Before

viral youtube thumbnail

After

Viral YouTube thumbnail, 16:9 landscape. Left side: the man from Image 1 (keep facial features 100% identical to reference), surprised expression, mouth open, pointing dramatically at the right side of the frame. Right side: a sleek black gaming laptop on a glossy desk. Center-bottom: bold yellow outlined text «MIND BLOWN», sans-serif heavy weight, slight 3D depth. Color palette: high-contrast yellow, magenta, deep black background with subtle radial glow behind the laptop. Style: high click-through-rate YouTube thumbnail aesthetic.

This version adds Identity Locking, an explicit purpose (viral thumbnail), exact text in quotes with the font called out, a color palette, and a format. Pro assembles a near-production-ready thumbnail in one pass.

Example 2

Before

overcast documentary shot

After

A weathered fisherman repairing a torn net on a wooden pier, late autumn morning. Overcast, muted desaturated palette, cool color temperature, dim ambient lighting, low contrast, faded vintage color grade, neutral exposure. No polished glamour, no commercial polish, raw documentary aesthetic. The look of news photojournalism — visible skin texture, weathered hands, worn jacket, no airbrushing. Shot on 35mm with mild film grain, slight desaturation.

Anti-brightness-bias stack: color grade + anti-glamour + genre anchor. Without it, Pro will «fix» the overcast mood and serve warm morning light instead of documentary grey.

Example 3

Before

financial report infographic

After

Pitch-deck slide titled "Annual Revenue Growth 2026". Visualize a 4-quarter bar chart with values: "Q1 $2.4M", "Q2 $3.1M", "Q3 $4.2M", "Q4 $5.8M". Use Inter bold sans-serif for labels, brand color #1A73E8 for bars, subtle gridlines, clean white background. Subtitle below chart in smaller weight: "45% YoY growth". Bottom-right: brand logo placeholder labeled "COMPANY". Style: modern editorial infographic, crisp 4K rendering, no decorative noise.

Pro is the only model in the family with production-ready dense text and number rendering. Font, hex color, and layout are explicit, so the model assembles a near-final slide.
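When the numbers come from a spreadsheet or report, generating the quoted labels from the data keeps the prompt consistent with the source. This sketch reproduces the wording of Example 3; `chart_prompt` is a hypothetical helper, and it assumes values are given in millions of dollars.

```python
def chart_prompt(title: str, quarterly: dict[str, float], bar_color: str,
                 subtitle: str) -> str:
    """Build an infographic brief whose quoted labels come straight from data."""
    # Each label is quoted so the model renders it verbatim, e.g. "Q1 $2.4M"
    labels = ", ".join(f'"{q} ${v:.1f}M"' for q, v in quarterly.items())
    return (
        f'Pitch-deck slide titled "{title}". '
        f"Visualize a {len(quarterly)}-quarter bar chart with values: {labels}. "
        f"Use Inter bold sans-serif for labels, brand color {bar_color} for bars, "
        "subtle gridlines, clean white background. "
        f'Subtitle below chart in smaller weight: "{subtitle}". '
        "Style: modern editorial infographic, crisp 4K rendering, no decorative noise."
    )
```

The subtitle is passed in rather than computed, since a year-over-year figure needs prior-year data that the quarterly values alone don't contain.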

Frequently asked

When Pro, when Nano Banana 2?
Choose Pro for 4+ characters in frame, hero brand assets, dense text on posters and packaging, infographics, ray-traced lighting, 9-10 frame storyboards, 10+ references, and structural sketch control. Choose Nano Banana 2 (NB2) for single portrait close-ups (less uncanny valley), macro skin, selfies, candid documentary, extreme aspect ratios, and mass iteration (10 variants before finalizing). «Higher» doesn't mean «better»: they're different tools.
What is thinking mode and how does it affect output?
Pro «thinks» by default — generating intermediate «reasoning» images to refine composition before the final render. This lets it solve visual problems, do «before/after» reasoning, use Google Search, and self-check. Generation takes longer, but first-pass quality is significantly higher — especially on complex scenes with infographics and many elements.
Do JSON prompts work?
Yes, Pro parses JSON structures well — it's one of the recommended approaches for complex scenes with many details. Structure: subject (description, expression, clothing), photography (camera_style, lighting, lens), background (setting, elements, atmosphere). This technique gives maximum control and is convenient for templating in production pipelines.
How do I get film realism without the glossy AI look?
Stack counter-moves against brightness bias: «overcast, muted desaturated palette, cool color temperature, dim ambient lighting, low contrast, faded vintage color grade, neutral exposure» + «no polished glamour, no commercial polish, raw documentary aesthetic» + a genre anchor «the look of news photojournalism.» Describe real imperfections: «visible pores, weathered skin, fabric wear, subtle film grain.» 2-3 counter-moves are required.
How many references actually make sense to load?
Pro holds up to 14 (6 high fidelity), but at 12+ expect some details to be dropped because the model can't fit them all. The optimum is 4-8 precisely placed references, each with an explicit role («jacket from @ref2», «background from @ref7»). Fourteen vague references are worse than four precise ones. Pre-select the 2-3 most important and make sure they're described unambiguously; the rest are a bonus, not the base.
Is multilingual in-image text supported?
Yes, Pro is SOTA in the family for text rendering: Latin, Cyrillic, CJK (Chinese, Japanese, Korean), Arabic, Hindi, Bengali. Always put exact text in quotes and call out font, size, color, and placement. For long copy, add «EXACT» or «verbatim»: «label text (EXACT): "Fresh and clean".» For posters and packaging, only Pro in the family delivers a production-ready result.
Does Opten support Nano Banana Pro?
Yes, the Opten extension auto-detects Nano Banana Pro inside Google AI Studio and Gemini 3 Pro and scores prompts against the structure above: it checks for brief style instead of tag soup, Identity Locking on references, the anti-brightness-bias stack for documentary and noir, exact text in quotes, and role assignment on multi-reference prompts. One click gives you a rewrite that won't drift into commercial gloss.
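The FAQ's text-rendering rule (exact copy in quotes, with font, size, color, and placement called out, plus «EXACT» for long copy) can also be templated. `exact_text` is a hypothetical helper, and the phrasing it emits is one possible wording, not a required syntax.

```python
def exact_text(label: str, text: str, *, font: str, size: str, color: str,
               placement: str, verbatim: bool = False) -> str:
    """Describe one piece of in-image text with every attribute spelled out."""
    tag = " (EXACT)" if verbatim else ""  # flag long copy for verbatim rendering
    return (f'{label}{tag}: "{text}", set in {font}, {size}, {color}, '
            f"placed {placement}.")
```

Keyword-only arguments make it hard to build a text instruction that silently omits the font or placement.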




Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672