Nano Banana Pro: how to write prompts the model actually understands
Google · Updated:
Nano Banana Pro is Google's flagship image model (Gemini 3 Pro Image): 4K output, up to 14 references (6 at high fidelity), full thinking mode, and SOTA text rendering. As a thinking model, it understands intent, physics, and composition, and reads a prompt as a creative-director brief. English is the primary language; JSON structures also work well.
What Nano Banana Pro does
Pro packs 10 headline capabilities at once: SOTA text rendering with infographics and multilingual typography, Identity Locking through 14 references (6 high fidelity), Google Search grounding, powerful mask-free editing, 2D→3D translation, native 4K generation, thinking with intermediate images, 9-10 frame storyboards, structural control via sketches and wireframes, and live data through search.
This is a separate class of model, meant for hero brand assets, posters, packaging, complex scenes with 4+ characters, and production-ready infographics. Use it when you need maximum control and high resolution rather than fast iteration.
- Up to 4K, up to 14 references (6 high fidelity)
- SOTA text rendering: posters, packaging, infographics, multilingual typography
- Full thinking mode with intermediate «reasoning» images
- Google Search grounding for live data
- Structural control: sketches, wireframes, grids as input images
Prompt structure
Optimal order: [Subject with details] + [Scene/Setting] + [Lighting] + [Camera/Lens] + [Textures/Materials] + [Style/Mood] + [Purpose context] + [Format].
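The order above can be sketched as a small helper that assembles a prompt from named slots. This is a minimal illustration, not an official tool: the slot names and the `build_prompt` function are hypothetical, and the sample values are only placeholders.

```python
# Slot names mirror the recommended order; the names themselves are illustrative.
PROMPT_ORDER = [
    "subject", "scene", "lighting", "camera",
    "materials", "style", "purpose", "format",
]

def build_prompt(parts: dict[str, str]) -> str:
    """Join the filled-in slots in the recommended order, skipping empty ones."""
    unknown = set(parts) - set(PROMPT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    sentences = []
    for slot in PROMPT_ORDER:
        text = parts.get(slot, "").strip()
        if text:
            # End every slot with a period so the result reads as full sentences.
            sentences.append(text if text.endswith(".") else text + ".")
    return " ".join(sentences)

prompt = build_prompt({
    "format": "16:9 landscape",
    "subject": "A weathered fisherman repairing a torn net",
    "scene": "On a wooden pier, late autumn morning",
    "lighting": "Overcast, dim ambient lighting",
    "camera": "Shot on 35mm with mild film grain",
    "style": "Raw documentary aesthetic",
})
print(prompt)
```

The dictionary can be filled in any order; the helper always emits subject first and format last, which keeps a team's prompts consistent.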
Golden rule: Creative Director, not Tag Soup. Pro is a thinking model and understands intent, so the prompt should read like a brief for an artist. Concrete camera parameters significantly affect output: «Shot on Sony A7III with 85mm f/1.4 lens, classic three-point lighting setup, natural skin texture with visible pores, catchlights in eyes.» For complex scenes you can use a JSON structure, which Pro parses well.
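A JSON brief for a complex scene might look like the sketch below. The article does not define a schema, so every field name here is an assumption chosen for readability; only the camera, lighting, and texture values come from the example above.

```python
import json

# Hypothetical schema: the field names are illustrative, not a documented format.
scene = {
    "subject": {
        "description": "portrait of a woman in her thirties",
        "texture": "natural skin texture with visible pores, catchlights in eyes",
    },
    "lighting": "classic three-point lighting setup",
    "camera": {"body": "Sony A7III", "lens": "85mm f/1.4"},
    "format": "4K",
}

# ensure_ascii=False keeps any non-ASCII text readable in the prompt.
prompt = json.dumps(scene, indent=2, ensure_ascii=False)
print(prompt)
```

The resulting string is pasted into the prompt as-is; keeping one key per decision (lighting, camera, format) makes it easy to change a single parameter between iterations.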
Brightness bias and how to fight it
Pro has a built-in bias toward bright, saturated, «polished» imagery. The model «fixes» overcast scenes, adds saturation, and pulls toward a warm glow. Symptoms: you asked for an overcast mood and got morning light; you asked for desaturated noir and got saturated editorial; you asked for a realistic candid and got polished commercial.
Stack 2-3 of these counter-moves at once: explicit conditions «overcast, muted desaturated palette, cool color temperature, no auto-brightening»; anti-glamour «no polished glamour, no commercial polish, raw documentary aesthetic»; color grade «dim ambient lighting, low contrast, faded vintage color grade, neutral exposure»; genre anchor «the look of news photojournalism» or «the look of a real police evidence photo.»
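The counter-stack can be kept as reusable snippets and appended to any base prompt. The sketch below is one possible way to do that; the dictionary keys and the `with_counter_stack` helper are hypothetical names, while the phrases themselves are the ones listed above.

```python
# Counter-move phrases taken from the text; the keys are illustrative labels.
COUNTER_MOVES = {
    "explicit": "overcast, muted desaturated palette, cool color temperature, no auto-brightening",
    "anti_glamour": "no polished glamour, no commercial polish, raw documentary aesthetic",
    "color_grade": "dim ambient lighting, low contrast, faded vintage color grade, neutral exposure",
    "genre_anchor": "the look of news photojournalism",
}

def with_counter_stack(base_prompt: str, moves: list[str]) -> str:
    """Append the chosen counter-moves to the base prompt as separate sentences."""
    if len(moves) < 2:
        # A single counter-move is the case the text warns against.
        raise ValueError("stack at least two counter-moves")
    sentences = [base_prompt.strip().rstrip(".")]
    for key in moves:
        phrase = COUNTER_MOVES[key]
        sentences.append(phrase[0].upper() + phrase[1:])
    return ". ".join(sentences) + "."

print(with_counter_stack(
    "A weathered fisherman repairing a torn net on a wooden pier",
    ["explicit", "anti_glamour", "genre_anchor"],
))
```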
Multi-reference mode (10+ images)
With 8+ references loaded, the usual rules stop working — the prompt turns into an overloaded list and the model gets lost. Different rules apply.
Reference, don't redescribe: instead of «khaki jacket with leather collar and four pockets», write «jacket from @ref2.» Assign each image a role: «camera in the right hand (@ref3), bag over the shoulder (@ref4), compass on the belt (@ref5).» Each reference gets a place in the scene. Drop references without a clear placement: 4 precisely placed references beat 14 vague ones. At 12+ references expect some details to be dropped, so decide in advance which 2-3 matter most.
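The role-assignment rule can be sketched as a helper that turns a mapping of references to placements into one compact sentence. The `reference_prompt` function is a hypothetical illustration; the «@refN» notation simply follows the examples above and may need adapting to whatever tool you use to attach images.

```python
def reference_prompt(scene: str, roles: dict[str, str]) -> str:
    """Build a prompt where each reference tag is tied to one placement.

    roles maps a tag such as '@ref3' to its place in the scene.
    References without a placement are dropped, as the text recommends.
    """
    clauses = [
        f"{place.strip()} ({tag})"
        for tag, place in roles.items()
        if place and place.strip()
    ]
    if not clauses:
        return scene
    return f"{scene.rstrip('.')}: {', '.join(clauses)}."

print(reference_prompt(
    "An explorer standing on a mountain ridge",
    {
        "@ref3": "camera in the right hand",
        "@ref4": "bag over the shoulder",
        "@ref5": "compass on the belt",
        "@ref6": "",  # no clear placement, so it is left out of the prompt
    },
))
```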
Common mistakes
1. Tag soup on a thinking model
«woman, paris, cafe, golden, 4k, realistic» wastes 60% of Pro's potential. The model is trained on full sentences, and its thinking mechanism reads connected descriptions, not tags. Write like a creative director briefing a photographer: long, grammatical descriptive sentences double the quality obtained from the same set of words.
2. Ignoring brightness bias
Pro defaults to bright, saturated, polished. For noir, documentary, candid, or horror this is a critical problem. Without an anti-glamour stack («overcast, muted, no polished glamour, raw documentary») the model will deliver «pretty» even when you asked for «truthful.» Stack 2-3 counter-moves at once.
3. Re-describing every reference with 10+ loaded
At 8+ references, detailed descriptions stop working — the prompt turns into an overloaded list. The right move: «jacket from @ref2», «camera in the right hand (@ref3).» Assign each image a role in the scene. Without clear placement a reference gets ignored or wrecks the composition.
4. Identity Locking forgotten on series
Without an explicit «keep facial features 100% identical to Image 1» the model changes facial features from frame to frame, even when every frame shows the same person. For a 9-10 frame storyboard, repeat the Identity Locking instruction in every prompt; clothing stabilization is also mandatory («clothing and appearance remain stable across all frames»).
5. Regenerating instead of conversational editing
Pro remembers session context and supports in-dialog edits. If the image is 80% there, don't rewrite the prompt to change lighting or color. Say: «great, now change the light to sunset, keep the rest.» This preserves character identity and is faster than regenerating from a new seed.
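For mistake 4 (Identity Locking forgotten on series), one way to avoid dropping the lock is to generate the per-frame prompts programmatically. The sketch below assumes each frame is sent as its own prompt; the `storyboard_prompts` helper and the "Frame i of n" prefix are illustrative choices, while the two locking sentences are quoted from the text.

```python
IDENTITY_LOCK = "Keep facial features 100% identical to Image 1."
STABILITY = "Clothing and appearance remain stable across all frames."

def storyboard_prompts(frames: list[str]) -> list[str]:
    """Return one prompt per frame, each repeating the identity and clothing locks."""
    total = len(frames)
    return [
        f"Frame {i} of {total}: {desc.strip().rstrip('.')}. {IDENTITY_LOCK} {STABILITY}"
        for i, desc in enumerate(frames, start=1)
    ]

for p in storyboard_prompts([
    "The man from Image 1 unlocks a bicycle outside a cafe",
    "He rides along a canal at dusk",
    "He stops at a bridge and checks his phone",
]):
    print(p)
```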
Before / after examples
Example 1
Before
viral youtube thumbnail
After
Viral YouTube thumbnail, 16:9 landscape. Left side: the man from Image 1 (keep facial features 100% identical to reference), surprised expression, mouth open, pointing dramatically at the right side of the frame. Right side: a sleek black gaming laptop on a glossy desk. Center-bottom: bold yellow outlined text «MIND BLOWN», sans-serif heavy weight, slight 3D depth. Color palette: high-contrast yellow, magenta, deep black background with subtle radial glow behind the laptop. Style: high click-through-rate YouTube thumbnail aesthetic.
Identity Locking, explicit purpose (viral thumbnail), exact text in quotes with font called out, color palette, format. Pro assembles a near-production-ready thumbnail in one pass.
Example 2
Before
overcast documentary shot
After
A weathered fisherman repairing a torn net on a wooden pier, late autumn morning. Overcast, muted desaturated palette, cool color temperature, dim ambient lighting, low contrast, faded vintage color grade, neutral exposure. No polished glamour, no commercial polish, raw documentary aesthetic. The look of news photojournalism — visible skin texture, weathered hands, worn jacket, no airbrushing. Shot on 35mm with mild film grain, slight desaturation.
Anti-brightness-bias stack: color grade + anti-glamour + genre anchor. Without it Pro will «fix» the overcast mood and serve warm morning instead of documentary grey.
Example 3
Before
financial report infographic
After
Pitch-deck slide titled "Annual Revenue Growth 2026". Visualize a 4-quarter bar chart with values: "Q1 $2.4M", "Q2 $3.1M", "Q3 $4.2M", "Q4 $5.8M". Use Inter bold sans-serif for labels, brand color #1A73E8 for bars, subtle gridlines, clean white background. Subtitle below chart in smaller weight: "45% YoY growth". Bottom-right: brand logo placeholder labeled "COMPANY". Style: modern editorial infographic, crisp 4K rendering, no decorative noise.
Pro is the only model in the family with production-ready rendering of dense text and numbers. Font, hex color, and layout are all stated explicitly, so the model assembles a near-final slide.