Nano Banana 2: how to write prompts the model actually understands
Google · Updated:
Nano Banana 2 is Google's second-generation image model in the Gemini API, with up to 2K resolution, a basic thinking mode, and support for up to 6 reference images. It reads a prompt as a designer's brief written in natural language, not as tag soup. English is the primary language; conversational editing is supported.
What Nano Banana 2 does well
The model is tuned toward Google's commercially bright aesthetic: warm palette, saturated colors, clean composition. It excels at portrait close-ups, with a noticeably weaker uncanny-valley effect than Pro and more natural skin texture, including pores and micro-imperfections. Film stocks (Kodak Portra 400, 35mm), retro eras (1990s, 2000s), selfies, and social-media content are all sweet spots.
For light editing there's a conversational mode — in-session tweaks are normal. Basic text rendering works for short labels (1-3 words); dense text and infographics are better handed off to Pro or GPT Image 2.
- Up to 2K, up to 6 reference images per request
- Natural language, full sentences, brief instead of tags
- Basic thinking mode + basic Google Search grounding
- Strengths: portraits, selfies, film realism, candid documentary
- 3-5x faster than Pro at roughly 25% of the cost, which makes it the model for iteration
Prompt structure
Optimal order: [Subject with details] + [Scene/Setting] + [Lighting/Mood] + [Style] + [Camera (optional)] + [Format].
The main rule is subject specificity. Instead of «a girl on the street», write «a young woman with short red hair in a denim jacket stands at a Tokyo intersection in the evening, street lamps casting warm reflections on wet asphalt.» When a prompt is under 10 words, the model fills every gap with the statistical average of its training data, and the result is generic.
Purpose context (what this image is for — album cover, avatar, ad banner) helps the model make style decisions automatically. This is a Google-family trait — it thinks about the goal, not just the visuals.
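The component order above can be sketched as a small prompt builder. This is a minimal illustration, not part of any Google SDK: the `Brief` class and `build_prompt` function are hypothetical names, and placing the purpose sentence first is an assumption rather than a documented rule.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical container for the prompt components listed above."""
    subject: str        # who or what, with concrete details
    scene: str          # where and when
    lighting: str       # lighting and mood
    style: str          # film stock, era, genre
    image_format: str   # format in words, e.g. "16:9 landscape format"
    camera: str = ""    # optional
    purpose: str = ""   # optional: what the image is for


def build_prompt(brief: Brief) -> str:
    """Join the non-empty components into sentences in the recommended order."""
    parts = [
        brief.purpose,       # assumption: purpose context leads the brief
        brief.subject,
        brief.scene,
        brief.lighting,
        brief.style,
        brief.camera,
        brief.image_format,
    ]
    # Skip empty components and end each one with exactly one period.
    sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
    return " ".join(sentences)


prompt = build_prompt(Brief(
    subject="A young woman with short red hair in a denim jacket "
            "stands at a Tokyo intersection in the evening",
    scene="Street lamps cast warm reflections on wet asphalt",
    lighting="Moody, low-key evening light",
    style="Candid documentary photo on 35mm film",
    image_format="16:9 landscape format",
))
print(prompt)
```

Keeping the components as separate fields makes it easy to iterate on one of them (for example, only the lighting) while the rest of the brief stays fixed.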
Natural language, not tag soup
Tag soup in the Midjourney style («woman, paris, cafe, golden, 4k, realistic») critically degrades output. Nano Banana 2 was trained on descriptive text and expects connected sentences. The same logic applies to GPT Image 2 and the Pro version: write like a creative director briefing a photographer.
Parameters like `--ar 16:9`, `::weight`, `(keyword:1.2)`, and `BREAK` don't work and end up as literal noise in the prompt. Set the format with words («16:9», «portrait», «square»), weight ideas by order (most important first), and express styles with ordinary adjectives.
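When migrating old Midjourney or Stable Diffusion prompts, the control syntax listed above can be stripped before rewriting the rest by hand. The following is a rough sketch using only the standard `re` module; `strip_legacy_syntax` is a hypothetical helper, and its regular expressions cover only the four patterns named above, not every dialect of those tools.

```python
import re
from typing import Optional, Tuple


def strip_legacy_syntax(prompt: str) -> Tuple[str, Optional[str]]:
    """Remove legacy control syntax; return (cleaned text, aspect ratio or None)."""
    # Remember the aspect ratio so it can be restated in words later.
    match = re.search(r"--ar\s+(\d+:\d+)", prompt)
    aspect = match.group(1) if match else None

    # Drop --flags, together with a value only if the value starts with a digit.
    text = re.sub(r"--\w+(?:\s+\d[\w:.]*)?", " ", prompt)
    # (keyword:1.2) -> keyword
    text = re.sub(r"\(([^():]+):\d+(?:\.\d+)?\)", r"\1", text)
    # Drop ::weight suffixes such as ::2 or ::-0.5
    text = re.sub(r"::\s*-?\d*(?:\.\d+)?", " ", text)
    # Drop the BREAK keyword
    text = re.sub(r"\bBREAK\b", " ", text)

    # Collapse the whitespace left behind by the removals.
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([,.])", r"\1", text)
    return text, aspect


cleaned, aspect = strip_legacy_syntax(
    "a red fox in snow (fur detail:1.3) BREAK forest::2 --ar 16:9"
)
print(cleaned)
print(aspect)
```

The cleaned text is still a list of fragments, so it needs rewriting into full sentences; the recovered aspect ratio can then be added back in words, for example as a closing «16:9» format sentence.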
Editing and over-editing
Conversational editing is the primary mode for tweaks. If an image is 80% there, don't regenerate: «great, now switch the background to sunset and make the jacket dark blue.» The model remembers session context.
Known weakness: over-editing of uploaded references. NB2 tends to «improve» the reference even when asked to keep it as-is. The fix is an explicit preserve block: «keep identity 100%: do not stylize, no enhancement, do not airbrush skin.» For portraits with real skin this is critical: without the block the model smooths pores into a glossy face.
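One way to make the preserve block hard to forget is to append it programmatically to every edit instruction. The sketch below is plain string handling; `with_preserve_block` and its default lists are hypothetical, and the exact wording of the block is an assumption modeled on the phrasing above, not a required syntax.

```python
from typing import Sequence

# Assumed defaults for portrait edits; adjust per image.
DEFAULT_PRESERVE = ("face", "identity", "skin texture")
DEFAULT_FORBID = ("re-styling", "enhancement", "airbrushing")


def with_preserve_block(
    edit: str,
    preserve: Sequence[str] = DEFAULT_PRESERVE,
    forbid: Sequence[str] = DEFAULT_FORBID,
) -> str:
    """Append an explicit preserve list and a list of refusals to an edit request."""
    edit = edit.strip().rstrip(".")
    preserve_part = "Preserve: " + ", ".join(preserve) + "."
    forbid_part = "No " + ", no ".join(forbid) + "."
    return f"{edit}. {preserve_part} {forbid_part}"


edit_prompt = with_preserve_block("Change only the background to a sunset sky")
print(edit_prompt)
```

The edit request itself stays short and conversational; only the trailing block is standardized, so it can be reused across a whole editing session.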
Common mistakes
1. Tag soup instead of sentences
«woman, paris, cafe, golden, 4k, realistic» is legacy Midjourney/SD syntax. Nano Banana 2 was trained on descriptive text and produces generic, unfocused results from tag soup. Write connected sentences as a brief for a photographer; that alone noticeably improves quality on the same set of words.
2. Prompt that's too short (<10 words)
«Girl in a café» — the model fills in everything else by the statistical average of training data. You'll get a generic blonde in a generic Starbucks with a generic latte. A minimum working prompt is subject with details + scene + style. That's ~25-40 words as a starting point.
3. Describing the image during image-to-image (I2I) editing
In conversational edits, don't restate what's in the picture. The model already sees it. A prompt like «in the photo there's a girl in a café, change the background» wastes tokens — just say «change only the background to...» Restating content sometimes conflicts with what the model has already parsed from the reference.
4. No preserve block on edits
NB2 over-edits references — it «improves» beyond the request. «Change the background» without «preserve: face, identity, skin texture» usually changes the skin too, smoothing pores into gloss. Every edit should end with an explicit preserve list — this is a known model weakness, not a prompt failure.
5. Using NB2 for dense text and complex scenes
Posters with long copy, infographics, packaging, 4+ characters in frame: that's Pro or GPT Image 2 territory. NB2 handles short labels (1-3 words) and single portraits. On heavy scenes it confuses identities and breaks dense text. That is an architectural ceiling of this version, not something a better prompt can fix.
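Several of the mistakes above can be caught mechanically before a prompt is sent. The checker below is a rough sketch: `lint_prompt` is a hypothetical helper, the tag-soup test (four or more comma-separated fragments averaging under three words) is an assumed heuristic, and mistake 3 is left out because restated image content cannot be detected from the text alone.

```python
from typing import List


def lint_prompt(
    prompt: str,
    is_edit: bool = False,
    characters: int = 1,    # people or characters in the frame
    label_words: int = 0,   # words of text to be rendered in the image
) -> List[str]:
    """Return warnings for the common mistakes that can be checked from the text."""
    warnings = []
    words = prompt.split()

    # Mistake 2: a new-image prompt under 10 words.
    if not is_edit and len(words) < 10:
        warnings.append("too short: add subject details, scene, and style")

    # Mistake 1: many short comma-separated fragments (assumed heuristic).
    fragments = [f for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 4 and len(words) / len(fragments) < 3:
        warnings.append("tag soup: rewrite as connected sentences")

    # Mistake 4: an edit request without a preserve list.
    if is_edit and "preserve" not in prompt.lower():
        warnings.append("edit without a preserve block")

    # Mistake 5: dense text or a crowded scene.
    if characters >= 4 or label_words > 3:
        warnings.append("dense text or 4+ characters: consider Pro or GPT Image 2")

    return warnings


for warning in lint_prompt("woman, paris, cafe, golden, 4k, realistic"):
    print(warning)
```

The thresholds (10 words, 4 characters, 3 label words) come from the limits stated in this guide; they are starting points for a pre-flight check, not hard model limits.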
Before / after examples
Example 1
Before
pretty girl in a café
After
A young woman in her late twenties with short auburn hair and freckles, wearing a worn olive-green linen shirt, sitting at a small marble café table in Lisbon, sipping espresso. Soft afternoon light spills through the window, warm muted tones, shallow depth of field. Shot on 35mm film, Kodak Portra 400, natural grain, visible pores, no airbrush look. Editorial documentary style.
Concrete subject, concrete location, concrete lighting, and a film stock. The «no airbrush» line is an anti-glamour stop signal: without it NB2 will smooth the skin.
Example 2
Before
make the background darker
After
Keep the subject and pose exactly as is. Change only the background: from the bright café window to a moody, dim interior with warm tungsten lamp light in the far corner. Preserve: face, identity, skin texture, hair, clothing, camera angle, framing. No re-styling of the person, no airbrushing, no over-saturation.
A conversational edit with an explicit preserve block. Without it NB2 «improves» the face too, smoothing skin texture — that's its textbook over-edit.
Example 3
Before
stylish social media avatar
After
Square 1:1 social media avatar for an indie illustrator. Mid-shot of a young man with curly black hair and round tortoise-shell glasses, wearing a mustard knit sweater, faint smile, natural relaxed posture. Soft north-window light, neutral grey background, warm color grade. Style: editorial portrait with subtle 2000s digital camera feel, natural skin texture with visible pores, no glamour retouch.
Purpose («social media avatar for an indie illustrator») activates the right mode in NB2. Format, camera era, and an explicit refusal of retouching are all specified.