Frequently asked

How is Nano Banana 2 different from Nano Banana Pro?

NB2 offers up to 2K resolution, up to 6 reference images, a basic thinking mode, and basic text rendering. Pro offers 4K, up to 14 references (6 at high fidelity), full thinking with intermediate images, and state-of-the-art text rendering. But NB2 isn't a «stripped-down Pro»: on portrait close-ups, selfies, macro skin, and candid photos, NB2 delivers a more natural result, with less uncanny valley and less AI gloss.

Why does NB2 smooth the skin even when I don't ask it to?

That's its core weakness: over-editing when reference images are loaded. The model «helps» beyond the request by smoothing pores, removing imperfections, and adding gloss. The cure is an explicit preserve block such as «keep identity 100%: do not stylize, no enhancement, do not airbrush skin, preserve natural pores and texture.» Repeat this block on every editing turn.

Can I write prompts in languages other than English?

Technically yes: NB2 is multilingual, but Google trained the model with English as the primary language. Complex prompts in Russian or other languages give less predictable results. Recommendation: keep the bulk of the prompt in English; in-image text can be in any language (Cyrillic and CJK are supported, though Pro handles them better).

When should I pick NB2 vs Pro?

NB2 — single portraits, selfies, macro skin, candid documentary, extreme aspect ratios (4:1, 8:1), mass iteration (10 variants before finalizing). Pro — 4+ characters in frame, hero campaigns, dense text on posters/packaging, infographics, ray-traced lighting, 9-10 frame storyboards, structural sketch control. It's not «higher = better» — they're different tools.
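The split above can be written down as a rough decision rule. The sketch below is only an illustration: `pick_model` and its parameters are hypothetical names, the returned strings are labels rather than API model IDs, and the thresholds (4+ characters, more than 6 references, dense text, 4K) are taken from this FAQ.

```python
def pick_model(
    characters: int = 1,
    reference_images: int = 0,
    dense_text: bool = False,
    needs_4k: bool = False,
) -> str:
    """Return "Pro" for jobs this guide assigns to Pro, otherwise "NB2"."""
    # Any one of these conditions pushes the job past what the guide
    # describes as NB2 territory.
    if characters >= 4 or reference_images > 6 or dense_text or needs_4k:
        return "Pro"
    return "NB2"


print(pick_model())                   # single portrait, no references
print(pick_model(characters=5))       # crowded frame
print(pick_model(dense_text=True))    # poster with long copy
```

A rule like this only covers the hard limits. Softer preferences from the answer above, such as mass iteration on NB2 or hero campaigns on Pro, still need a human call.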

Is conversational editing supported?

Yes, and it's the recommended way to make small tweaks. Don't rewrite the prompt to change lighting — just say: «great, now switch to sunset light and make the jacket dark blue, keep everything else as-is.» The model remembers session context. The key rule — always add a preserve block on edits, or NB2 will «improve» the face.

How do I get film realism without the AI gloss?

Stack: «35mm film, Kodak Portra 400, natural grain, warm muted tones, shallow depth of field, golden hour» + anti-glamour vocabulary: «no airbrush look, visible pores, weathered skin texture, no glamour retouch.» Describe real imperfections — that's the counterweight to the commercial gloss NB2 drifts into by default. Stating a camera era (1990s, 2000s) also helps.

Does Opten support Nano Banana 2?

Yes. The Opten extension auto-detects Nano Banana 2 inside Google AI Studio and Gemini and scores prompts against the structure described in this guide: it checks for subject specificity, natural language instead of tag soup, a preserve block on edits, and anti-glamour vocabulary on portraits. One click gives you a rewrite that won't drift into commercial gloss and will keep real skin texture.

Nano Banana 2: how to write prompts the model actually understands

Google · Updated: May 19, 2026

Nano Banana 2 is Google's second-generation image model in the Gemini API, with up to 2K resolution, basic thinking mode, and support for up to 6 reference images. It reads prompts as a designer brief in natural language — not tag soup. English is the primary language; conversational editing is supported.

What Nano Banana 2 does well

The model is tuned toward Google's commercially bright aesthetic: warm palette, saturated colors, clean composition. It excels at portrait close-ups — noticeably less uncanny valley than Pro and more natural skin texture with pores and micro-imperfections. Film stocks (Kodak Portra 400, 35mm), retro eras (1990s, 2000s), selfies, and social-media content are all sweet spots.

For light editing there's a conversational mode — in-session tweaks are normal. Basic text rendering works for short labels (1-3 words); dense text and infographics are better handed off to Pro or GPT Image 2.

Up to 2K, up to 6 reference images per request
Natural language, full sentences, brief instead of tags
Basic thinking mode + basic Google Search grounding
Strengths: portraits, selfies, film realism, candid documentary
3-5x faster than Pro at roughly 25% of the cost: the model for iteration

Prompt structure

Optimal order: [Subject with details] + [Scene/Setting] + [Lighting/Mood] + [Style] + [Camera (optional)] + [Format].

The main rule is subject specificity. Instead of «a girl on the street», write «a young woman with short red hair in a denim jacket stands at a Tokyo intersection in the evening, street lamps casting warm reflections on wet asphalt.» When a prompt is under 10 words, the model fills the gaps with the statistical average of its training data, and the result is generic.

Purpose context (what this image is for — album cover, avatar, ad banner) helps the model make style decisions automatically. This is a Google-family trait — it thinks about the goal, not just the visuals.
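As a minimal sketch of the order above, the Python helper below joins the parts into connected sentences. The name `build_prompt` and its parameters are hypothetical and not part of any Google API. Putting the purpose line first is an assumption, since the order above does not say where purpose belongs.

```python
def _as_sentence(text: str) -> str:
    """Trim a fragment and make sure it ends with sentence punctuation."""
    text = text.strip()
    if not text:
        return ""
    return text if text[-1] in ".!?" else text + "."


def build_prompt(
    subject: str,
    scene: str,
    lighting: str,
    style: str,
    output_format: str,
    camera: str = "",
    purpose: str = "",
) -> str:
    """Join the parts in the recommended order, skipping empty ones."""
    # Assumption: purpose goes first; the rest follows the order in the text.
    parts = [purpose, subject, scene, lighting, style, camera, output_format]
    sentences = [_as_sentence(part) for part in parts]
    return " ".join(s for s in sentences if s)


prompt = build_prompt(
    subject="A young woman with short red hair in a denim jacket",
    scene="She stands at a Tokyo intersection in the evening",
    lighting="Street lamps cast warm reflections on wet asphalt",
    style="Editorial documentary style",
    output_format="Format: 16:9",
)
print(prompt)
```

The helper only enforces order and sentence form. How specific the subject is still depends on what you pass in.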

Natural language, not tag soup

Tag soup in the Midjourney style («woman, paris, cafe, golden, 4k, realistic») critically degrades output. Nano Banana 2 was trained on descriptive text and expects connected sentences. The same logic applies to GPT Image 2 and Nano Banana Pro: write like a creative director briefing a photographer.

Parameters like `--ar 16:9`, `::weight`, `(keyword:1.2)`, and `BREAK` don't work and end up as literal noise in the prompt. Set the format with words («16:9», «portrait», «square»), weight ideas by order (most important first), and express style with ordinary adjectives.
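One way to catch leftover legacy syntax before sending a prompt is a small regex check. The sketch below is a heuristic under stated assumptions: `find_legacy_syntax` and the pattern names are hypothetical, and the regexes cover only the four forms named above, not every Midjourney or Stable Diffusion parameter.

```python
import re

# Illustrative patterns for the legacy syntax named in the text.
LEGACY_PATTERNS = {
    # "--ar 16:9", "--v 6": a double-dash flag at the start of a token
    "midjourney flag": re.compile(r"(?<!\S)--[a-z]+"),
    # "(keyword:1.2)": a parenthesised term ending in ":number";
    # a digit right before the colon is excluded so "(ratio 16:9)" passes
    "weighted keyword": re.compile(r"\([^()]*[^\d()]:\d+(?:\.\d+)?\)"),
    # "cat::2": double-colon weighting
    "double-colon weight": re.compile(r"::"),
    # the literal BREAK token
    "BREAK token": re.compile(r"\bBREAK\b"),
}


def find_legacy_syntax(prompt: str) -> list[str]:
    """Return the names of legacy patterns found in the prompt."""
    return [name for name, pattern in LEGACY_PATTERNS.items()
            if pattern.search(prompt)]


print(find_legacy_syntax("woman, paris, cafe --ar 16:9"))
print(find_legacy_syntax("A young woman reads at a cafe table, framed as 16:9."))
```

An empty list means none of these four forms were found. It does not mean the prompt is free of tag soup.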

Editing and over-edit

Conversational editing is the primary mode for tweaks. If an image is 80% there, don't regenerate: «great, now switch the background to sunset and make the jacket dark blue.» The model remembers session context.

A known weakness is over-editing when reference images are loaded: NB2 tends to «improve» the reference even when asked to keep it as-is. The fix is an explicit preserve block: «keep identity 100%: do not stylize, no enhancement, do not airbrush skin.» For portraits with real skin this is critical, because without the block the model smooths pores into a glossy face.
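Because the preserve block has to be repeated on every editing turn, it can help to append it automatically. The sketch below assumes edits are sent as plain text. `with_preserve_block` is a hypothetical helper, and the check for an existing block is a simple keyword heuristic.

```python
# Preserve wording taken from this guide.
PRESERVE_BLOCK = (
    "Keep identity 100%: do not stylize, no enhancement, "
    "do not airbrush skin, preserve natural pores and texture."
)


def with_preserve_block(edit_instruction: str,
                        preserve: str = PRESERVE_BLOCK) -> str:
    """Append the preserve block unless the edit already carries one."""
    text = edit_instruction.strip()
    lowered = text.lower()
    # Heuristic: treat any mention of "preserve" or "keep identity"
    # as an existing preserve block and leave the edit untouched.
    if "preserve" in lowered or "keep identity" in lowered:
        return text
    if text and text[-1] not in ".!?":
        text += "."
    return f"{text} {preserve}"


print(with_preserve_block(
    "Great, now switch the background to sunset and make the jacket dark blue"
))
```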

Common mistakes

1. Tag soup instead of sentences
«woman, paris, cafe, golden, 4k, realistic» is legacy Midjourney/SD syntax. Nano Banana 2 was trained on descriptive text and produces generic, unfocused results from tag soup. Write connected sentences as a brief for a photographer — that alone doubles quality on the same set of words.
2. Prompt that's too short (<10 words)
«Girl in a café» — the model fills in everything else by the statistical average of training data. You'll get a generic blonde in a generic Starbucks with a generic latte. A minimum working prompt is subject with details + scene + style. That's ~25-40 words as a starting point.
3. Describing the image during I2I editing
In conversational edits, don't restate what's in the picture. The model already sees it. A prompt like «in the photo there's a girl in a café, change the background» wastes tokens — just say «change only the background to...» Restating content sometimes conflicts with what the model has already parsed from the reference.
4. No preserve block on edits
NB2 over-edits references — it «improves» beyond the request. «Change the background» without «preserve: face, identity, skin texture» usually changes the skin too, smoothing pores into gloss. Every edit should end with an explicit preserve list — this is a known model weakness, not a prompt failure.
5. Using NB2 for dense text and complex scenes
Posters with long copy, infographics, packaging, 4+ characters in frame: that's Pro or GPT Image 2 territory. NB2 handles short labels (1-3 words) and single portraits. On heavy scenes it confuses identities and breaks dense text. That's an architectural ceiling of this version, not something a better prompt can fix.
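Three of the mistakes above (1, 2, and 4) can be roughly checked in code before a prompt is sent. The sketch below is a heuristic, not a scoring method from Google or Opten: `lint_prompt`, its issue labels, and the tag-soup thresholds are all assumptions, and only the 10-word minimum comes from this guide. Mistakes 3 and 5 depend on the image and the task, so they are not covered.

```python
def lint_prompt(prompt: str, is_edit: bool = False) -> list[str]:
    """Return labels for common mistakes found in a prompt."""
    issues = []
    lowered = prompt.lower()

    if is_edit:
        # Mistake 4: an edit without a preserve list.
        # Short edits are fine, so length is not checked here.
        if "preserve" not in lowered and "keep identity" not in lowered:
            issues.append("no_preserve_block")
        return issues

    # Mistake 2: fewer than 10 words.
    if len(prompt.split()) < 10:
        issues.append("too_short")

    # Mistake 1: tag soup. Assumed heuristic: at least four
    # comma-separated fragments, 80% of them one or two words long.
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    short = [f for f in fragments if len(f.split()) <= 2]
    if len(fragments) >= 4 and len(short) / len(fragments) >= 0.8:
        issues.append("tag_soup")

    return issues


print(lint_prompt("woman, paris, cafe, golden, 4k, realistic"))
print(lint_prompt("make the background darker", is_edit=True))
```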

Before / after examples

Example 1

Before

pretty girl in a café

After

A young woman in her late twenties with short auburn hair and freckles, wearing a worn olive-green linen shirt, sitting at a small marble café table in Lisbon, sipping espresso. Soft afternoon light spills through the window, warm muted tones, shallow depth of field. Shot on 35mm film, Kodak Portra 400, natural grain, visible pores, no airbrush look. Editorial documentary style.

Concrete subject, concrete location, concrete lighting and a film stack. The «no airbrush» line is an anti-glamour stop signal — without it NB2 will smooth the skin.

Example 2

Before

make the background darker

After

Keep the subject and pose exactly as is. Change only the background: from the bright café window to a moody, dim interior with warm tungsten lamp light in the far corner. Preserve: face, identity, skin texture, hair, clothing, camera angle, framing. No re-styling of the person, no airbrushing, no over-saturation.

A conversational edit with an explicit preserve block. Without it NB2 «improves» the face too, smoothing skin texture — that's its textbook over-edit.

Example 3

Before

stylish social media avatar

After

Square 1:1 social media avatar for an indie illustrator. Mid-shot of a young man with curly black hair and round tortoise-shell glasses, wearing a mustard knit sweater, faint smile, natural relaxed posture. Soft north-window light, neutral grey background, warm color grade. Style: editorial portrait with subtle 2000s digital camera feel, natural skin texture with visible pores, no glamour retouch.

Purpose («social media avatar for an indie illustrator») activates the right mode in NB2. Format, camera era, and an explicit refusal of retouching are all specified.
