Nano Banana: how to write prompts the model actually understands
Google · Updated:
Nano Banana is the umbrella name for Google's image model family in the Gemini API. The model understands natural language and full descriptive sentences, supports conversational editing, Google Search grounding, and reference images. English is the primary language; tag soup critically degrades quality. Write like a creative director, not a tag list.
What the Nano Banana family does
The headline difference from Midjourney and Stable Diffusion is that Nano Banana thinks. The model doesn't just match keywords; it understands intent, physics, and composition. It supports text rendering, infographics, restoration, colorization, translation of 2D plans into 3D visuals, and structural control via sketches and wireframes.
Google Search grounding lets the model pull current data from search to visualize trends, real people, and events. The Google aesthetic is consistent: warm palette, saturated colors, clean composition. Base versions output up to 1K resolution, Pro up to 4K.
- Natural language, full sentences, a brief instead of tags
- Conversational editing — in-session tweaks
- Identity Locking via reference images
- Text rendering, infographics, restoration, colorization
- Google Search grounding for current data
Prompt structure
Optimal order: [Subject] + [Scene/Setting] + [Lighting/Mood] + [Style/Camera] + [Material/texture details] + [Purpose context].
Make the subject concrete: instead of «a woman» — «an elegant elderly woman in a vintage Chanel suit, silver hair, calm expression, upright posture.» State the purpose — what the image is for (magazine cover, cookbook, ad banner). This lets the model make style decisions automatically: for a cookbook it'll pick shallow depth of field and warm natural light on its own.
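The slot order above can be sketched as a small prompt builder. This is only an illustration: the `Brief` class, its field names, and `build_prompt` are hypothetical helpers, not part of any Google SDK, and the sample scene, lighting, and material text is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """One field per slot of the suggested order (hypothetical names)."""
    subject: str
    scene: str
    lighting: str
    style: str
    materials: str
    purpose: str


def build_prompt(brief: Brief) -> str:
    # Keep the slots in the recommended order and join them as sentences.
    parts = [
        brief.subject,
        brief.scene,
        brief.lighting,
        brief.style,
        brief.materials,
        brief.purpose,
    ]
    sentences = []
    for part in parts:
        text = part.strip()
        if not text:
            continue  # an empty slot is simply skipped
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text)
    return " ".join(sentences)


brief = Brief(
    subject="An elegant elderly woman in a vintage Chanel suit, silver hair, calm expression, upright posture",
    scene="She stands in a sunlit Paris apartment with tall windows",
    lighting="Soft morning light, quiet and dignified mood",
    style="Editorial portrait, 85mm lens, shallow depth of field",
    materials="Visible tweed texture, fine skin detail, matte pearl buttons",
    purpose="The image is for a magazine cover",
)
print(build_prompt(brief))
```

Keeping the purpose as its own slot makes it hard to forget, which matters because the purpose is what lets the model make style decisions on its own.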
Edit, don't regenerate
Google's headline golden rule is conversational editing. If the image is 80% there, don't rewrite the prompt. Say: «great, but change the lighting to sunset and make the text neon blue.» The model remembers session context and applies only the requested changes, leaving the rest of the image as it was.
This is fundamentally different from Midjourney or SDXL, where every seed is a new image. For NB models, regenerating for a small tweak is an anti-pattern. Especially for portraits with character consistency: ask «change the expression to surprised», not «draw the same person surprised.»
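A follow-up turn of this kind can be sketched as a tiny helper. Everything here is an assumption for illustration: `edit_message` is a hypothetical function, the session is modeled as a plain list of strings, no real API is called, and the closing "keep everything else" sentence is an addition of this sketch rather than required wording.

```python
from typing import Mapping


def edit_message(changes: Mapping[str, str], praise: str = "Great") -> str:
    """Build a short follow-up turn that names only what should change."""
    if not changes:
        raise ValueError("name at least one thing to change")
    clauses = [f"change the {target} to {value}" for target, value in changes.items()]
    if len(clauses) == 1:
        body = clauses[0]
    else:
        body = ", ".join(clauses[:-1]) + " and " + clauses[-1]
    return f"{praise}, but {body}. Keep everything else exactly as it is."


# The session is just a list of turns here; the first entry is a placeholder
# for whatever prompt produced the "80% there" image.
session = ["<initial prompt>"]
session.append(edit_message({"lighting": "sunset", "text color": "neon blue"}))
print(session[-1])
```

The point of the helper is its shape: the second turn contains only the delta, never a restated description of the whole image.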
Identity Locking for series
When working with references, explicitly lock identity: «Keep the person's facial features exactly the same as Image 1. 100% identical facial features, bone structure, skin tone.» Without this command the model often «improves» the face — changes features, smooths skin, makes the look composite.
For group scenes, specify the identity of EACH character separately and state explicitly that their clothing and appearance stay the same across shots. For viral thumbnails the formula works in one pass: «character from Image 1 + yellow arrow + outlined text + bold graphics.»
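For a series with several characters, the lock block can be generated rather than retyped. A minimal sketch, assuming reference images are referred to by labels such as "Image 1"; `identity_lock` and the role descriptions are hypothetical, and the lock wording is taken from the phrasing quoted above.

```python
from typing import Mapping


def identity_lock(characters: Mapping[str, str]) -> str:
    """Map a reference label ("Image 1") to a short role name and emit one lock per character."""
    lines = []
    for image_label, role in characters.items():
        lines.append(
            f"Keep the facial features of {role} exactly the same as {image_label}: "
            "100% identical facial features, bone structure, skin tone."
        )
    if len(characters) > 1:
        # Group scenes: also pin clothing and appearance across shots.
        lines.append("Keep each character's clothing and appearance unchanged across all shots.")
    return " ".join(lines)


lock = identity_lock({
    "Image 1": "the woman in the red coat",
    "Image 2": "the man with the gray beard",
})
print(lock)
```

Appending the same `lock` string to every prompt in a series keeps the instruction identical from shot to shot.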
Common mistakes
1. Tag soup instead of sentences
«dog, park, 4k, realistic» is legacy diffusion syntax. Nano Banana was trained on descriptive text and produces generic, unfocused results from tag soup. Use full sentences with proper grammar and descriptive adjectives; this alone noticeably improves results from the same set of words.
2. Regenerating instead of editing
If the image is 80% there and you need a small fix, don't rewrite the prompt. Say in the dialog «great, now change the lighting to sunset.» Regenerating to flip one parameter is an anti-pattern for NB. Especially critical for character consistency: a new seed = a new face.
3. No Identity Locking with references
Without an explicit «keep facial features exactly the same as Image 1» the model often «improves» the face: changes features, smooths skin, makes the look composite. For series and repeated use of one character, Identity Locking is a required block, not an option.
4. Ignoring textures and materials
Without texture description the image comes out «smooth» and AI-recognizable. Name specific surfaces: «visible pores», «matte ceramic», «brushed steel», «rough concrete.» For portraits texture matters more than the lens; for products — more than the lighting. The Google family is specifically tuned for detailed surfaces.
5. Copying Midjourney or Stable Diffusion syntax
Parameters like `::weight`, `--ar 16:9`, `(keyword:1.2)`, and `BREAK` don't work and end up as literal noise in the prompt. Set the format with words («16:9», «portrait», «square»), weight ideas by order (most important first), and express styles with ordinary adjectives. The Nano Banana family is its own syntax universe.
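When migrating old prompts, the legacy parameters from mistake 5 can be stripped mechanically before rewriting. The sketch below is a rough heuristic, not an official tool: `clean_legacy_syntax` is a hypothetical helper, its regular expressions cover only the patterns named above, and the "Format:" suffix is just one way to state the aspect ratio in words.

```python
import re


def clean_legacy_syntax(prompt: str) -> str:
    """Strip Midjourney / Stable Diffusion style parameters and keep the words."""
    # Remember an aspect ratio such as "--ar 16:9" so it can be restated in words.
    match = re.search(r"--ar\s+(\d+:\d+)", prompt)
    aspect = match.group(1) if match else None

    text = re.sub(r"--\w+(?:\s+\d[\w:.]*)?", " ", prompt)        # --ar 16:9, --v 6
    text = re.sub(r"\(([^():]+):\d+(?:\.\d+)?\)", r"\1", text)    # (keyword:1.2) -> keyword
    text = re.sub(r"::\s*-?\d*(?:\.\d+)?", ", ", text)            # fox::2 forest -> fox, forest
    text = re.sub(r"\bBREAK\b", ", ", text)                       # BREAK -> separator
    text = re.sub(r"\s+", " ", text)                              # collapse whitespace
    text = re.sub(r"(?:\s*,)+", ",", text).strip(" ,")            # tidy stray commas
    if aspect:
        text += f". Format: {aspect}."
    return text


print(clean_legacy_syntax(
    "a red fox::2 snowy forest, (sharp focus:1.2) BREAK golden light --ar 16:9"
))
```

The cleaned string is still a list of fragments; it only removes the noise, and the result should then be rewritten as full sentences.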
Before / after examples
Example 1
Before
cool car in the city
After
Cinematic wide shot of a futuristic sports car speeding down a rain-soaked Tokyo street at night. Neon signs reflect on the wet asphalt and the car's polished black metallic body. Long-exposure motion streaks suggest speed. Shot on 35mm with anamorphic lens, shallow depth of field, moody cyberpunk color grade — deep blues and magentas. Style: editorial automotive photography for a luxury brand campaign.
The vague request (or its tag-soup equivalent, «cool car, neon, city, night, 8k») is replaced by a photographer's brief: composition, lighting, lens, color grade, and purpose (luxury brand campaign).
Example 2
Before
remove tourists from the photo
After
In this photo, remove all background tourists. Fill the empty space with logical environmental textures: matching cobblestone pavement, the same shopfronts continuing seamlessly, consistent shadow direction from the sun. Preserve: the main subject (the woman in red coat in the foreground), the building architecture, the camera angle, the time-of-day lighting. No watermark, no extra figures.
Mask-free semantic editing is a Nano Banana feature. The preserve block plus an explicit «fill with logical textures» gives a clean result with no inpainting artifacts.
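The remove / fill / preserve pattern from this example can be sketched as a template. `semantic_edit_prompt` and its parameter names are hypothetical, chosen only to mirror the three parts of the prompt above.

```python
from typing import Sequence


def semantic_edit_prompt(
    remove: str,
    fill_with: Sequence[str],
    preserve: Sequence[str],
) -> str:
    """Assemble an edit instruction: what to remove, how to fill, what to keep."""
    if not preserve:
        raise ValueError("list at least one thing to preserve")
    return (
        f"In this photo, remove {remove}. "
        f"Fill the empty space with logical environmental textures: {', '.join(fill_with)}. "
        f"Preserve: {', '.join(preserve)}."
    )


prompt = semantic_edit_prompt(
    remove="all background tourists",
    fill_with=["matching cobblestone pavement", "consistent shadow direction from the sun"],
    preserve=["the woman in the red coat in the foreground", "the camera angle"],
)
print(prompt)
```

Making the preserve list a required argument is a deliberate choice in this sketch: an edit request without it leaves the model free to change things that were already right.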
Example 3
Before
cookbook cover with a sandwich
After
Premium cover image for a Brazilian gourmet cookbook. Hero shot of a freshly grilled chicken sandwich with melted cheese, crisp lettuce, and a brioche bun, sliced in half and slightly tilted to show the layers. Soft natural window light from the left, shallow depth of field, warm rustic wooden surface, faint herb garnish in the background blur. Editorial food photography style, professional plating, appetizing color grade.
Purpose («premium cookbook cover») activates the right mode — the model picks depth of field, plating, and warmth of light on its own. This is a Google-family trait.