Imagen 4: how to write prompts the model actually understands
Google · Updated:
Imagen 4 is Google's next-generation image model, with upgraded typography and ultra-photorealism. It takes natural-language prompts, is optimized for English, and renders crisp in-image text with proper kerning. It ships in three versions (Fast, standard, and Ultra) that share the same prompt structure; Fast is up to 10x quicker than Imagen 3.
What Imagen 4 does well
Imagen 4 is among the leaders in photorealistic generation: hair, skin, and fabric textures render at studio-photography quality, and water droplets, reflections, and light refraction are physically plausible. It supports 1:1, 4:3, 3:4, 9:16, and 16:9 aspect ratios, with output up to 1024×1024 for 1:1 (the exact maximum resolution is platform-dependent).
The headline advantage over competitors is advanced text rendering: clear, legible type with correct kerning, which makes it well suited to posters, packaging, signage, and branded layouts. Google's content filters block realistic faces of public figures, NSFW content, violence, and copyrighted material. Negative prompts are not supported.
- Three versions: Fast (drafts), standard (balanced), Ultra (premium)
- Advanced typography with proper kerning
- Ultra-photorealism: skin, fabrics, reflections
- Fast version: up to 10x faster than Imagen 3
- Aspect ratios 1:1, 4:3, 3:4, 9:16, 16:9
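The version and aspect-ratio options above can be validated before a request is sent. The helper below is a minimal, hypothetical sketch: `build_request` and the returned dict are not an official API payload, and the model IDs are assumptions based on Google's naming, so verify them on your platform before mapping the dict onto whichever client library you use.

```python
# Aspect ratios listed in the section above.
SUPPORTED_ASPECT_RATIOS = {"1:1", "4:3", "3:4", "9:16", "16:9"}

# ASSUMPTION: model identifiers for the three versions; check your platform's docs.
MODEL_IDS = {
    "fast": "imagen-4.0-fast-generate-001",
    "standard": "imagen-4.0-generate-001",
    "ultra": "imagen-4.0-ultra-generate-001",
}


def build_request(prompt: str, version: str = "standard", aspect_ratio: str = "1:1") -> dict:
    """Return a plain (hypothetical) request dict after validating its options."""
    key = version.lower()
    if key not in MODEL_IDS:
        raise ValueError(f"unknown version {version!r}; expected one of {sorted(MODEL_IDS)}")
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"model": MODEL_IDS[key], "prompt": prompt.strip(), "aspect_ratio": aspect_ratio}
```

Failing fast on an unsupported ratio such as 21:9 is cheaper than discovering the problem in an API error.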
Prompt structure and the SCULPT framework
Recommended order: [Image type/style] + [Subject with details] + [Action/pose] + [Setting/scene] + [Lighting] + [Angle/composition] + [Materials/textures] + [Mood/atmosphere].
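The ordering above can be enforced mechanically. This is a hypothetical sketch: the slot names (`style`, `subject`, and so on) are labels chosen here, not part of any Imagen API, and each slot should hold a descriptive phrase rather than a single tag.

```python
# Slot names follow the recommended order above (labels chosen for this sketch).
PROMPT_ORDER = [
    "style", "subject", "action", "setting",
    "lighting", "composition", "materials", "mood",
]


def build_prompt(parts: dict[str, str]) -> str:
    """Join the provided slots in the recommended order, skipping empty ones."""
    unknown = set(parts) - set(PROMPT_ORDER)
    if unknown:
        raise KeyError(f"unknown slots: {sorted(unknown)}")
    ordered = [parts[k].strip() for k in PROMPT_ORDER if parts.get(k, "").strip()]
    return ", ".join(ordered)
```

Because the order lives in one list, callers can fill slots in any order and still get the style first and the mood last.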
The SCULPT framework gives a convenient checklist: Subject («battle-hardened samurai in white porcelain armor»), Context («misty bamboo grove at dawn»), Unique details («armor adorned with intricate blue paintings»), Lighting («soft dappled light filtering through the canopy»), Perspective («dramatic close-up, low angle, shallow depth of field»), Tone/Theme («Akira Kurosawa style, high-contrast black and white»). Recommended length: 50–300 words in natural English.
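The SCULPT checklist and the length guideline translate naturally into a small check. A minimal sketch, assuming a hypothetical `Sculpt` dataclass and `within_length` helper; the word count is a simple whitespace split, so it only approximates the 50–300 word range.

```python
from dataclasses import dataclass, fields


@dataclass
class Sculpt:
    """One field per SCULPT element; empty strings mark missing elements."""
    subject: str = ""
    context: str = ""
    unique_details: str = ""
    lighting: str = ""
    perspective: str = ""
    tone: str = ""

    def missing(self) -> list[str]:
        """Names of SCULPT elements that are still empty, in SCULPT order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def within_length(prompt: str, low: int = 50, high: int = 300) -> bool:
    """Check the recommended 50-300 word range with a whitespace word count."""
    return low <= len(prompt.split()) <= high
```

Running `missing()` on a draft shows at a glance which parts of the checklist are still unfilled.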
Text rendering: the Imagen 4 superpower
Imagen 4 is a state-of-the-art model for in-image typography: it renders crisp text on signs, posters, and packaging with correct kerning and letter spacing. To place text cleanly in the frame, specify three things:
- Exact text in quotes: «reads "Tasty Burger"».
- Font style: «large, bold, groovy white bubble typography», «handwritten script», «vintage serif».
- Explicit placement: «at the top», «on the banner», «above the entrance».
The more concrete the font and its placement, the more accurate the result, especially for branding and marketing layouts.
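The three requirements above (exact quoted text, font style, placement) can be combined into one clause. A minimal sketch with a hypothetical `text_clause` helper; it rejects text containing double quotes, since nested quotes would make the exact string ambiguous.

```python
def text_clause(text: str, font: str, placement: str) -> str:
    """Build a text-rendering clause: font style, placement, then the exact text in quotes."""
    if not text or '"' in text:
        raise ValueError("text must be non-empty and must not contain double quotes")
    if not font.strip() or not placement.strip():
        raise ValueError("font style and placement are both required")
    return f'{font.strip()} {placement.strip()} reads "{text}"'
```

For example, `text_clause("Tasty Burger", "large bold groovy white bubble typography", "at the top")` yields the same phrasing used in the burger poster example later in this guide.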
Cinematic stack and textures
Imagen 4 responds well to professional photo and film vocabulary. Camera and lens: «Leica M10», «50mm Summilux», «ARRI Alexa», «anamorphic lens». Film: «Cinestill 50D», «Kodak Vision3 500T», «Kodak Portra 400», «35mm film grain». Aperture («shot at f/2.0») controls depth of field. Post-processing: «color grading», «LUT», «digital intermediate», «film emulation».
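The camera, lens, aperture, and film terms above can be formatted consistently. This is a hypothetical `camera_clause` helper, not part of any Imagen API; the aperture is taken as a string (for example `"2.0"`) so the prompt keeps exactly the notation you wrote.

```python
import re

# Accepts bare f-numbers such as "2", "2.0", or "16"; the "f/" prefix is added below.
APERTURE = re.compile(r"\d+(?:\.\d+)?")


def camera_clause(camera: str, lens: str, aperture: str, film: str = "") -> str:
    """Format camera, lens, aperture, and an optional film stock as one phrase."""
    if not APERTURE.fullmatch(aperture):
        raise ValueError(f"aperture should look like '2.0' or '8', got {aperture!r}")
    clause = f"shot on {camera} with {lens} lens at f/{aperture}"
    if film:
        clause += f", {film} film stock"
    return clause
```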
For materials, use physical descriptions: «porcelain carapace with intricate blue paintings», «worn leather with visible stitching and patina», «iridescent feathers with subtle hues of lavender and rose gold». For complex scenes, describe the layers («In the foreground… In the middle ground… The background shows…») to give the model a clear compositional hierarchy.
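The layered pattern above is easy to template. A minimal sketch with a hypothetical `layered_scene` helper that writes the scene front to back and drops any layer left empty.

```python
def layered_scene(foreground: str, middle: str, background: str) -> str:
    """Describe a scene front to back so the model gets a compositional hierarchy."""
    layers = [
        ("In the foreground, ", foreground),
        ("In the middle ground, ", middle),
        ("The background shows ", background),
    ]
    sentences = []
    for lead, text in layers:
        text = text.strip().rstrip(".")
        if text:  # skip layers the caller left empty
            sentences.append(f"{lead}{text}.")
    return " ".join(sentences)
```

Keeping each layer as its own sentence preserves the natural-language style Imagen 4 is optimized for.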
Common mistakes
1. Comma-separated tags instead of coherent sentences
Imagen 4 is optimized for natural language. «Girl, red coat, Tokyo, neon, bokeh, cinematic» performs worse than a connected description. Write the prompt as a brief for a photographer: complete sentences with meaningful order, concrete details, and logical links between elements.
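A rough way to catch tag-style prompts before sending them is to look at how short the comma-separated chunks are. This `looks_like_tags` function is a hypothetical heuristic with arbitrary thresholds, not a rule Imagen itself applies.

```python
def looks_like_tags(prompt: str, max_words: int = 2, threshold: float = 0.6) -> bool:
    """Heuristic: flag prompts where most comma-separated chunks are very short."""
    chunks = [c.strip() for c in prompt.split(",") if c.strip()]
    if len(chunks) < 3:
        return False  # too few chunks to call it a tag list
    short = sum(1 for c in chunks if len(c.split()) <= max_words)
    return short / len(chunks) >= threshold
```

The tag list «Girl, red coat, Tokyo, neon, bokeh, cinematic» is flagged, while the comma-joined but descriptive "After" prompts in the examples below are not, because their chunks are full phrases.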
2. Fictional proper names for photorealism
A prompt like «photorealistic image of Valyria» leads the model to associate it with fantasy illustrations and concept art. For a photorealistic style, describe characteristics instead: «glorious titanic city with Greco-Roman architecture» instead of «Valyria»; «epic warrior in golden plate armor» instead of «Achilles».
3. Negative phrasing
Imagen 4 doesn't support negative prompts. Phrases like «No trees, no clouds, without shadows» are either ignored or, paradoxically, cause the mentioned objects to appear. Phrase positively: «clear blue sky», «empty street», «bright noon lighting» instead of «no clouds», «no people», «no shadows».
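Negative phrasing can be caught with a simple pattern match. A minimal sketch: the word list in `NEGATIVE` is illustrative rather than exhaustive, and the `REWRITES` table is a hypothetical mapping built from the three examples in the paragraph above.

```python
import re

# Illustrative (not exhaustive) negative words to flag.
NEGATIVE = re.compile(r"\b(?:no|not|without|never|avoid)\b", re.IGNORECASE)

# Hypothetical positive rewrites, taken from the examples in the text.
REWRITES = {
    "no clouds": "clear blue sky",
    "no people": "empty street",
    "no shadows": "bright noon lighting",
}


def negative_words(prompt: str) -> list[str]:
    """Return negative words found in the prompt, lowercased, in order of appearance."""
    return [m.group(0).lower() for m in NEGATIVE.finditer(prompt)]


def suggest_positive(prompt: str) -> str:
    """Replace known negative phrases with positive alternatives (case-insensitive)."""
    for neg, pos in REWRITES.items():
        prompt = re.sub(rf"\b{re.escape(neg)}\b", pos, prompt, flags=re.IGNORECASE)
    return prompt
```

Flagging is more useful than silent rewriting here, since a good positive phrasing usually depends on the scene.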
4. Requests for public figures' faces
Google's content filter blocks realistic images of famous people — politicians, actors, musicians. Swap the specific name for a description of characteristics («a man in his 50s with grey hair and a sharp suit») or switch the style to editorial/concept art where naming isn't required.
5. Conflicting styles or an overloaded prompt
«Photorealistic anime watercolor oil painting» creates an uncontrolled mix. A prompt over 500 words without a clear hierarchy of importance leads to conflicting instructions. Pick one primary style, keep the length between 50 and 300 words, and put the main subject in the opening sentence.
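Competing primary styles can be flagged with a keyword scan. This is a rough, hypothetical check: `STYLE_TERMS` is a deliberately small illustrative vocabulary, and plain substring matching will miss synonyms and may occasionally over-match.

```python
# Illustrative vocabulary of primary styles; extend it for your own prompts.
STYLE_TERMS = ("photorealistic", "anime", "watercolor", "oil painting", "pixel art", "3d render")


def conflicting_styles(prompt: str) -> list[str]:
    """Return every style term found if there is more than one, else an empty list."""
    lower = prompt.lower()
    found = [term for term in STYLE_TERMS if term in lower]
    return found if len(found) > 1 else []
```

A single style term (such as «photorealistic print quality» in the poster example below) passes; only combinations are reported.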
Before / after examples
Example 1
Before
samurai in beautiful armor
After
A battle-hardened samurai in white porcelain armor adorned with intricate blue paintings, standing in a misty bamboo grove at dawn, soft dappled light filtering through the canopy, dramatic close-up at low angle, shallow depth of field, cinematic tension, Akira Kurosawa style, high-contrast color palette with earthy neutrals and splashes of deep crimson, shot on 35mm film with subtle grain.
Full SCULPT in one prompt: subject with unique details (porcelain armor, blue paintings), context (misty bamboo grove), lighting (dappled light), perspective (close-up, low angle, shallow depth of field), tone (Kurosawa style).
Example 2
Before
burger poster with text
After
Vintage burger restaurant poster, large bold groovy white bubble typography at the top reads "Tasty Burger", subtitle in handwritten red script below reads "since 1972", warm orange background with subtle paper texture, hand-painted lettering style with playful tilt, centered composition, editorial layout, muted earth tones, photorealistic print quality.
With the brand name and caption in quotes, distinct fonts, explicit placement, a color palette, and a background material, Imagen 4 can assemble a near-production-ready layout.
Example 3
Before
red-haired girl in a city
After
Editorial fashion photograph of a young woman with vibrant copper-red hair styled in loose waves, wearing a tailored cream wool coat over a black turtleneck, walking through a rain-slicked Tokyo street at blue hour, neon reflections in puddles, shot on Leica M10 with 50mm Summilux lens at f/2.0, shallow depth of field with creamy bokeh, Cinestill 50D film stock, cinematic color grading with cool blue and amber highlights, layered composition with soft background blur.
The full cinematic stack: camera, lens, aperture, film stock, and color grading. Detailed descriptions of clothing and hair play to Imagen 4's photorealism strengths.