Imagen 4 Ultra: how to write prompts the model actually understands
Imagen 4 Ultra is Google's premium Imagen 4 with maximum detail and prompt fidelity. It rewards long, detailed descriptions (100–400 words), renders micro-textures (pores, threads, stitches), and handles complex multi-figure scenes. It has the best text rendering in the family and is the most obedient Google model for production work.
What Ultra adds over standard Imagen 4
Imagen 4 Ultra is the most prompt-faithful model in the family. Describe 15 details and it will try to render all 15; standard Imagen 4 may simplify or skip some. Ultra renders at a micro level: individual hair strands with highlights, skin pores and freckles, fabric threads and stitching, water droplets with believable refraction.
Ultra also has the best typography in the family: long captions with correct kerning, multiple lines at once, and several fonts in one frame. Complex multi-figure scenes retain detail: battle scenes with dozens of figures, urban landscapes with architecture, crowds with individual features. Ultra is slower than the standard version because it prioritizes quality over speed.
- Maximum prompt fidelity — renders ALL described details
- Micro-textures: pores, freckles, threads, stitches, water droplets
- Best text rendering in the Imagen family
- Complex multi-figure scenes without simplification
- Optimal prompt length — 100–400 words
Prompt structure and extended SCULPT
Optimal order for Ultra: [Image type/style/camera] + [Subject with maximum detail] + [Action/pose] + [Setting with layered description] + [Lighting with specifics] + [Angle/composition/depth of field] + [Materials/textures] + [Color palette] + [Mood/atmosphere] + [Post-processing].
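The slot order above can be enforced with a small helper, so the parts always come out in the same sequence regardless of the order they were written in. This is a minimal sketch, not an official tool: the slot keys and the function name `assemble_prompt` are hypothetical, and joining with commas is only one reasonable choice.

```python
from typing import Mapping

# Slot order taken from the bracketed formula above; key names are illustrative.
SLOT_ORDER = (
    "image_type",
    "subject",
    "action",
    "setting",
    "lighting",
    "composition",
    "materials",
    "palette",
    "mood",
    "post_processing",
)


def assemble_prompt(slots: Mapping[str, str]) -> str:
    """Join the filled slots in the recommended order, skipping empty ones."""
    unknown = sorted(set(slots) - set(SLOT_ORDER))
    if unknown:
        raise KeyError(f"unknown slot(s): {unknown}")
    parts = [slots[name].strip() for name in SLOT_ORDER if slots.get(name, "").strip()]
    return ", ".join(parts)


if __name__ == "__main__":
    # Slots are given out of order on purpose; the helper reorders them.
    print(assemble_prompt({
        "lighting": "soft golden morning sunlight with god rays",
        "subject": "a powerful shogun in ornate golden armor",
        "image_type": "hyperrealistic cinematic scene",
    }))
```

Rejecting unknown keys catches typos such as `lightning` early, before a slot silently disappears from the prompt.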
SCULPT for Ultra runs in extended form:
- Subject: maximally detailed description («towering 3-meter alien robot with elongated head composed of hundreds of intricate components»)
- Context: layered environment
- Unique details: micro-details
- Lighting: specific sources and interactions
- Perspective: concrete lens and parameters (f/2.0, low-angle wide)
- Tone/Theme: style + post-processing + references
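One way to make sure none of the six SCULPT parts is forgotten is to treat them as required fields. The sketch below assumes a hypothetical `Sculpt` dataclass; the field names mirror the acronym, and rendering the parts as consecutive sentences is an illustrative choice rather than a rule of the model.

```python
from dataclasses import dataclass, fields


@dataclass
class Sculpt:
    """The six SCULPT parts, in acronym order (hypothetical helper)."""
    subject: str
    context: str
    unique_details: str
    lighting: str
    perspective: str
    tone: str

    def render(self) -> str:
        """Concatenate the parts as sentences; refuse to render if any is empty."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"empty SCULPT field(s): {missing}")
        sentences = [getattr(self, f.name).strip().rstrip(".") for f in fields(self)]
        return ". ".join(sentences) + "."


if __name__ == "__main__":
    print(Sculpt(
        subject="towering 3-meter alien robot with elongated head",
        context="abandoned hangar with rusted scaffolding",
        unique_details="hundreds of intricate components, amber-lit eyes",
        lighting="volumetric tungsten light from overhead fixture",
        perspective="low-angle wide shot at f/2.0",
        tone="gritty sci-fi, subtle film grain",
    ).render())
```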
Layered composition and the cinematic stack
For complex scenes, Ultra works best with layered descriptions. Structure: «Foreground: [detailed description]. Middle ground: [subject and main scene]. Background: [distant elements, sky, horizon]». This gives the model a clear compositional hierarchy.
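The three-layer structure is easy to template. The following is a minimal sketch with a hypothetical `layered_scene` function; it only formats text in the «Foreground / Middle ground / Background» pattern described above and makes no claim about how the model weighs each layer.

```python
def layered_scene(foreground: str, middle_ground: str, background: str) -> str:
    """Format three descriptions as labeled layers, one sentence per layer."""
    layers = (
        ("Foreground", foreground),
        ("Middle ground", middle_ground),
        ("Background", background),
    )
    sentences = []
    for label, text in layers:
        text = text.strip().rstrip(".")
        if not text:
            raise ValueError(f"{label} description is empty")
        sentences.append(f"{label}: {text}.")
    return " ".join(sentences)


if __name__ == "__main__":
    print(layered_scene(
        "empty cracked asphalt parking lot with faint puddle reflections",
        "a vintage neon motel sign mounted on a weathered brick wall",
        "dark desert night sky with a faint silhouette of distant mountains",
    ))
```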
Maximum cinematic stack for Ultra: [Camera] + [Lens] + [Film/ISO] + [Aperture] + [Angle] + [Depth of field] + [Color grading] + [Post-processing] + [Atmospheric effects]. Example: «Leica M10, 50mm Summilux lens, shot at f/2.0, CineStill 50D tones, dramatic chiaroscuro lighting, rich textures, 8K resolution, subtle film grain, chromatic aberration». Ultra realises the full stack more accurately than the standard version.
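Because the stack has nine layers, it helps to see which ones a draft leaves out. The sketch below assumes hypothetical layer keys and a hypothetical `cinematic_stack` function; it returns the stack string together with the list of unfilled layers.

```python
from typing import Dict, List, Tuple

# Layer names follow the bracketed stack above; the keys themselves are illustrative.
STACK_LAYERS = (
    "camera", "lens", "film", "aperture", "angle",
    "depth_of_field", "color_grading", "post_processing", "atmosphere",
)


def cinematic_stack(parts: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return the stack as one comma-separated string plus the unfilled layers."""
    unknown = sorted(set(parts) - set(STACK_LAYERS))
    if unknown:
        raise KeyError(f"unknown layer(s): {unknown}")
    filled = [parts[name] for name in STACK_LAYERS if parts.get(name)]
    missing = [name for name in STACK_LAYERS if not parts.get(name)]
    return ", ".join(filled), missing


if __name__ == "__main__":
    text, missing = cinematic_stack({
        "camera": "Leica M10",
        "lens": "50mm Summilux lens",
        "aperture": "shot at f/2.0",
        "film": "CineStill 50D tones",
        "post_processing": "subtle film grain, chromatic aberration",
    })
    print(text)
    print("not specified:", missing)
```

The list of missing layers is a prompt-writing reminder only; not every image needs all nine.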
Typography and atmospheric effects
Ultra has the best typography of all Imagen models. It handles long captions with correct kerning, multiple words and lines at once, and different fonts in one frame. The formula: exact text in quotes + font style + placement + context. Example: «A vintage neon sign above the entrance reading "HOTEL CALIFORNIA" in warm amber cursive lettering, with a subtle flicker effect».
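The text formula (exact text in quotes + font style + placement + context) can be written as a small function. This is a sketch under stated assumptions: `text_element` is a hypothetical helper, and the "reading ... in ..." phrasing is simply copied from the example above.

```python
def text_element(text: str, font_style: str, placement: str, context: str = "") -> str:
    """Build one text clause: placement, exact quoted text, font style, optional context."""
    if '"' in text:
        # The exact text is wrapped in double quotes, so it cannot contain them.
        raise ValueError("text must not contain double quotes")
    clause = f'{placement} reading "{text}" in {font_style}'
    if context:
        clause += f", {context}"
    return clause


if __name__ == "__main__":
    print(text_element(
        "HOTEL CALIFORNIA",
        "warm amber cursive lettering",
        "A vintage neon sign above the entrance",
        "with a subtle flicker effect",
    ))
```

Calling it once per caption keeps each text block paired with its own font and placement, which matters when several captions share one frame.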
Atmospheric effects render cleanly in Ultra: particles («embers, ash, dust swirling in the air»), god rays («volumetric god rays filtering through the canopy»), smoke and mist («mist swirls gently, creating depth»), sparks («sparks flying from clashing weapons»), glow («neon tubes with a pinkish glow, amber-lit eyes»).
Common mistakes
1. A prompt that's too short for Ultra
Ultra is built for detailed descriptions — a prompt under 15 words almost always means the model is underused. If 10 words is enough, use standard Imagen 4 or Fast. Ultra pays off on long, well-structured prompts of 100–400 words with layered composition and the full cinematic stack.
2. Prompt without a clear hierarchy of importance
Even in a long prompt, the model prioritizes the opening. If the main subject is buried in the third paragraph among background descriptions, the model anchors on the first thing it reads — which might be a setting detail. Put the main subject and style in the first sentence, then expand into detail.
3. Fictional proper names for photorealism
Ultra is prompt-faithful but bound by its training data. «Photorealistic Valyrian lord» tends to render as a book illustration rather than a photograph. For a photorealistic result, describe the characteristics instead of using the name: «a lord from a glorious titanic city with Greco-Roman architecture, wearing ornate ceremonial robes embroidered with golden thread».
4. Conflicting styles in a single prompt
«Photorealistic anime surreal pencil sketch» creates an uncontrolled mix. Ultra renders everything that's described, so conflicting instructions yield worse results than in the standard version. Commit to one primary style and use supporting stylistic markers within it.
5. Missing lighting in a photorealistic prompt
For Ultra, lighting is critical. Without an explicit light type the model defaults to a «neutral» option and the frame looks flat. Specify a concrete source and quality: «chiaroscuro with god rays», «golden hour rim light», «soft window light with bounced fill», «volumetric tungsten light from overhead fixture».
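Several of these mistakes can be caught mechanically before a prompt is submitted. The sketch below is a rough linter covering mistakes 1, 2, 4 and 5; mistake 3 (fictional names) is not checked. The function name `lint_prompt`, the keyword lists, and the lighting pattern are all illustrative assumptions, and simple keyword matching will produce both false positives and misses.

```python
import re
from typing import List

# Illustrative keyword lists only; extend them for real use.
STYLE_FAMILIES = {
    "photo": ("photorealistic", "hyperrealistic", "photograph"),
    "anime": ("anime", "manga"),
    "sketch": ("pencil sketch", "charcoal drawing"),
    "surreal": ("surreal",),
}
LIGHTING = re.compile(
    r"\b(?:(?:sun|moon|day|back|candle)?light(?:ing)?|(?:sun|back)?lit"
    r"|chiaroscuro|god rays|golden hour|glow)\b"
)


def lint_prompt(prompt: str, subject: str) -> List[str]:
    """Return warnings for mistakes 1, 2, 4 and 5 from the list above."""
    text = prompt.strip().lower()
    warnings = []

    # Mistake 1: length outside the 100-400 word range.
    n_words = len(text.split())
    if not 100 <= n_words <= 400:
        warnings.append(f"length: {n_words} words, outside 100-400")

    # Mistake 2: main subject missing from the first sentence.
    first_sentence = re.split(r"(?<=[.!?])\s+", text, maxsplit=1)[0]
    if subject.lower() not in first_sentence:
        warnings.append(f"hierarchy: '{subject}' is not in the first sentence")

    # Mistake 4: more than one style family named.
    families = [name for name, words in STYLE_FAMILIES.items()
                if any(word in text for word in words)]
    if len(families) > 1:
        warnings.append(f"style: conflicting families {families}")

    # Mistake 5: photorealistic prompt with no lighting term.
    if "photo" in families and not LIGHTING.search(text):
        warnings.append("lighting: no explicit light source or quality")

    return warnings


if __name__ == "__main__":
    for warning in lint_prompt("Photorealistic anime surreal pencil sketch", "sketch"):
        print(warning)
```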
Before / after examples
Example 1
Before
shogun in golden armor in front of a temple
After
Immersive, hyperrealistic cinematic scene depicting a powerful shogun seated regally before an ancient, weathered stone Japanese temple. Foreground: the shogun is clad in incredibly complex and detailed golden armor featuring intricate ornamentation, including sculpted golden dragons intertwined with detailed engravings; the armor gleams with a realistic, warm metallic luster. Middle ground: two stoic samurai guards stand on either side, holding ornamental spears. Background: ancient temple with weathered stone walls, traditional curved roof, distant misty mountains. Soft golden morning sunlight bathes the scene with god rays filtering through subtle mist, creating long shadows. Shot on ARRI Alexa with 85mm lens at f/2.0, telephoto compression, shallow depth of field, CineStill 50D tones, rich warm color grading, Peter Jackson epic style, subtle film grain.
Layered composition (foreground / middle ground / background), full cinematic stack, atmospheric effects, style reference — Ultra realises all of it.
Example 2
Before
neon hotel sign
After
A vintage neon sign above the entrance of a 1970s motel reading "HOTEL CALIFORNIA" in warm amber cursive lettering, with a subtle flicker effect on the letter 'C'. Below the main sign, a smaller pink neon strip reads "VACANCY" in bold sans-serif. The sign is mounted on a weathered brick wall with visible mortar lines, paint chips, and traces of rust. Foreground: empty cracked asphalt parking lot with faint puddle reflections of the neon glow. Background: dark desert night sky with stars and a faint silhouette of distant mountains. Shot on 35mm film with anamorphic lens at f/1.4, shallow depth of field, the neon glow throws a pink and amber color cast onto surrounding surfaces, chromatic aberration on bright lights, cinematic Americana mood.
Two text blocks with different fonts and placements, physical material details (brick, mortar, rust), layered composition, and atmospheric effects (neon glow, chromatic aberration).
Example 3
Before
beautiful portrait of a red-haired girl
After
Editorial close-up portrait of a young woman in her late twenties with vibrant copper-red hair styled in soft natural waves, individual strands catching golden hour light with subtle highlights. Visible freckles across her nose and cheekbones, light dusting of subtle makeup, soft natural eyebrows, hazel-green eyes with depth and emotion. She wears a cream cashmere turtleneck with visible knit texture and individual fibers, a delicate gold necklace with iridescent pearl pendant catching the light. Foreground: subtle bokeh of yellow autumn leaves. Background: out-of-focus park with warm late-afternoon golden light, soft layered bokeh. Shot on Leica M10 with 85mm Summilux lens at f/1.4, razor-sharp focus on her eyes, Kodak Portra 400 film tones, rich warm color grading with natural skin tones, subtle film grain, fashion editorial style.
Micro-detailing (individual strands, freckles, cashmere fibers), full cinematic stack, layered background, and a portrait lens at a wide aperture (low f-number): a textbook Ultra prompt.