Imagen 4 Ultra: how to write prompts the model actually understands

Google

Imagen 4 Ultra is Google's premium Imagen 4 with maximum detail and prompt fidelity. It rewards long, detailed descriptions (100–400 words), renders micro-textures (pores, threads, stitches), and handles complex multi-figure scenes. It has the best text rendering in the family and is the most obedient Google model for production work.

What Ultra adds over standard Imagen 4

Imagen 4 Ultra is the most prompt-faithful model in the family. Describe 15 details and it will try to render all 15; standard Imagen 4 may simplify or skip some. Ultra renders at a micro level: individual hair strands with highlights, skin pores and freckles, fabric threads and stitching, water droplets with believable refraction.

Ultra also has the best typography in the family: long captions with correct kerning, multiple lines at once, and several fonts in one frame. Complex multi-figure scenes retain detail: battle scenes with dozens of figures, urban landscapes with architecture, crowds with individual features. Ultra is slower than the standard version because it prioritizes quality over speed.

  • Maximum prompt fidelity — renders ALL described details
  • Micro-textures: pores, freckles, threads, stitches, water droplets
  • Best text rendering in the Imagen family
  • Complex multi-figure scenes without simplification
  • Optimal prompt length — 100–400 words

Prompt structure and extended SCULPT

Optimal order for Ultra: [Image type/style/camera] + [Subject with maximum detail] + [Action/pose] + [Setting with layered description] + [Lighting with specifics] + [Angle/composition/depth of field] + [Materials/textures] + [Color palette] + [Mood/atmosphere] + [Post-processing].
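The ordering above can be sketched as a small prompt assembler. This is only an illustration: the slot names and the `build_prompt` helper are hypothetical, not part of any Google or Opten API. It takes whichever slots you have filled and joins them in the recommended order.

```python
# Hypothetical slot names; the order mirrors the structure described above.
SLOT_ORDER = [
    "image_type",       # image type / style / camera
    "subject",          # subject with maximum detail
    "action",           # action / pose
    "setting",          # setting with layered description
    "lighting",         # lighting with specifics
    "composition",      # angle / composition / depth of field
    "materials",        # materials / textures
    "palette",          # color palette
    "mood",             # mood / atmosphere
    "post_processing",  # post-processing
]


def build_prompt(slots: dict[str, str]) -> str:
    """Join the filled slots in the recommended order, skipping empty ones."""
    unknown = set(slots) - set(SLOT_ORDER)
    if unknown:
        raise KeyError(f"unknown slots: {sorted(unknown)}")
    parts = [
        slots[name].strip().rstrip(".")
        for name in SLOT_ORDER
        if slots.get(name, "").strip()
    ]
    return (". ".join(parts) + ".") if parts else ""


# Slots can be supplied in any order; the output order is fixed.
prompt = build_prompt({
    "lighting": "Soft golden morning sunlight with god rays through mist",
    "subject": "A powerful shogun in ornate golden armor",
    "image_type": "Hyperrealistic cinematic scene",
    "setting": "Ancient weathered stone temple, distant misty mountains",
})
print(prompt)
```

Keeping the order in one list makes it easy to reorder or extend the slots without touching the joining logic.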

SCULPT for Ultra runs in extended form:

  • Subject: maximally detailed description («towering 3-meter alien robot with elongated head composed of hundreds of intricate components»)
  • Context: layered environment
  • Unique details: micro-details
  • Lighting: specific sources and interactions
  • Perspective: concrete lens and parameters (f/2.0, low-angle wide)
  • Tone/Theme: style + post-processing + references

Layered composition and the cinematic stack

For complex scenes, Ultra works best with layered descriptions. Structure: «Foreground: [detailed description]. Middle ground: [subject and main scene]. Background: [distant elements, sky, horizon]». This gives the model a clear compositional hierarchy.
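As a minimal sketch of that three-layer structure, the hypothetical helper below (`layered_scene` is an illustrative name, not an existing API) labels each layer explicitly and refuses to build a scene with a layer left empty.

```python
def layered_scene(foreground: str, middle_ground: str, background: str) -> str:
    """Render the three layers as labelled sentences, in depth order."""
    layers = [
        ("Foreground", foreground),
        ("Middle ground", middle_ground),
        ("Background", background),
    ]
    sentences = []
    for label, text in layers:
        text = text.strip().rstrip(".")
        if not text:
            raise ValueError(f"{label} description is empty")
        sentences.append(f"{label}: {text}.")
    return " ".join(sentences)


scene = layered_scene(
    foreground="a shogun in detailed golden armor, seated",
    middle_ground="two samurai guards holding ornamental spears",
    background="ancient stone temple with curved roof, misty mountains",
)
print(scene)
```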

Maximum cinematic stack for Ultra: [Camera] + [Lens] + [Film/ISO] + [Aperture] + [Angle] + [Depth of field] + [Color grading] + [Post-processing] + [Atmospheric effects]. Example: «Leica M10, 50mm Summilux lens, shot at f/2.0, Cinestill 50D tones, dramatic chiaroscuro lighting, rich textures, 8K resolution, subtle film grain, chromatic aberration». Ultra realizes the full stack more accurately than the standard version.
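One way to keep track of the stack is a small record with one field per element. The sketch below is an assumption-laden illustration: `CinematicStack` and its field names are invented for this example. It renders the filled elements in stack order and reports which ones are still missing.

```python
from dataclasses import dataclass, fields


@dataclass
class CinematicStack:
    # Field order follows the stack listed above; names are illustrative.
    camera: str = ""
    lens: str = ""
    film: str = ""
    aperture: str = ""
    angle: str = ""
    depth_of_field: str = ""
    color_grading: str = ""
    post_processing: str = ""
    atmosphere: str = ""

    def render(self) -> str:
        """Comma-join the non-empty elements in stack order."""
        values = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(v for v in values if v)

    def missing(self) -> list[str]:
        """Names of stack elements that have not been filled in."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


stack = CinematicStack(
    camera="Leica M10",
    lens="50mm Summilux lens",
    film="Cinestill 50D tones",
    aperture="shot at f/2.0",
    post_processing="subtle film grain, chromatic aberration",
)
print(stack.render())
print("still missing:", stack.missing())
```

The `missing()` list is a quick reminder of which parts of the stack a draft prompt has not covered yet.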

Typography and atmospheric effects

Ultra has the best typography among all Imagen models. It handles long captions with correct kerning, multiple words and lines at once, and different fonts in one frame. The formula: exact text in quotes + font style + placement + context. Example: «A vintage neon sign above the entrance reading "HOTEL CALIFORNIA" in warm amber cursive lettering, with a subtle flicker effect».
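The typography formula can be sketched as a tiny template function. `text_element` is a hypothetical helper written for this illustration; it simply places the exact caption in double quotes between the placement and the font style, then appends optional context.

```python
def text_element(text: str, font_style: str, placement: str, context: str = "") -> str:
    """Build one text block: placement + quoted text + font style + context."""
    if '"' in text:
        # Double quotes delimit the caption, so they cannot appear inside it.
        raise ValueError("caption text must not contain double quotes")
    phrase = f'{placement} reading "{text}" in {font_style}'
    return f"{phrase}, {context}" if context else phrase


sign = text_element(
    text="HOTEL CALIFORNIA",
    font_style="warm amber cursive lettering",
    placement="A vintage neon sign above the entrance",
    context="with a subtle flicker effect",
)
print(sign)
```

Calling the function once per caption keeps each text block paired with its own font and placement.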

Atmospheric effects render cleanly in Ultra: particles («embers, ash, dust swirling in the air»), god rays («volumetric god rays filtering through the canopy»), smoke and mist («mist swirls gently, creating depth»), sparks («sparks flying from clashing weapons»), glow («neon tubes with a pinkish glow, amber-lit eyes»).

Common mistakes

  1. A prompt that's too short for Ultra

    Ultra is built for detailed descriptions — a prompt under 15 words almost always means the model is underused. If 10 words is enough, use standard Imagen 4 or Fast. Ultra pays off on long, well-structured prompts of 100–400 words with layered composition and the full cinematic stack.

  2. Prompt without a clear hierarchy of importance

    Even in a long prompt, the model prioritizes the opening. If the main subject is buried in the third paragraph among background descriptions, the model anchors on the first thing it reads — which might be a setting detail. Put the main subject and style in the first sentence, then expand into detail.

  3. Fictional proper names for photorealism

    Ultra is prompt-faithful but bound by training data. «Photorealistic Valyrian lord» renders as a book illustration. For a photorealistic style, describe characteristics: «a lord from a glorious titanic city with Greco-Roman architecture, wearing ornate ceremonial robes embroidered with golden thread».

  4. Conflicting styles in a single prompt

    «Photorealistic anime surreal pencil sketch» creates an uncontrolled mix. Ultra renders everything that's described, so conflicting instructions yield worse results than in the standard version. Commit to one primary style and use supporting stylistic markers within it.

  5. Missing lighting in a photorealistic prompt

    For Ultra, lighting is critical. Without an explicit light type the model defaults to a «neutral» option and the frame looks flat. Specify a concrete source and quality: «chiaroscuro with god rays», «golden hour rim light», «soft window light with bounced fill», «volumetric tungsten light from overhead fixture».
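Several of the mistakes above can be caught with a rough pre-flight check. The sketch below is a keyword heuristic under stated assumptions: `lint_prompt`, `STYLE_MARKERS`, and `LIGHT_MARKERS` are hypothetical names, and the marker lists are short illustrative samples rather than a complete vocabulary. It does not attempt to detect fictional proper names (mistake 3), and substring matching can misfire on words such as "slightly".

```python
import re

# Illustrative samples only; extend these lists for real use.
STYLE_MARKERS = ("photorealistic", "anime", "pencil sketch", "watercolor", "surreal")
LIGHT_MARKERS = ("light", "chiaroscuro", "golden hour", "god rays", "glow")


def lint_prompt(prompt: str, subject_keyword: str) -> list[str]:
    """Return warnings for mistakes 1, 2, 4 and 5 from the list above."""
    text = prompt.lower()
    warnings = []

    # Mistake 1: too short for Ultra.
    if len(prompt.split()) < 15:
        warnings.append("too short: under 15 words")

    # Mistake 2: the main subject should appear in the first sentence.
    first_sentence = re.split(r"(?<=[.!?])\s+", prompt.strip(), maxsplit=1)[0]
    if subject_keyword.lower() not in first_sentence.lower():
        warnings.append("main subject is not in the first sentence")

    # Mistake 4: more than one primary style marker.
    styles = [s for s in STYLE_MARKERS if s in text]
    if len(styles) > 1:
        warnings.append("conflicting styles: " + ", ".join(styles))

    # Mistake 5: no lighting description at all.
    if not any(marker in text for marker in LIGHT_MARKERS):
        warnings.append("no lighting specified")

    return warnings


for warning in lint_prompt("Photorealistic anime surreal pencil sketch of a cat", "cat"):
    print(warning)
```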

Before / after examples

Example 1

Before

shogun in golden armor in front of a temple

After

Immersive, hyperrealistic cinematic scene depicting a powerful shogun seated regally before an ancient, weathered stone Japanese temple. Foreground: the shogun is clad in incredibly complex and detailed golden armor featuring intricate ornamentation, including sculpted golden dragons intertwined with detailed engravings, the armor gleams with a realistic, warm metallic luster. Middle ground: two stoic samurai guards stand on either side, holding ornamental spears. Background: ancient temple with weathered stone walls, traditional curved roof, distant misty mountains. Soft golden morning sunlight bathes the scene with god rays filtering through subtle mist, creating long shadows. Shot on ARRI Alexa with 85mm lens at f/2.0, telephoto compression, shallow depth of field, Cinestill 50D tones, rich warm color grading, Peter Jackson epic style, subtle film grain.

Layered composition (foreground / middle ground / background), full cinematic stack, atmospheric effects, and a style reference: Ultra realizes all of it.

Example 2

Before

neon hotel sign

After

A vintage neon sign above the entrance of a 1970s motel reading "HOTEL CALIFORNIA" in warm amber cursive lettering, with a subtle flicker effect on the letter 'C'. Below the main sign, a smaller pink neon strip reads "VACANCY" in bold sans-serif. The sign is mounted on a weathered brick wall with visible mortar lines, paint chips, and traces of rust. Foreground: empty cracked asphalt parking lot with faint puddle reflections of the neon glow. Background: dark desert night sky with stars and a faint silhouette of distant mountains. Shot on 35mm film with anamorphic lens at f/1.4, shallow depth of field, neon glow casts pink and amber color cast on surrounding surfaces, chromatic aberration on bright lights, cinematic Americana mood.

Two text blocks with different fonts and placements, physical material details (brick, mortar, rust), layered composition, and atmospheric effects (neon glow, chromatic aberration).

Example 3

Before

beautiful portrait of a red-haired girl

After

Editorial close-up portrait of a young woman in her late twenties with vibrant copper-red hair styled in soft natural waves, individual strands catching golden hour light with subtle highlights. Visible freckles across her nose and cheekbones, light dusting of subtle makeup, soft natural eyebrows, hazel-green eyes with depth and emotion. She wears a cream cashmere turtleneck with visible knit texture and individual fibers, a delicate gold necklace with iridescent pearl pendant catching the light. Foreground: subtle bokeh of yellow autumn leaves. Background: out-of-focus park with warm late-afternoon golden light, soft layered bokeh. Shot on Leica M10 with 85mm Summilux lens at f/1.4, razor-sharp focus on her eyes, Kodak Portra 400 film tones, rich warm color grading with natural skin tones, subtle film grain, fashion editorial style.

Micro-detailing (individual strands, freckles, cashmere fibers), full cinematic stack, layered background, portrait lens at low aperture — a textbook Ultra prompt.

Frequently asked

When should I use Ultra instead of standard Imagen 4?

Pick Ultra for final production imagery: print, billboards, premium marketing, covers, epic scenes, fashion editorial with micro-detail. For drafts, A/B tests, mockups, and iterations, use standard Imagen 4 or Fast, which are faster and cheaper. Ultra justifies itself on long detailed prompts with layered composition.

What's the optimal prompt length for Ultra?

100–400 words in natural English. Under 50 words the model is underused and the standard version is more cost-effective. Over 500 words without a clear hierarchy, instructions start to conflict and the image loses focus. The sweet spot is 200–300 words with layered composition (foreground / middle ground / background) and the full cinematic stack.

How do I use layered composition?

Describe the scene in three layers: Foreground (detailed elements up close, bokeh), Middle ground (main subject and primary scene), Background (distant elements, sky, horizon). This gives Ultra a clear compositional hierarchy, and the model distributes depth of field, lighting, and detail across layers more correctly than with a «flat» description.

Does Ultra support negative prompts?

No. Like the other Imagen 4 models, Ultra doesn't support a negative prompt. Phrasings with «no», «without», «not» are either ignored or trigger the opposite effect. Phrase positively: instead of «no people» use «empty street»; instead of «not blurry» use «razor-sharp focus»; instead of «no clouds» use «clear blue sky with subtle gradient».

How does Ultra render long text?

Ultra leads the Imagen family in typography. It handles long captions with correct kerning, multiple lines, and different fonts in a single frame. The formula: exact text in quotes + font style + placement + context. For best results, split long text into 2–3 short blocks with explicit placement for each; that is more reliable than one long block.

Can I write prompts in languages other than English?

You can, but quality drops significantly. Imagen 4 Ultra is optimized for English, especially for technical cinematic vocabulary («ARRI Alexa», «Cinestill 50D», «chromatic aberration»). In other languages the model loses stylistic nuance and struggles with photo/film terms. For the premium tasks Ultra is meant for, translate the prompt to English.

Does Opten support Imagen 4 Ultra?

Yes, the Opten extension auto-detects Imagen 4 Ultra inside ImageFX, Vertex AI, Google AI Studio, and Freepik. Scoring accounts for Ultra's specifics: it checks prompt length (recommending an expansion to 100+ words), the presence of layered composition, the full cinematic stack, and typography. One click delivers a rewrite with the extended SCULPT structure.
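The positive-phrasing advice from the negative-prompt answer above can be sketched as a simple rewrite pass. This is an illustration only: `rewrite_negations` is a hypothetical helper, and its lookup table contains just the three substitutions named in that answer. Any negation it cannot rewrite is reported so that it can be rephrased by hand.

```python
import re

# The three rewrites named in the FAQ; add your own pairs as needed.
POSITIVE_REWRITES = {
    "no people": "empty street",
    "not blurry": "razor-sharp focus",
    "no clouds": "clear blue sky with subtle gradient",
}

NEGATION = re.compile(r"\b(no|not|without)\b", re.IGNORECASE)


def rewrite_negations(prompt: str) -> tuple[str, list[str]]:
    """Apply known positive rewrites; return the prompt and leftover negations."""
    for negative, positive in POSITIVE_REWRITES.items():
        prompt = re.sub(re.escape(negative), positive, prompt, flags=re.IGNORECASE)
    leftovers = [match.group(0).lower() for match in NEGATION.finditer(prompt)]
    return prompt, leftovers


cleaned, leftovers = rewrite_negations("City at dawn, no people, not blurry, without cars")
print(cleaned)
print("rephrase by hand:", leftovers)
```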

Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672