
FLUX.1: how to write prompts the model actually understands

Replicate · Updated:

FLUX.1 is Black Forest Labs' flagship image model family (schnell, dev, pro, 1.1 pro Ultra). Its dual CLIP + T5-XXL encoder interprets long, coherent descriptions significantly better than single-CLIP models, its in-image text rendering is top-tier, and the Ultra variant supports resolutions up to 2752×2752.

What FLUX.1 does well

FLUX.1 is one of the strongest models for photorealism, portraits, and landscapes. The dual encoder lets the model parse coherent captions written like cinematographic briefs: it handles long sentences with layered composition much better than single-CLIP models do.

In-image text rendering is top in class, especially on [pro] and [max]. The model understands full photographic vocabulary (35/50/85mm lenses, aperture, depth of field), covers a wide stylistic range from documentary to oil painting, and renders at up to 2752×2752 in 1.1 pro Ultra with Raw mode.

  • Dual CLIP + T5-XXL encoder — best-in-class for long descriptions
  • Up to 2752×2752 in 1.1 pro Ultra, up to 1440×1440 in pro
  • Top-tier in-image text rendering
  • Variants: schnell, dev, pro, 1.1 pro Ultra
  • Prompt up to ~2000 tokens, optimal 50–200 words

Prompt structure

Optimal order: [Subject] + [Appearance details] + [Scene/Environment] + [Lighting] + [Style/Art direction] + [Camera/Technique] + [Mood/Color].

Example: «A wide-angle view of a snow-capped mountain range at dawn, mist swirling over the icy peaks, with a vibrant orange-pink sky in the background and a lone wolf in the foreground looking into the horizon, cinematic photography, shot on RED camera, dramatic warm light».

The core principle: write coherent descriptive sentences, not tags. FLUX.1 was trained on long captions, and its T5-XXL encoder reads context best when given full prose constructions.
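The recommended order can be made mechanical with a small helper. This is a minimal sketch, not part of any FLUX.1 or Replicate API: the slot names (`subject`, `appearance`, and so on) and the `build_prompt` function are hypothetical, and each slot is meant to hold a full descriptive phrase rather than a single tag.

```python
# Hypothetical slot names mirroring the recommended order:
# Subject, Appearance, Scene, Lighting, Style, Camera, Mood/Color.
SLOT_ORDER = (
    "subject",
    "appearance",
    "scene",
    "lighting",
    "style",
    "camera",
    "mood",
)


def build_prompt(**parts: str) -> str:
    """Join the filled slots in the recommended order, separated by commas.

    Slots may be passed in any order; empty or missing slots are skipped.
    """
    unknown = set(parts) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    filled = [parts[name].strip() for name in SLOT_ORDER if parts.get(name, "").strip()]
    return ", ".join(filled)


# Usage: slots are given out of order but come back in the recommended order.
prompt = build_prompt(
    camera="shot on RED camera",
    subject="A lone wolf looking into the horizon",
    scene="standing before a snow-capped mountain range at dawn, mist swirling over the icy peaks",
    lighting="dramatic warm light",
    style="cinematic photography",
    mood="vibrant orange-pink sky",
)
print(prompt)
```

Keeping the order in one tuple means the whole team edits phrases, not structure; the helper deliberately does no rewriting of the phrases themselves.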

Layered description and camera

Describe the scene from foreground to background: «In the foreground, a large oak tree with golden autumn leaves. Behind it, a flowing river, and in the background, a mist-covered mountain range». This structure gives the model clear spatial depth.

For photorealism use photo vocabulary: «Shot with a 50mm lens at f/2.8, shallow depth of field, blurred background», «Wide-angle 24mm lens, deep focus, everything sharp», «Macro photography, extreme close-up, water droplets on a leaf». Camera parameters work far more reliably than generic quality phrases.
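The foreground-to-background pattern and the lens phrasing above can be sketched as two small helpers. Both functions (`layered_scene`, `camera_clause`) are hypothetical illustrations of the wording shown in this section, not a model or platform API; they only assemble text.

```python
from typing import Optional, Sequence


def layered_scene(foreground: str, background: str, midground: str = "") -> str:
    """Describe a scene front to back, one sentence per depth layer."""
    sentences = [f"In the foreground, {foreground}."]
    if midground:
        sentences.append(f"Behind it, {midground}.")
    sentences.append(f"In the background, {background}.")
    return " ".join(sentences)


def camera_clause(
    focal_mm: int,
    aperture: Optional[float] = None,
    extras: Sequence[str] = (),
) -> str:
    """Build a lens phrase such as '50mm lens at f/2.8, shallow depth of field'."""
    clause = f"{focal_mm}mm lens"
    if aperture is not None:
        clause += f" at f/{aperture:g}"  # :g drops trailing zeros, e.g. 8.0 -> "8"
    return ", ".join([clause, *extras])


scene = layered_scene(
    "a large oak tree with golden autumn leaves",
    "a mist-covered mountain range",
    midground="a flowing river",
)
camera = camera_clause(50, 2.8, ["shallow depth of field", "blurred background"])
print(f"{scene} Shot with a {camera}.")
```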

In-image text and art styles

FLUX.1 is one of the best models for text. Use quotes: «A neon sign reading "OPEN 24/7" on a dark brick wall», «A handwritten note saying "I love you" on vintage paper». Specify font, size, color, and placement for control.
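A small helper can keep the quoting convention consistent when text elements are generated programmatically. This is a hypothetical sketch (`text_element` is not a FLUX.1 feature); it rejects inner double quotes on the assumption that they would blur where the exact string starts and ends.

```python
def text_element(
    surface: str,
    text: str,
    font: str = "",
    size: str = "",
    color: str = "",
    placement: str = "",
) -> str:
    """Describe an object carrying exact in-image text, with the text in double quotes."""
    if '"' in text:
        # Inner double quotes would make the exact string ambiguous.
        raise ValueError("text must not contain double quotes")
    phrase = f'{surface} reading "{text}"'
    style = " ".join(part for part in (size, color, font) if part)
    if style:
        phrase += f" in {style} lettering"
    if placement:
        phrase += f" {placement}"
    return phrase


print(text_element("A neon sign", "OPEN 24/7", placement="on a dark brick wall"))
```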

Concrete art styles outperform abstract ones: «Impressionist painting with visible brushstrokes», «Art Nouveau poster style», «1980s cyberpunk neon aesthetic», «Bauhaus minimalism». Don't mix conflicting styles in one prompt — it produces unpredictable output.

Common mistakes

  1. Keyword list instead of coherent description

    FLUX.1 is trained on long captions. A coherent sentence yields significantly better results than «mountain, snow, sky, blue, epic, detailed». T5-XXL reads context and word relationships — that's its main advantage.

  2. Prompt weights and SD syntax

    FLUX.1 does NOT support `(word:1.5)`, `word++`, embeddings, or LoRA references — all of it lands in the prompt as literal noise. Use «with emphasis on» or «with a focus on» for emphasis, and regulate priority via word order.

  3. Quality boosters «masterpiece, best quality, 8K»

    Unlike Stable Diffusion, these phrases barely affect FLUX.1 output. Concrete camera terms («85mm at f/1.8», «shallow DOF», «golden hour») and style references («editorial fashion photography», «Frank Frazetta concept art») work significantly better.

  4. «White background» in FLUX.1 [dev]

    A known dev-variant issue: the phrase «white background» produces blurry, unclear images. Describe backgrounds concretely: «a soft grey studio backdrop», «seamless paper background with soft diffused light», «neutral cream-colored backdrop». The issue is less pronounced in pro and schnell.

  5. Conflicting styles in one prompt

    «cyberpunk and medieval», «photorealistic watercolor painting», «minimalist detailed» — mutually exclusive directions confuse the model. If a stylistic blend is needed, describe it explicitly: «realistic photography with subtle painterly post-processing», not as two equal-weight styles.
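The five mistakes above can be screened with a rough linter before a prompt is sent. This is a heuristic sketch, not a reproduction of any existing tool: the regexes, the booster list, the conflicting-style pairs (taken from the examples in this list), and the "many short comma-separated chunks" rule for keyword lists are all assumptions, and it can produce false positives (for example, literal text such as "C++" trips the `word++` check).

```python
import re

# Stable Diffusion-style syntax that FLUX.1 reads as literal text.
SD_SYNTAX = (
    re.compile(r"\([^()]+:\s*\d+(?:\.\d+)?\)"),  # (word:1.5)
    re.compile(r"\w\+\+"),                       # word++
    re.compile(r"<lora:[^>]*>", re.IGNORECASE),  # <lora:name:0.8>
)
QUALITY_BOOSTERS = ("masterpiece", "best quality", "8k", "ultra hd")
# Illustrative pairs from the examples above; not an exhaustive list.
CONFLICTING_STYLES = (
    ("cyberpunk", "medieval"),
    ("photorealistic", "watercolor"),
    ("minimalist", "detailed"),
)


def has_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive whole-word match."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None


def lint_prompt(prompt: str, variant: str = "pro") -> list[str]:
    """Return issue codes for the common mistakes; an empty list means none found."""
    issues = []
    if any(pattern.search(prompt) for pattern in SD_SYNTAX):
        issues.append("sd-syntax")
    if any(has_phrase(prompt, booster) for booster in QUALITY_BOOSTERS):
        issues.append("quality-boosters")
    if variant == "dev" and has_phrase(prompt, "white background"):
        issues.append("white-background-dev")
    # Heuristic: four or more chunks averaging two words or fewer look like a tag list.
    chunks = [chunk for chunk in prompt.split(",") if chunk.strip()]
    if len(chunks) >= 4 and sum(len(c.split()) for c in chunks) / len(chunks) <= 2:
        issues.append("keyword-list")
    for first, second in CONFLICTING_STYLES:
        if has_phrase(prompt, first) and has_phrase(prompt, second):
            issues.append(f"style-conflict:{first}/{second}")
    return issues


print(lint_prompt("mountain, snow, sky, blue, epic, detailed"))
```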

Before / after examples

Example 1

Before

mountain landscape at dawn

After

A wide-angle view of jagged snow-capped peaks at dawn, mist swirling over the icy ridges. In the foreground, dark pine silhouettes; in the midground, a frozen lake reflecting the sky; in the background, vibrant orange-pink clouds illuminated by first light. Cinematic landscape photography, shot on 24mm lens, deep focus, dramatic warm rim light, restrained cool palette.

A layered composition (foreground/midground/background) gives the model full spatial structure. Concrete lens and lighting terms replace the bare «mountain landscape» with specifics the model can render.

Example 2

Before

vintage bookstore sign with old typography

After

A weathered wooden shop sign hanging above a cobblestone street in 1920s Paris. The sign reads "LIBRAIRIE ANCIENNE" in elegant gold serif lettering with a curled border. Soft afternoon light catches the gilded letters. 35mm film photography, shallow depth of field, warm sepia palette, Atget-inspired documentary style.

Quotes lock the exact text. A concrete era, font, color, and photo style give the model full visual context instead of the generic «vintage».

Example 3

Before

photorealistic masterpiece, best quality, 8K, ultra HD, hyperrealistic portrait, beautiful woman

After

Editorial portrait of a woman in her thirties with freckles and short dark hair, wearing a cream-colored cashmere sweater. Soft natural light from a north-facing window, shallow depth of field, shot on 85mm lens at f/1.8. Calm contemplative expression, subtle film grain, muted warm palette, fashion editorial style.

Quality boosters barely affect FLUX. Concrete camera terms, a specific lighting type, and a style reference yield far better results than an adjective stack.

Frequently asked

What is the difference between schnell, dev, pro, and 1.1 pro Ultra?
[schnell] is a 4-step model under Apache 2.0, for prototyping and quick tests. [dev] is an open-weight model under a non-commercial license that runs at 20+ steps, for development and non-commercial work. [pro] delivers the best quality via API with a commercial license. [1.1 pro Ultra] supports up to 2752×2752 and Raw mode, for large prints and maximally detailed images.
What is the optimal prompt length?
50–200 English words is the sweet spot. Under 10 words the model invents too much; over 200 words priorities are lost and small details suffer. The T5-XXL technical limit is roughly 2000 tokens, but quality drops well before that. A dense 100-word prompt outperforms a sprawling 300-word one.
How does FLUX.1 render text?
FLUX.1 is one of the best models for in-frame text, especially [pro] and [max]. Use quotes for exact text: «A neon sign reading "OPEN 24/7"». Specify the font («elegant serif», «bold sans-serif»), size, color, and placement. For brand names and rare words, the model renders letters more cleanly than most competitors.
What language should prompts be written in?
Use English for production tasks. The dual CLIP + T5-XXL encoder was trained predominantly on English data, and quality drops noticeably in other languages: style references, photo terminology, and compositional nuance all degrade. Prompts in other languages still work for experiments, but the output is less precise.
When should Raw mode be used in 1.1 pro Ultra?
Raw mode delivers more «real» photorealism without the characteristic AI look — slightly desaturated palette, natural exposure, visible minor imperfections. Use it for documentary portraits, candid photography, journalistic style. For advertising and studio work the standard mode delivers a more «polished» result.
Does FLUX.1 support negative prompts?
Partially. Append «no text, no watermarks, no blur» or «without people, no human figures» at the end of the prompt. But negative formulations without a positive alternative are less effective — «clean composition without distractions» beats «no clutter, no extra objects». Main rule: describe what you want, not what you don't.
Does Opten support FLUX.1?
Yes, the Opten extension auto-detects FLUX.1 on supported platforms and scores prompts against the structure outlined above: it checks for description coherence, scene layering, camera terminology presence, and the absence of SD syntax and quality-booster stacks. One click delivers a rewrite in the correct structure.
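The length guidance in the FAQ above (under 10 words is too vague, 50–200 words is the sweet spot, beyond 200 priorities get diluted) can be checked mechanically. This sketch counts whitespace-separated words as a rough proxy; it does not count T5 tokens, which would require the model's own tokenizer, and the band names are hypothetical.

```python
def word_count(prompt: str) -> int:
    """Count whitespace-separated words (a proxy, not a token count)."""
    return len(prompt.split())


def length_band(prompt: str) -> str:
    """Classify a prompt against the 50-200 word guidance."""
    n = word_count(prompt)
    if n < 10:
        return "too-short"         # the model invents too much on its own
    if n < 50:
        return "below-sweet-spot"  # workable, but leaves room for more detail
    if n <= 200:
        return "optimal"
    return "too-long"              # priorities get lost, small details suffer


print(length_band("mountain landscape at dawn"))
```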


Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672