FLUX.1: how to write prompts the model actually understands
Replicate
FLUX.1 is Black Forest Labs' flagship image model family (schnell, dev, pro, 1.1 pro Ultra). Its dual CLIP + T5-XXL text encoder interprets long, coherent descriptions significantly better than single-CLIP models, its in-image text rendering is among the best available, and the Ultra variant supports resolutions up to 2752×2752.
What FLUX.1 does well
FLUX.1 is one of the strongest models for photorealism, portraits, and landscapes. The dual encoder lets the model parse coherent captions written like cinematographic briefs: long sentences with layered composition are handled much better than by single-CLIP models.
In-image text rendering is top in class, especially on [pro] and [max]. The model understands the full photographic vocabulary (35/50/85mm lenses, aperture, depth of field), covers a wide stylistic range from documentary to oil painting, and reaches resolutions up to 2752×2752 in 1.1 pro Ultra with Raw mode.
- Dual CLIP + T5-XXL encoder — best-in-class for long descriptions
- Up to 2752×2752 in 1.1 pro Ultra, up to 1440×1440 in pro
- Top-tier in-image text rendering
- Variants: schnell, dev, pro, 1.1 pro Ultra
- Prompt up to ~2000 tokens, optimal 50–200 words
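The 50–200 word guideline above is easy to check before sending a prompt. The sketch below is a minimal illustration: the function names and the "short" / "ok" / "long" labels are hypothetical, and it counts whitespace-separated words, which is only a rough proxy for the tokens the model actually sees.

```python
def prompt_word_count(prompt: str) -> int:
    """Count whitespace-separated words (a rough proxy, not a token count)."""
    return len(prompt.split())


def length_hint(prompt: str, low: int = 50, high: int = 200) -> str:
    """Classify a prompt against the 50-200 word range suggested in the guide.

    The labels are illustrative only; they are not part of any FLUX.1 API.
    """
    n = prompt_word_count(prompt)
    if n < low:
        return "short"
    if n > high:
        return "long"
    return "ok"


print(length_hint("mountain landscape at dawn"))  # a four-word prompt falls below the range
```

A "short" result is a cue to add scene, lighting, and camera detail rather than a hard error.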
Prompt structure
Optimal order: [Subject] + [Appearance details] + [Scene/Environment] + [Lighting] + [Style/Art direction] + [Camera/Technique] + [Mood/Color].
Example: «A wide-angle view of a snow-capped mountain range at dawn, mist swirling over the icy peaks, with a vibrant orange-pink sky in the background and a lone wolf in the foreground looking into the horizon, cinematic photography, shot on RED camera, dramatic warm light».
The core principle: write coherent descriptive sentences, not tags. FLUX.1 was trained on long captions, and its T5-XXL encoder reads context better when given full prose.
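The ordering above can be captured in a small helper so that every prompt follows the same sequence. This is a sketch under assumptions: `PromptParts` and `build_prompt` are hypothetical names, and each field is expected to hold a descriptive phrase or clause rather than a bare tag.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """One field per slot in the suggested order; only the subject is required."""
    subject: str
    appearance: str = ""
    scene: str = ""
    lighting: str = ""
    style: str = ""
    camera: str = ""
    mood: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots in the order
    subject, appearance, scene, lighting, style, camera, mood."""
    ordered = [
        parts.subject, parts.appearance, parts.scene, parts.lighting,
        parts.style, parts.camera, parts.mood,
    ]
    # Drop empty slots and trailing punctuation so the joins stay clean.
    fragments = [p.strip().rstrip(".,") for p in ordered if p.strip()]
    return ", ".join(fragments) + "."


print(build_prompt(PromptParts(
    subject="A lone wolf",
    scene="on a snowy ridge at dawn",
    camera="shot on a 24mm lens",
)))
```

Keeping the slots separate makes it easy to vary one element, such as lighting, while holding the rest of the prompt fixed.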
Layered description and camera
Describe the scene from foreground to background: «In the foreground, a large oak tree with golden autumn leaves. Behind it, a flowing river, and in the background, a mist-covered mountain range». This structure gives the model clear spatial depth.
For photorealism use photo vocabulary: «Shot with a 50mm lens at f/2.8, shallow depth of field, blurred background», «Wide-angle 24mm lens, deep focus, everything sharp», «Macro photography, extreme close-up, water droplets on a leaf». Camera parameters work far more reliably than generic quality phrases.
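The foreground-to-background layering and the camera phrasing can both be generated from structured inputs. The helpers below are a minimal sketch; `layered_scene` and `camera_phrase` are hypothetical names, and the sentence templates simply mirror the examples quoted above.

```python
from typing import Optional


def layered_scene(foreground: str, midground: str = "", background: str = "") -> str:
    """Describe a scene front to back, one sentence per depth layer."""
    sentences = [f"In the foreground, {foreground}."]
    if midground:
        sentences.append(f"In the midground, {midground}.")
    if background:
        sentences.append(f"In the background, {background}.")
    return " ".join(sentences)


def camera_phrase(focal_length_mm: int, aperture: Optional[float] = None) -> str:
    """Build a lens description such as 'Shot with a 50mm lens at f/2.8.'"""
    phrase = f"Shot with a {focal_length_mm}mm lens"
    if aperture is not None:
        # ':g' drops a trailing '.0', so 2.0 is rendered as 'f/2'.
        phrase += f" at f/{aperture:g}"
    return phrase + "."


scene = layered_scene(
    "a large oak tree with golden autumn leaves",
    midground="a flowing river",
    background="a mist-covered mountain range",
)
print(scene, camera_phrase(50, 2.8))
```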
In-image text and art styles
FLUX.1 is one of the best models for rendering text inside an image. Put the exact text in quotes: «A neon sign reading "OPEN 24/7" on a dark brick wall», «A handwritten note saying "I love you" on vintage paper». For more control, specify font, size, color, and placement.
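A small helper can keep the quoting consistent when the text to render comes from user input. This is only a sketch: `text_clause` is a hypothetical name, and the phrasing pattern ("reading", "lettering") is taken from the examples in this guide rather than from any documented requirement.

```python
def text_clause(obj: str, text: str, font: str = "", color: str = "",
                placement: str = "") -> str:
    """Build a clause like: A neon sign reading "OPEN 24/7" on a dark brick wall."""
    if '"' in text:
        # A stray quote would end the quoted span early and blur which words to render.
        raise ValueError("text to render must not contain double quotes")
    clause = f'{obj} reading "{text}"'
    lettering = " ".join(word for word in (color, font) if word)
    if lettering:
        clause += f" in {lettering} lettering"
    if placement:
        clause += f" {placement}"
    return clause


print(text_clause("A neon sign", "OPEN 24/7", placement="on a dark brick wall"))
```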
Concrete art styles outperform abstract ones: «Impressionist painting with visible brushstrokes», «Art Nouveau poster style», «1980s cyberpunk neon aesthetic», «Bauhaus minimalism». Don't mix conflicting styles in one prompt — it produces unpredictable output.
Common mistakes
1. Keyword list instead of coherent description
FLUX.1 was trained on long captions, so a coherent sentence yields significantly better results than «mountain, snow, sky, blue, epic, detailed». T5-XXL reads context and the relationships between words, and that is its main advantage.
2. Prompt weights and SD syntax
FLUX.1 does NOT interpret `(word:1.5)`, `word++`, embeddings, or LoRA references written into the prompt; all of it lands in the prompt as literal noise. For emphasis, use «with emphasis on» or «with a focus on», and control priority through word order.
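When prompts are migrated from a Stable Diffusion workflow, the leftover syntax can be stripped mechanically. The sketch below is an assumption-laden illustration: `strip_sd_syntax` is a hypothetical helper, and the regular expressions cover only the common `(word:1.5)`, `word++`, and `<lora:name:weight>` forms, not every variant that SD front ends accept.

```python
import re

_LORA_TAG = re.compile(r"<lora:[^>]*>")                   # <lora:name:0.8>
_WEIGHTED = re.compile(r"\(([^():]+):\s*\d+(?:\.\d+)?\)")  # (word:1.5)
_PLUS_SUFFIX = re.compile(r"(?<=\w)\++(?=[\s,.]|$)")       # word++


def strip_sd_syntax(prompt: str) -> str:
    """Remove SD-style weights and LoRA tags, keeping the plain words."""
    cleaned = _LORA_TAG.sub("", prompt)
    cleaned = _WEIGHTED.sub(r"\1", cleaned)   # keep the word, drop the weight
    cleaned = _PLUS_SUFFIX.sub("", cleaned)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)  # collapse gaps left by removals
    cleaned = re.sub(r"\s+,", ",", cleaned)
    return cleaned.strip(" ,")


print(strip_sd_syntax("a (red:1.5) fox++ in snow <lora:foxes:0.8>"))
# → a red fox in snow
```

Ordinary parentheses without a numeric weight are left untouched, since they can be legitimate prose.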
3. Quality boosters «masterpiece, best quality, 8K»
Unlike Stable Diffusion, these phrases barely affect FLUX.1 output. Concrete camera terms («85mm at f/1.8», «shallow DOF», «golden hour») and style references («editorial fashion photography», «Frank Frazetta concept art») work significantly better.
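Booster phrases can be filtered out of tag-style prompts before rewriting them as prose. This is a minimal sketch: `drop_quality_boosters` and the `BOOSTERS` tuple are hypothetical, the list is illustrative rather than exhaustive, and only comma-separated fragments that consist entirely of a booster are removed.

```python
# Illustrative list drawn from the phrases named in this guide; extend as needed.
BOOSTERS = ("masterpiece", "best quality", "8k", "ultra hd", "hyperrealistic")


def drop_quality_boosters(prompt: str, boosters=BOOSTERS) -> str:
    """Drop comma-separated fragments that are nothing but a quality booster."""
    kept = []
    for fragment in prompt.split(","):
        fragment = fragment.strip()
        if fragment and fragment.lower() not in boosters:
            kept.append(fragment)
    return ", ".join(kept)


print(drop_quality_boosters("portrait of a woman, masterpiece, best quality, 8K"))
# → portrait of a woman
```

Fragments that merely contain a booster word, such as «hyperrealistic portrait», are kept, because the remaining words may still carry meaning.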
4. «White background» in FLUX.1 [dev]
A known dev-variant issue: the phrase «white background» produces blurry, unclear images. Describe backgrounds concretely — «a soft grey studio backdrop», «seamless paper background with soft diffused light», «neutral cream-colored backdrop». The issue is not pronounced in pro and schnell.
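The substitution described above can be applied automatically when the target is the dev variant. The helper below is a sketch under assumptions: `soften_white_background` is a hypothetical name, the variant is passed as a plain string, and the default replacement is one of the backdrops suggested in this guide.

```python
import re

_WHITE_BG = re.compile(r"\bwhite background\b", re.IGNORECASE)


def soften_white_background(prompt: str, variant: str,
                            replacement: str = "soft grey studio backdrop") -> str:
    """Swap 'white background' for a concrete backdrop, for the dev variant only."""
    if variant.lower() != "dev":
        return prompt  # the guide reports no pronounced issue on pro and schnell
    return _WHITE_BG.sub(replacement, prompt)


print(soften_white_background("A ceramic mug on a white background", "dev"))
# → A ceramic mug on a soft grey studio backdrop
```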
5. Conflicting styles in one prompt
«cyberpunk and medieval», «photorealistic watercolor painting», «minimalist detailed»: mutually exclusive directions confuse the model. If a stylistic blend is needed, describe it explicitly as one dominant style with a modifier, for example «realistic photography with subtle painterly post-processing», rather than as two equal-weight styles.
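A simple lint pass can flag the conflicting pairs named above before a prompt is submitted. This is a rough sketch: `find_style_conflicts` and `CONFLICTING_PAIRS` are hypothetical, the pair list contains only the three examples from this section, and matching is by plain substring, so it can produce false positives.

```python
# Pairs taken from the examples in this section; not an exhaustive list.
CONFLICTING_PAIRS = [
    ("cyberpunk", "medieval"),
    ("photorealistic", "watercolor"),
    ("minimalist", "detailed"),
]


def find_style_conflicts(prompt: str, pairs=CONFLICTING_PAIRS):
    """Return every pair whose two terms both appear in the prompt."""
    text = prompt.lower()
    return [(a, b) for a, b in pairs if a in text and b in text]


print(find_style_conflicts("photorealistic watercolor painting of a harbor"))
# → [('photorealistic', 'watercolor')]
```

A non-empty result is a prompt to pick one dominant style and rephrase the other as a modifier.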
Before / after examples
Example 1
Before
mountain landscape at dawn
After
A wide-angle view of jagged snow-capped peaks at dawn, mist swirling over the icy ridges. In the foreground, dark pine silhouettes; in the midground, a frozen lake reflecting the sky; in the background, vibrant orange-pink clouds illuminated by first light. Cinematic landscape photography, shot on 24mm lens, deep focus, dramatic warm rim light, restrained cool palette.
A layered composition (foreground/midground/background) gives the model full spatial structure. Concrete lens and lighting terms do the work usually left to vague adjectives such as «epic».
Example 2
Before
vintage bookstore sign with old typography
After
A weathered wooden shop sign hanging above a cobblestone street in 1920s Paris. The sign reads "LIBRAIRIE ANCIENNE" in elegant gold serif lettering with a curled border. Soft afternoon light catches the gilded letters. 35mm film photography, shallow depth of field, warm sepia palette, Atget-inspired documentary style.
Quotes lock the exact text. A concrete era, font, color, and photo style give the model full visual context instead of the generic «vintage».
Example 3
Before
photorealistic masterpiece, best quality, 8K, ultra HD, hyperrealistic portrait, beautiful woman
After
Editorial portrait of a woman in her thirties with freckles and short dark hair, wearing a cream-colored cashmere sweater. Soft natural light from a north-facing window, shallow depth of field, shot on 85mm lens at f/1.8. Calm contemplative expression, subtle film grain, muted warm palette, fashion editorial style.
Quality boosters barely affect FLUX.1. Concrete camera terms, a specific lighting type, and a style reference yield far better results than an adjective stack.