FLUX: how to write prompts the model actually understands
Replicate
FLUX is a family of image models from Black Forest Labs (schnell, dev, pro, 1.1 pro Ultra, FLUX.2, Kontext). The FLUX.1 models pair a CLIP encoder with a T5-XXL text encoder, so they follow long, coherent descriptions better than single-CLIP models. Prompts work best at 50–200 English words, structured as subject, scene, lighting, style, and camera parameters.
What FLUX does well
FLUX's core strength is natural descriptive language. The model was trained on long captions and interprets coherent prose noticeably better than single-CLIP models. This pays off in portraits, layered landscape composition, and product photography.
The model handles photo vocabulary cleanly: lenses, aperture, depth of field, lighting types. It supports in-image text via quotes, a wide stylistic range (from photorealism to watercolor and concept art), and multi-layered scenes describing foreground, midground, and background.
- Dual CLIP + T5-XXL encoder — understands long descriptions
- Long prompt window (the T5 branch of FLUX.1 [dev] accepts up to 512 tokens, [schnell] 256); optimal length 50–200 words
- Multiple variants: schnell (fast, Apache 2.0), dev (open weights, non-commercial license), pro (highest quality, API only)
- In-image text via quotes
- Photo-grade vocabulary: 85mm, f/2.8, shallow DOF, golden hour
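The 50–200 word guideline is easy to check before sending a prompt. The sketch below is only an illustration: `word_count_report` is a hypothetical helper, and it counts whitespace-separated words, which is a rough proxy and not the token count the encoder actually sees.

```python
def word_count_report(prompt: str, low: int = 50, high: int = 200) -> str:
    """Classify a prompt by its whitespace-separated word count.

    The 50-200 default mirrors the guideline in this article; words are
    only an approximation of encoder tokens.
    """
    n = len(prompt.split())
    if n < low:
        return f"{n} words: likely too short, add scene, lighting, and camera detail"
    if n > high:
        return f"{n} words: likely too long, trim secondary detail"
    return f"{n} words: within the {low}-{high} word range"


if __name__ == "__main__":
    print(word_count_report("beautiful girl on the beach"))
```

A short prompt is not an error in itself; the report is a nudge to add the scene, lighting, and camera detail described in the sections below.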
Prompt structure
Optimal order: [Subject] + [Appearance details] + [Scene/Background] + [Lighting] + [Style/Mood] + [Camera/Technique] + [Color palette].
Write coherent sentences, not keyword lists. «A close-up portrait of a middle-aged man with a thick dark beard, wearing a leather jacket, standing in front of an urban graffiti wall, soft sunlight casting shadows on his face, documentary photography, shot on 85mm lens, warm muted tones» gives noticeably better results than «man, beard, leather jacket, graffiti, warm tones».
Use layered scene descriptions — foreground, midground, background. This gives the model a clear spatial structure to work with.
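The component order above can be kept consistent with a small helper. This is a minimal sketch, not part of any FLUX tooling: `PromptParts` and `build_prompt` are hypothetical names, and the field names simply mirror the bracketed slots in the recommended order.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """One field per slot in the recommended order; only subject is required."""
    subject: str
    appearance: str = ""
    scene: str = ""
    lighting: str = ""
    style: str = ""
    camera: str = ""
    palette: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots in the recommended order into one sentence."""
    if not parts.subject.strip():
        raise ValueError("subject is required")
    ordered = [
        parts.subject, parts.appearance, parts.scene, parts.lighting,
        parts.style, parts.camera, parts.palette,
    ]
    # Drop empty slots and stray trailing punctuation before joining.
    fragments = [p.strip().rstrip(".,") for p in ordered if p.strip()]
    return ", ".join(fragments) + "."


if __name__ == "__main__":
    print(build_prompt(PromptParts(
        subject="A close-up portrait of a middle-aged man with a thick dark beard",
        appearance="wearing a leather jacket",
        scene="standing in front of an urban graffiti wall",
        lighting="soft sunlight casting shadows on his face",
        style="documentary photography",
        camera="shot on 85mm lens",
        palette="warm muted tones",
    )))
```

The helper only fixes the order; each slot should still hold a descriptive phrase, since filling the slots with single keywords reproduces the keyword-list problem.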
Lighting and camera
FLUX handles photo terminology well. Specify concrete lighting: «golden hour with warm tones», «soft morning light casting long shadows», «studio three-point lighting», «neon glow with cyberpunk palette». Generic phrases like «good lighting» yield weak results.
For photorealism use camera parameters: «shot with a 50mm lens at f/2.8, shallow depth of field, blurred background», «wide-angle 24mm, deep focus», «macro photography, extreme close-up». Concrete technical terms work far better than generic «8K, ultra HD, hyperrealistic».
Style and artistic references
Concrete style references beat abstract ones. «In the style of 1980s cyberpunk» outperforms «futuristic». «Inspired by the surrealism of Salvador Dali», «impressionist painting with visible brushstrokes», «editorial fashion photography» — the model recognizes genres, eras, and techniques.
FLUX supports a wide stylistic range: photorealism, oil painting, watercolor, concept art, anime, vector illustration. Don't mix conflicting styles in one prompt — «cyberpunk and medieval» or «photorealistic watercolor» produce unpredictable output. If a stylistic blend is needed, describe it explicitly.
Common mistakes
1. Keyword list instead of coherent description
FLUX is trained on long captions and processes coherent English better than comma-separated lists. «dragon, fire, sky, sunset, mountains, epic» loses to «A red dragon breathes fire into a sunset sky over distant mountains, dramatic backlit silhouette». Write sentences.
2. Using Stable Diffusion syntax
Weights like `(keyword:1.5)` or `word++` and inline embedding or LoRA tags don't work in FLUX prompts; they land in the prompt as literal text. Set priorities through word order: important content first, secondary detail at the end. For emphasis, use phrases such as «with emphasis on» or «with a focus on».
3. Quality boosters without specifics
«masterpiece, best quality, 8K, ultra HD, hyperrealistic» strung together is a typical SD pattern that barely influences FLUX output. Concrete camera terms («85mm at f/1.8», «shallow DOF», «golden hour») and style references («editorial fashion photography») have far more effect.
4. Conflicting styles in one prompt
«cyberpunk and medieval», «photorealistic watercolor», «minimalist detailed» — mutually exclusive directions confuse the model and produce unpredictable output. If a stylistic blend is needed, describe it explicitly: «realistic photography with subtle painterly post-processing».
5. «White background» in FLUX.1 [dev]
A dev-specific issue: the phrase «white background» tends to produce blurry, poorly defined images. Use a concrete background description instead: «a soft grey studio backdrop», «seamless paper background, soft diffused light», «neutral cream-colored backdrop». The effect is less pronounced in pro and schnell, but specificity still helps.
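Several of the mistakes above can be caught mechanically before a prompt is sent. The following is a rough heuristic sketch, not an official validator: `lint_prompt`, the regular expressions, and the warning labels are all assumptions made for illustration, and the keyword-list check is deliberately crude (four or more comma-separated fragments of at most two words each).

```python
import re

# Stable Diffusion-style weights such as (keyword:1.5) or word++
SD_WEIGHT = re.compile(r"\([^()]*:\s*\d+(?:\.\d+)?\)|\w\+\+")
# Inline tags such as <lora:name:0.8>
SD_TAG = re.compile(r"<\s*(?:lora|embedding|hypernet)\s*:[^>]*>", re.IGNORECASE)
# Generic quality boosters named in this article
BOOSTERS = ("masterpiece", "best quality", "8k", "ultra hd", "hyperrealistic")


def lint_prompt(prompt: str) -> list[str]:
    """Return warning labels for the common mistakes listed above."""
    warnings = []
    lowered = prompt.lower()

    # Mistake 2: SD weight syntax or inline tags end up as literal text.
    if SD_WEIGHT.search(prompt) or SD_TAG.search(prompt):
        warnings.append("sd-syntax")

    # Mistake 3: generic quality boosters.
    if any(re.search(rf"\b{re.escape(b)}\b", lowered) for b in BOOSTERS):
        warnings.append("quality-boosters")

    # Mistake 5: the literal phrase "white background".
    if "white background" in lowered:
        warnings.append("white-background")

    # Mistake 1: many very short comma-separated fragments look like a keyword list.
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 4 and all(len(f.split()) <= 2 for f in fragments):
        warnings.append("keyword-list")

    return warnings


if __name__ == "__main__":
    print(lint_prompt("fantasy landscape, mountains, dragon, epic, 8k, masterpiece"))
```

Conflicting styles (mistake 4) are left out on purpose: deciding whether two styles clash needs judgment that a pattern match cannot supply.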
Before / after examples
Example 1
Before
beautiful girl on the beach
After
A young woman with sun-streaked auburn hair in a flowing white linen dress, standing on a Pacific Northwest beach at golden hour. Wind catches her hair, soft warm light skims her shoulders. Shallow depth of field, shot on 85mm lens at f/1.8, blurred ocean in background, muted coastal palette, editorial fashion photography.
Key change: a full description instead of a vague five-word phrase, with a concrete subject, a layered scene (foreground/background), camera vocabulary, and specific lighting.
Example 2
Before
fantasy landscape, mountains, dragon, epic, 8k, masterpiece
After
A wide-angle view of jagged snow-capped peaks at dawn, mist swirling over the icy ridges. In the foreground, a lone red dragon perches on a stone outcrop looking toward the horizon. Vibrant orange-pink sky behind the range, dramatic warm rim light, cinematic concept art style inspired by Frank Frazetta, deep focus.
Key change: the comma-separated keywords become a coherent description. The quality boosters are replaced with a concrete style and an artistic reference, and the scene gains a layered composition.
Example 3
Before
vintage diner sign
After
A weathered neon sign on a brick wall above a roadside diner at twilight. The sign reads "JOE'S DINER" in bold red script letters with cyan accents, some bulbs flickering. Wet asphalt below reflects the neon glow. 35mm film photography, shallow DOF, moody desaturated palette, Edward Hopper atmosphere.
Text in quotes is required for legible rendering. Concrete font, color, era, and atmosphere give the model the full visual context.
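When the text to render comes from user input, the quoting can be applied programmatically. This is a small sketch under assumptions: `quoted_text_clause` is a hypothetical helper, and the «The ... reads "..."» phrasing simply follows Example 3 and is not a required syntax.

```python
def quoted_text_clause(surface: str, text: str, styling: str = "") -> str:
    """Build a sentence that places the in-image text inside double quotes.

    Double quotes inside the text are swapped for single quotes so the
    quoted span stays unambiguous.
    """
    cleaned = text.replace('"', "'").strip()
    clause = f'The {surface} reads "{cleaned}"'
    if styling:
        clause += f" in {styling}"
    return clause + "."


if __name__ == "__main__":
    print(quoted_text_clause("sign", "JOE'S DINER",
                             "bold red script letters with cyan accents"))
```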