FLUX: how to write prompts the model actually understands

Replicate · Updated:

FLUX is a family of image models from Black Forest Labs (schnell, dev, pro, 1.1 pro Ultra, FLUX.2, Kontext). Its dual CLIP + T5-XXL text encoder understands long coherent descriptions better than most competitors. Optimal prompt length is 50–200 English words, structured as subject, scene, lighting, style, and camera parameters.

What FLUX does well

FLUX's core strength is natural descriptive language. The model was trained on long captions and interprets coherent prose noticeably better than single-CLIP models. This pays off in portraits, layered landscape composition, and product photography.

The model handles photo vocabulary cleanly: lenses, aperture, depth of field, lighting types. It supports in-image text via quotes, a wide stylistic range (from photorealism to watercolor and concept art), and multi-layered scenes describing foreground, midground, and background.

  • Dual CLIP + T5-XXL encoder — understands long descriptions
  • Up to ~2000 prompt tokens, optimal 50–200 words
  • Multiple variants: schnell (fast), dev (open weights, non-commercial license), pro (max quality)
  • In-image text via quotes
  • Photo-grade vocabulary: 85mm, f/2.8, shallow DOF, golden hour

Prompt structure

Optimal order: [Subject] + [Appearance details] + [Scene/Background] + [Lighting] + [Style/Mood] + [Camera/Technique] + [Color palette].

Write coherent sentences, not keyword lists. «A close-up portrait of a middle-aged man with a thick dark beard, wearing a leather jacket, standing in front of an urban graffiti wall, soft sunlight casting shadows on his face, documentary photography, shot on 85mm lens, warm muted tones» beats «man, beard, leather jacket, graffiti, warm tones» significantly.

Use layered scene descriptions — foreground, midground, background. This gives the model a clear spatial structure to work with.
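
The ordering above can be sketched as a small helper. This is a minimal illustration, not part of any FLUX tooling: the `PromptParts` class and `build_prompt` function are hypothetical names, and joining the clauses with commas simply mirrors the example prompt in this section.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container; fields follow the order suggested in the text."""
    subject: str
    appearance: str = ""
    scene: str = ""
    lighting: str = ""
    style: str = ""
    camera: str = ""
    palette: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty clauses in the recommended order."""
    ordered = [
        parts.subject,
        parts.appearance,
        parts.scene,
        parts.lighting,
        parts.style,
        parts.camera,
        parts.palette,
    ]
    return ", ".join(p.strip() for p in ordered if p.strip())


portrait = PromptParts(
    subject="A close-up portrait of a middle-aged man with a thick dark beard",
    appearance="wearing a leather jacket",
    scene="standing in front of an urban graffiti wall",
    lighting="soft sunlight casting shadows on his face",
    style="documentary photography",
    camera="shot on 85mm lens",
    palette="warm muted tones",
)
print(build_prompt(portrait))
```

Each field should hold a full descriptive clause rather than a single keyword; the helper only fixes the order and does not turn tags into prose.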

Lighting and camera

FLUX handles photo terminology well. Specify concrete lighting: «golden hour with warm tones», «soft morning light casting long shadows», «studio three-point lighting», «neon glow with cyberpunk palette». Generic phrases like «good lighting» yield weak results.

For photorealism use camera parameters: «shot with a 50mm lens at f/2.8, shallow depth of field, blurred background», «wide-angle 24mm, deep focus», «macro photography, extreme close-up». Concrete technical terms work far better than generic «8K, ultra HD, hyperrealistic».
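
As a rough sketch of how concrete camera parameters could be kept consistent across prompts, the hypothetical helper below formats a focal length, an optional aperture, and extra photographic terms into one clause. The function name and wording are assumptions for illustration only.

```python
from typing import Optional, Sequence


def camera_clause(
    focal_mm: int,
    aperture: Optional[float] = None,
    extras: Sequence[str] = (),
) -> str:
    """Format lens settings as a clause, e.g. 'shot on 50mm lens at f/2.8'."""
    lens = f"shot on {focal_mm}mm lens"
    if aperture is not None:
        # The 'g' format drops a trailing '.0', so 2.0 is written as f/2.
        lens += f" at f/{aperture:g}"
    return ", ".join([lens, *extras])


print(camera_clause(50, 2.8, ("shallow depth of field", "blurred background")))
```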

Style and artistic references

Concrete style references beat abstract ones. «In the style of 1980s cyberpunk» outperforms «futuristic». «Inspired by the surrealism of Salvador Dali», «impressionist painting with visible brushstrokes», «editorial fashion photography» — the model recognizes genres, eras, and techniques.

FLUX supports a wide stylistic range: photorealism, oil painting, watercolor, concept art, anime, vector illustration. Don't mix conflicting styles in one prompt — «cyberpunk and medieval» or «photorealistic watercolor» produce unpredictable output. If a stylistic blend is needed, describe it explicitly.

Common mistakes

  1. Keyword list instead of coherent description

    FLUX is trained on long captions and processes coherent English better than comma-separated lists. «dragon, fire, sky, sunset, mountains, epic» loses to «A red dragon breathes fire into a sunset sky over distant mountains, dramatic backlit silhouette». Write sentences.

  2. Using Stable Diffusion syntax

    Weights like `(keyword:1.5)`, `word++`, embeddings, or LoRA references don't work in FLUX and land in the prompt as literal text. Regulate priorities by word order — important content first, secondary at the end. For emphasis, use «with emphasis on» or «with a focus on».

  3. Quality boosters without specifics

    «masterpiece, best quality, 8K, ultra HD, hyperrealistic» strung together is a typical SD pattern that barely influences FLUX output. Concrete camera terms («85mm at f/1.8», «shallow DOF», «golden hour») and style references («editorial fashion photography») work many times better.

  4. Conflicting styles in one prompt

    «cyberpunk and medieval», «photorealistic watercolor», «minimalist detailed» — mutually exclusive directions confuse the model and produce unpredictable output. If a stylistic blend is needed, describe it explicitly: «realistic photography with subtle painterly post-processing».

  5. «White background» in FLUX.1 [dev]

    A dev-specific issue: the phrase «white background» causes blurry, unclear images. Use a concrete background description — «a soft grey studio backdrop», «seamless paper background, soft diffused light», «neutral cream-colored backdrop». Less pronounced in pro and schnell, but specificity still helps.
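
Several of the mistakes above can be caught mechanically before a prompt is sent. The sketch below is a hypothetical linter, not an existing tool: `lint_prompt`, the regular expressions, and the keyword-list heuristic (four or more comma-separated segments averaging two words or fewer) are all assumptions chosen for illustration. The `<lora:...>` pattern assumes the tag form commonly used in Stable Diffusion web UIs.

```python
import re

# Patterns for Stable Diffusion style syntax mentioned in the list above.
SD_SYNTAX = [
    (re.compile(r"\([^()]+:\d+(?:\.\d+)?\)"), "weighted term like (keyword:1.5)"),
    (re.compile(r"\w\+\+"), "emphasis suffix like word++"),
    (re.compile(r"<lora:[^>]+>", re.IGNORECASE), "LoRA reference"),
]

# Generic boosters quoted in the text.
BOOSTERS = ("masterpiece", "best quality", "8k", "ultra hd", "hyperrealistic")


def lint_prompt(prompt: str) -> list:
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    lowered = prompt.lower()

    for pattern, label in SD_SYNTAX:
        if pattern.search(prompt):
            warnings.append(f"SD syntax: {label}")

    found = [b for b in BOOSTERS if re.search(rf"\b{re.escape(b)}\b", lowered)]
    if found:
        warnings.append("generic quality boosters: " + ", ".join(found))

    # Heuristic (an assumption): many very short comma-separated segments.
    segments = [s.split() for s in prompt.split(",") if s.strip()]
    if len(segments) >= 4:
        average = sum(len(s) for s in segments) / len(segments)
        if average <= 2:
            warnings.append("reads like a keyword list")

    if "white background" in lowered:
        warnings.append(
            "'white background' can blur FLUX.1 [dev] output; "
            "describe a concrete backdrop"
        )
    return warnings


print(lint_prompt("fantasy landscape, mountains, dragon, epic, 8k, masterpiece"))
```

Conflicting styles are left out on purpose: deciding whether two styles clash needs judgment that a pattern match cannot supply.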

Before / after examples

Example 1

Before

beautiful girl on the beach

After

A young woman with sun-streaked auburn hair in a flowing white linen dress, standing on a Pacific Northwest beach at golden hour. Wind catches her hair, soft warm light skims her shoulders. Shallow depth of field, shot on 85mm lens at f/1.8, blurred ocean in background, muted coastal palette, editorial fashion photography.

Key change: coherent sentence instead of a keyword list, concrete subject, layered scene (foreground/background), camera vocabulary, specific lighting.

Example 2

Before

fantasy landscape, mountains, dragon, epic, 8k, masterpiece

After

A wide-angle view of jagged snow-capped peaks at dawn, mist swirling over the icy ridges. In the foreground, a lone red dragon perches on a stone outcrop looking toward the horizon. Vibrant orange-pink sky behind the range, dramatic warm rim light, cinematic concept art style inspired by Frank Frazetta, deep focus.

Comma-separated keywords work worse than a coherent description. The quality boosters are replaced with a concrete style and an artistic reference, plus a layered composition.

Example 3

Before

vintage diner sign

After

A weathered neon sign on a brick wall above a roadside diner at twilight. The sign reads "JOE'S DINER" in bold red script letters with cyan accents, some bulbs flickering. Wet asphalt below reflects the neon glow. 35mm film photography, shallow DOF, moody desaturated palette, Edward Hopper atmosphere.

Text in quotes is required for legible rendering. Concrete font, color, era, and atmosphere give the model the full visual context.
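
A small sketch of the quoting rule: the hypothetical `text_clause` helper wraps the exact string in double quotes and appends an optional lettering description. The function name and sentence pattern are illustrative assumptions.

```python
def text_clause(text: str, surface: str, lettering: str = "") -> str:
    """Build a clause such as: The sign reads "OPEN" in bold red letters."""
    # Remove quotes the caller may have added so the text is quoted only once.
    exact = text.strip().strip('"')
    clause = f'{surface} reads "{exact}"'
    if lettering:
        clause += f" in {lettering}"
    return clause


print(text_clause("JOE'S DINER", "The sign", "bold red script letters with cyan accents"))
```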

Frequently asked

What is the difference between schnell, dev, pro, and 1.1 pro Ultra?
[schnell] is the fast 4-step model released under Apache 2.0, ideal for prototyping and quick tests. [dev] is an open-weight model under a non-commercial license; it runs at 20+ steps and gives good quality for development and non-commercial use. [pro] offers the highest quality and is available via API with a commercial license. [1.1 pro] Ultra supports up to 2752×2752 resolution and a Raw mode, aimed at large prints and highly detailed images.
Can prompts be written in languages other than English?
Technically yes, but quality drops noticeably. FLUX is optimized for English: the dual CLIP + T5-XXL encoder was trained predominantly on English data. Use English for production tasks. Prompts in other languages still produce output, but details, styles, and scene interpretation are less accurate.
How does FLUX render in-image text?
Very well — FLUX.1 [pro] and [max] are among the best in this category. Use quotes for exact text: «A neon sign reading "OPEN 24/7" on a dark brick wall». Specify font, size, color, and placement. For brands and rare words the model handles letters cleaner than most competitors, though long unquoted text can be mangled.
What is the optimal prompt length?
50–200 words is the sweet spot. Under 10 words, the model has to invent too much and output becomes unpredictable. Over 200 words, priorities blur and small details lose out to composition. The technical T5-XXL limit is roughly 2000 tokens, but quality degrades well before that. A dense 100-word prompt beats a sprawling 300-word one.
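
The length guidance can be expressed as a simple word-count check. This is a sketch only: `length_verdict` and its labels are hypothetical, and counting whitespace-separated words is an approximation that does not reflect how the encoder tokenizes text.

```python
def length_verdict(prompt: str, low: int = 50, high: int = 200) -> str:
    """Classify a prompt by word count using the thresholds from the text."""
    words = len(prompt.split())
    if words < 10:
        return "too short"  # the model has to invent most of the image
    if words < low:
        return "short"      # usable, but below the suggested range
    if words > high:
        return "long"       # details may lose priority
    return "ok"


print(length_verdict("beautiful girl on the beach"))
```
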
Does FLUX support negative prompts?
Partially. Exclusions can be appended — «no text, no watermarks, no blur» or «without people, no human figures» at the end of the prompt. But negative formulations without a positive alternative are less effective than describing what you want explicitly. «Clean composition without distractions» beats «no clutter, no extra objects, no mess».
How is FLUX different from Stable Diffusion and Midjourney?
Unlike SD, FLUX uses a dual encoder (CLIP + T5-XXL) and understands long coherent descriptions rather than comma-separated tags. Unlike Midjourney, FLUX has no `--ar 16:9` or `--stylize` parameters — aspect ratio is set via platform parameters. Quality-wise FLUX [pro] and [1.1 pro] Ultra are comparable to top-tier competitors, especially in photorealism and text rendering.
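
When porting prompts from Midjourney, the flags have to move out of the prompt text. The sketch below shows one way to do that under stated assumptions: `split_midjourney_flags` is a hypothetical helper, the `aspect_ratio` key and the `1:1` default are placeholders, and the actual parameter name depends on the platform's model schema.

```python
import re

# Captures the ratio from a Midjourney-style "--ar 16:9" flag.
AR_FLAG = re.compile(r"(?<!\S)--ar\s+(\d+:\d+)")
# Matches any "--flag" with an optional numeric value, e.g. "--stylize 250".
ANY_FLAG = re.compile(r"(?<!\S)--\w+(?:\s+[\d:.]+)?")


def split_midjourney_flags(prompt: str, default_ratio: str = "1:1") -> dict:
    """Strip Midjourney flags and return the ratio as a separate parameter."""
    match = AR_FLAG.search(prompt)
    ratio = match.group(1) if match else default_ratio
    cleaned = ANY_FLAG.sub("", prompt)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    # "aspect_ratio" is an assumed key name; check the platform's schema.
    return {"prompt": cleaned, "aspect_ratio": ratio}


print(split_midjourney_flags("A neon street at night --ar 16:9 --stylize 250"))
```

The resulting dictionary is meant to be passed as the platform's input parameters rather than appended to the prompt; other flags such as `--stylize` are dropped because the text names no FLUX equivalent.
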
Does Opten support FLUX?
Yes, the Opten extension auto-detects FLUX on supported platforms and scores prompts against the structure outlined above: it checks for coherent descriptions instead of keyword lists, concrete lighting, camera terms for photorealism, and absence of SD syntax. One click delivers a rewrite in the correct structure.

Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672