FLUX Kontext: how to write prompts the model actually understands
Replicate · Updated:
FLUX Kontext is Black Forest Labs' image-to-image editing model (Pro, Max, Multi). It accepts an input image plus a change instruction. The key difference from regular text-to-image models is that Kontext sees the source image, so there's no need to redescribe the whole scene; the prompt describes only what to change.
What FLUX Kontext does well
Kontext is a surgical editing tool: object swaps, clothing and background changes, style transfer, text editing, adding and removing elements. The model preserves everything not mentioned in the prompt, which makes it ideal for iterative work.
Variants: [pro] for high quality, [max] for maximum accuracy plus the best in-image text rendering, [dev] as the open-weights release. Kontext Multi combines two or more input images into a single generation, for example transferring a face from one shot into a scene from another.
- Image-to-image editing that preserves unchanged regions
- Variants: Pro, Max, Multi (multi-image), dev
- Top-tier in-image text rendering (especially [max])
- Iterative editing with identity preservation
- Multi-image compositing across 2+ sources
Key difference from text-to-image
Kontext sees the input image. This changes the prompt logic:
DON'T describe the whole scene. DO describe only what to CHANGE. The model preserves anything not mentioned.
Bad: «A woman with red hair wearing a blue dress standing in a park with autumn trees» (that's a text-to-image prompt).
Good: «Change the woman's hair color to red» (a concrete change).
A short prompt like «Change the car to red» is normal for Kontext, not a flaw. Length is justified only for complex transformations.
Prompt structure and detail levels
Formula: [What to change] + [How to change] + [What to preserve (optional)].
Level 1 (Quick Edit): «Change the car to red» — for simple edits.
Level 2 (Controlled Edit): «Change the car to bright red while keeping everything else identical, maintain the same lighting and background» — with explicit preservation.
Level 3 (Complex Transformation): «Change the background to a beach while keeping the person in the exact same position, maintain identical subject placement, camera angle, framing, and perspective. Only replace the environment around them» — for serious changes.
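The formula above can be sketched as a small helper that assembles the three parts into one instruction. This is only an illustration: the function name `build_edit_prompt` and its parameters are hypothetical and not part of any Kontext API, and the exact wording it produces is one possible phrasing among many.

```python
from typing import Optional

def build_edit_prompt(what: str, how: str, preserve: Optional[list[str]] = None) -> str:
    """Assemble [what to change] + [how to change] + [what to preserve (optional)].

    Hypothetical helper for illustration; Kontext itself only receives the final string.
    """
    prompt = f"Change {what} to {how}"
    if preserve:
        # The preserve block is optional: Level 1 prompts skip it entirely.
        prompt += " while keeping " + ", ".join(preserve) + " unchanged"
    return prompt + "."

# Level 1 (Quick Edit): no preserve block.
quick = build_edit_prompt("the car", "red")

# Level 2 (Controlled Edit): explicit preservation.
controlled = build_edit_prompt("the car", "bright red", ["the lighting", "the background"])

print(quick)
print(controlled)
```

Keeping the preserve list as a separate argument makes it easy to start with a quick edit and add preservation only when the first result drifts.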
Verb control and precise pointers
The verb defines the scale of change:
- «change»: targeted swap
- «transform»: global transformation, can alter identity
- «convert»: stylistic conversion (style transfer)
- «add»: addition without modifying existing content
- «replace»: substitution of a specific element
To preserve a face use «change», not «transform»: «Transform the person into a Viking» may fully shift identity. «Change the clothing to Viking armor while keeping the same facial features» preserves the face.
Avoid pronouns. Instead of «she» or «the person» use descriptive phrases: «the woman with short black hair», «the car on the left».
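As a rough sketch, the verb scopes and the pronoun rule above can be turned into a pre-flight check that runs before a prompt is sent. The function name `review_prompt`, the pronoun list, and the note wording are assumptions made for this example. The check only looks at the leading verb and at bare pronouns, so generic phrases like «the person» still need a manual look.

```python
import re

# Scope of each leading verb, as described in the section above.
VERB_SCOPE = {
    "change": "targeted swap",
    "transform": "global transformation, can alter identity",
    "convert": "stylistic conversion (style transfer)",
    "add": "addition without modifying existing content",
    "replace": "substitution of a specific element",
}

# Illustrative pronoun list (an assumption, not exhaustive).
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "it", "its", "they", "them", "their"}

def review_prompt(prompt: str) -> list[str]:
    """Return human-readable notes about the leading verb and any pronouns."""
    words = re.findall(r"[a-z']+", prompt.lower())
    notes = []
    if words and words[0] in VERB_SCOPE:
        notes.append(f"verb '{words[0]}': {VERB_SCOPE[words[0]]}")
    found = sorted(set(words) & PRONOUNS)
    if found:
        notes.append("pronouns to replace with a description: " + ", ".join(found))
    return notes

for note in review_prompt("Transform her into a Viking"):
    print(note)
```

A prompt such as «Change the clothing of the woman with short black hair to Viking armor» yields only the verb note, since it contains no pronouns.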
Common mistakes
1. Full scene description instead of edit instruction
«A woman with red hair in a blue dress standing in a park» is a text-to-image prompt, not a Kontext one. The model sees the input image, so describing what's already there is unnecessary. Describe only the changes: «Change the woman's hair to red» or «Change the dress to blue».
2. Vague instructions like «make it better»
«Make it better», «Improve the image», «Make it more interesting» — the model doesn't know what to change specifically. State a clear action: «Change X to Y», «Add Z», «Remove W», «Convert to style S». Specificity is mandatory.
3. Verb «transform» for targeted edits
«Transform the person into a Viking» may fully alter identity — face, build, everything. To preserve the face use «change»: «Change the clothing to Viking armor while keeping the same facial features». The verb defines the scale.
4. Too many changes in a single prompt
«Change the background, add glasses, change hair color, add a hat, and make it cartoon style» is overload. Kontext works more cleanly with a chain of 2 to 3 simple edits: background first, then accessories, then style. Each step preserves identity better.
5. Pronouns instead of descriptive phrases
«Change her dress» in an image with two women is ambiguous — the model doesn't know which one to edit. Use descriptions: «the woman on the left», «the woman with dark hair», «the person in the red jacket». For text: «the sign above the door».
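The chained approach from mistake 4 can be sketched as a loop that feeds each result into the next edit. Everything here is illustrative: `run_edit_chain` is a hypothetical helper, and `edit_fn` stands in for whatever call actually sends an image and a prompt to Kontext. No real client API is assumed. The demo uses a fake edit function that only records the instructions it received.

```python
from typing import Callable, TypeVar

Image = TypeVar("Image")

def run_edit_chain(
    image: Image,
    steps: list[str],
    edit_fn: Callable[[Image, str], Image],
) -> Image:
    """Apply each instruction to the output of the previous step, in order."""
    for step in steps:
        image = edit_fn(image, step)
    return image

# Background first, then accessories, then style.
steps = [
    "Change the background to a beach while keeping the woman with short black hair in the exact same position",
    "Add round glasses to the woman with short black hair",
    "Convert to cartoon style",
]

def fake_edit(image: str, prompt: str) -> str:
    """Stand-in for a real model call: appends the prompt to a history string."""
    return f"{image} -> [{prompt}]"

result = run_edit_chain("input.png", steps, fake_edit)
print(result)
```

Passing the edit function in as an argument keeps the chain logic independent of the hosting service, and makes it simple to inspect or save the intermediate image after each step.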
Before / after examples
Example 1
Before
make the image more interesting and cool with better colors
After
Change the sky from overcast grey to a vibrant sunset with orange and pink clouds. Keep the building, people, and street unchanged. Maintain the same camera angle, framing, and shadow direction on the ground.
A prompt like «make it better» gives the model nothing specific to change. A concrete X → Y instruction plus a preserve block yields predictable results.
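A crude way to catch prompts like the "before" in Example 1 is a check for vague phrases and a missing action verb. The phrase list, the verb list, and the name `is_actionable` are assumptions for this sketch. It is a heuristic for spotting obvious problems, not a measure of prompt quality.

```python
# Illustrative lists; extend them to match the prompts you actually see.
VAGUE_PHRASES = ("make it better", "improve the image", "more interesting", "better colors")
ACTION_VERBS = ("change", "transform", "convert", "add", "replace", "remove")

def is_actionable(prompt: str) -> bool:
    """Return True if the prompt starts with a clear action verb and has no vague phrase."""
    text = prompt.lower().strip()
    if not text:
        return False
    if any(phrase in text for phrase in VAGUE_PHRASES):
        return False
    return text.split()[0] in ACTION_VERBS

print(is_actionable("make the image more interesting and cool with better colors"))
print(is_actionable("Change the sky from overcast grey to a vibrant sunset"))
```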
Example 2
Before
A beautiful woman with red hair in a blue dress on a beach
After
Change the woman's dress from black to navy blue. Change the background from the original setting to a tropical beach at sunset. Keep her exact facial features, pose, body position, and hair unchanged.
The first prompt is a full from-scratch scene description, like text-to-image. Kontext sees the source; concrete edits with an explicit preserve list for face and pose are required.
Example 3
Before
change the sign
After
Replace the text on the wooden shop sign with "LIBRARY" in elegant gold serif lettering. Maintain the original sign shape, brick wall background, lighting, and shadows. Match the existing font weight and color tone as closely as possible.
Quotes lock the new text exactly. Specifying the font and fixing the background and shadows make the replacement clean and legible.