FLUX Kontext: how to write prompts the model actually understands

Replicate

FLUX Kontext is Black Forest Labs' family of image-to-image editing models (Pro, Max, Multi). It accepts an input image plus a change instruction. The key difference from regular text-to-image models: Kontext sees the source image, so there is no need to redescribe the whole scene. The prompt describes only what to change.

What FLUX Kontext does well

Kontext is a surgical editing tool: object swaps, clothing and background changes, style transfer, text editing, adding and removing elements. The model generally preserves whatever the prompt does not mention, which makes it well suited to iterative work.

Variants: [pro] gives high quality for most edits, [max] gives maximum accuracy plus the best in-image text rendering, and [dev] is the open-weight release. Kontext Multi combines 2+ input images into a single generation, for example transferring a face from one shot into a scene from another.

  • Image-to-image editing that preserves unchanged regions
  • Variants: Pro, Max, Multi (multi-image), dev
  • Top-tier in-image text rendering (especially [max])
  • Iterative editing with identity preservation
  • Multi-image compositing across 2+ sources

Key difference from text-to-image

Kontext sees the input image. This changes the prompt logic:

DON'T describe the whole scene. DO describe only what to CHANGE. The model preserves anything not mentioned.

Bad: «A woman with red hair wearing a blue dress standing in a park with autumn trees» (that's a text-to-image prompt).

Good: «Change the woman's hair color to red» (a concrete change).

A short prompt like «Change the car to red» is normal for Kontext, not a flaw. Length is justified only for complex transformations.

Prompt structure and detail levels

Formula: [What to change] + [How to change] + [What to preserve (optional)].

Level 1 (Quick Edit): «Change the car to red» — for simple edits.

Level 2 (Controlled Edit): «Change the car to bright red while keeping everything else identical, maintain the same lighting and background» — with explicit preservation.

Level 3 (Complex Transformation): «Change the background to a beach while keeping the person in the exact same position, maintain identical subject placement, camera angle, framing, and perspective. Only replace the environment around them» — for serious changes.

Verb control and precise pointers

The verb defines the scale of change:

  • «change»: targeted swap
  • «transform»: global transformation, can alter identity
  • «convert»: stylistic conversion (style transfer)
  • «add»: addition without modifying existing content
  • «replace»: substitution of a specific element

To preserve a face use «change», not «transform»: «Transform the person into a Viking» may fully shift identity. «Change the clothing to Viking armor while keeping the same facial features» preserves the face.
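
The verb rule can be written down as a lookup table plus a simple check. This is a rough sketch: `VERB_SCOPE`, `leading_verb`, and `identity_risk` are hypothetical names, and matching on the literal phrase "facial features" is an assumption made only to keep the example short.

```python
# Scope of each leading verb, as described in the text above.
VERB_SCOPE = {
    "change": "targeted swap",
    "replace": "substitution of a specific element",
    "add": "addition without modifying existing content",
    "convert": "stylistic conversion",
    "transform": "global transformation, may alter identity",
}


def leading_verb(prompt):
    """Return the first word of the prompt, lowercased, if it is a known verb."""
    words = prompt.strip().split()
    first = words[0].lower() if words else ""
    return first if first in VERB_SCOPE else None


def identity_risk(prompt):
    """Flag prompts that start with 'transform' and name no facial features to keep."""
    verb = leading_verb(prompt)
    return verb == "transform" and "facial features" not in prompt.lower()


print(identity_risk("Transform the person into a Viking"))
print(identity_risk("Change the clothing to Viking armor while keeping the same facial features"))
```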

Avoid pronouns and generic references. Instead of «she» or «the person», use descriptive phrases that single out the target: «the woman with short black hair», «the car on the left».

Common mistakes

  1. Full scene description instead of edit instruction

    «A woman with red hair in a blue dress standing in a park» is a text-to-image prompt, not a Kontext one. The model sees the input image, so describing what is already there is unnecessary. Describe only the changes: «Change the woman's hair to red» or «Change the dress to blue».

  2. Vague instructions like «make it better»

    «Make it better», «Improve the image», «Make it more interesting» — the model doesn't know what to change specifically. State a clear action: «Change X to Y», «Add Z», «Remove W», «Convert to style S». Specificity is mandatory.

  3. Verb «transform» for targeted edits

    «Transform the person into a Viking» may fully alter identity — face, build, everything. To preserve the face use «change»: «Change the clothing to Viking armor while keeping the same facial features». The verb defines the scale.

  4. Too many changes in a single prompt

    «Change the background, add glasses, change hair color, add a hat, and make it cartoon style» overloads a single prompt. Kontext produces cleaner results through a chain of 2-3 simple edits: background first, then accessories, then style. Each step preserves identity better.

  5. Pronouns instead of descriptive phrases

    «Change her dress» in an image with two women is ambiguous — the model doesn't know which one to edit. Use descriptions: «the woman on the left», «the woman with dark hair», «the person in the red jacket». For text: «the sign above the door».
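
Several of these mistakes can be caught mechanically before a prompt is sent. The checker below is a minimal sketch, not a real tool: the phrase lists, the `MAX_EDITS` threshold, and the `lint_prompt` name are all assumptions chosen for illustration, and it does not try to detect full scene descriptions.

```python
import re

VAGUE_PHRASES = ("make it better", "improve the image", "more interesting")
PRONOUNS = {"he", "she", "him", "her", "his", "hers", "they", "them", "their"}
EDIT_VERBS = {"change", "add", "remove", "replace", "convert", "transform", "make"}
MAX_EDITS = 3  # the text suggests chains of 2-3 simple edits


def lint_prompt(prompt):
    """Return a list of warnings for one edit prompt."""
    text = prompt.lower()
    words = re.findall(r"[a-z']+", text)
    warnings = []
    if any(phrase in text for phrase in VAGUE_PHRASES):
        warnings.append("vague instruction")
    if any(word in PRONOUNS for word in words):
        warnings.append("pronoun instead of a descriptive phrase")
    # Counting edit verbs is a crude proxy for the number of requested changes.
    if sum(word in EDIT_VERBS for word in words) > MAX_EDITS:
        warnings.append("too many changes in one prompt")
    return warnings


print(lint_prompt("Change her dress"))
print(lint_prompt("Change the dress of the woman on the left to blue"))
```

A keyword check like this produces false positives (for example, a possessive «her» in a single-subject image), so it is better treated as a reminder than as a gate.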

Before / after examples

Example 1

Before

make the image more interesting and cool with better colors

After

Change the sky from overcast grey to a vibrant sunset with orange and pink clouds. Keep the building, people, and street unchanged. Maintain the same camera angle, framing, and shadow direction on the ground.

A vague request such as «more interesting» or «better colors» gives the model nothing specific to change. A concrete X → Y instruction plus a preserve block yields predictable results.

Example 2

Before

A beautiful woman with red hair in a blue dress on a beach

After

Change the woman's dress from black to navy blue. Change the background from the original setting to a tropical beach at sunset. Keep her exact facial features, pose, body position, and hair unchanged.

The first prompt describes a full scene from scratch, as in text-to-image. Kontext sees the source, so the prompt should state concrete edits with an explicit preserve list for face and pose.

Example 3

Before

change the sign

After

Replace the text on the wooden shop sign with "LIBRARY" in elegant gold serif lettering. Maintain the original sign shape, brick wall background, lighting, and shadows. Match the existing font weight and color tone as closely as possible.

Quotes lock the new text exactly. Specifying the font and fixing the background and shadows keeps the replacement clean and legible.

Frequently asked

What is the difference between Kontext Pro, Max, and Multi?
[Pro]: high quality for most editing tasks. [Max]: maximum accuracy plus the best in-image text rendering; use it when changing signage or editing typography. [Multi]: accepts 2+ input images in a single generation, for example transferring a face from one shot into a scene from another. [dev]: open-weight variant with lower quality, for research and non-commercial use.
Do all preserved elements need to be listed?
Not always. For simple edits Kontext preserves unmentioned content by default. For background swaps, environment changes, or complex transformations, an explicit preserve block makes results noticeably more stable: «while keeping the same facial features», «maintain the original composition», «keep the lighting and camera angle». This is especially critical during iterative edits; otherwise the result drifts away from the original.
How can text be edited without losing the font?
Use quotes for the exact new text: «Replace 'OPEN' with 'CLOSED'». Add «maintain the same font style and color» and keep the new text roughly the same length as the old one, which helps hold the layout. For complex fonts or rare typography use Kontext Max, which has the best text rendering among the variants.
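
As a rough sketch of that recipe, the helper below builds the quoted instruction and flags a large length mismatch. `text_edit_prompt` is a hypothetical name, and the 1.5 length ratio is an arbitrary assumption, not a documented limit.

```python
def text_edit_prompt(old_text, new_text, max_length_ratio=1.5):
    """Build a quoted text-replacement prompt and flag large length changes."""
    prompt = (
        f"Replace '{old_text}' with '{new_text}'. "
        "Maintain the same font style and color."
    )
    longer = max(len(old_text), len(new_text))
    shorter = max(1, min(len(old_text), len(new_text)))
    # A large length mismatch may break the layout; 1.5 is an arbitrary threshold.
    length_ok = longer / shorter <= max_length_ratio
    return prompt, length_ok


prompt, length_ok = text_edit_prompt("OPEN", "CLOSED")
print(prompt)
print(length_ok)
```
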
Can Kontext be used as text-to-image without an input image?
Technically yes, but it is not the primary mode. Without an input image Kontext works as a regular text-to-image model, but quality is lower than FLUX.1 [pro] and FLUX1.1 [pro] Ultra, which specialize in T2I. For from-scratch generation use FLUX.1; for editing use Kontext.
How do you do complex transformations without losing identity?
Break them into a chain of 2-3 simple edits. Don't try «change background + add glasses + change hair + make it cartoon» in one prompt — Kontext gets confused. Better: (1) change the background, (2) add accessories, (3) apply the style. Each step takes the previous result as input, which preserves face and pose across the chain.
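
The chain can be sketched as a loop in which each step receives the previous result. Everything here is illustrative: `run_edit_chain` is a hypothetical helper, and `edit_fn` stands for whatever function actually calls the model (it is supplied by the caller and is not a real library API). The `fake_edit` stand-in only records prompts so the example runs without a model.

```python
def run_edit_chain(image, steps, edit_fn):
    """Apply edit prompts one at a time, feeding each result into the next step."""
    history = [image]
    for prompt in steps:
        image = edit_fn(image, prompt)  # edit_fn is a hypothetical caller-supplied function
        history.append(image)  # keep intermediates so a bad step can be retried
    return image, history


def fake_edit(image, prompt):
    """Stand-in for a real model call: records the prompt instead of editing pixels."""
    return f"{image} -> [{prompt}]"


steps = [
    "Change the background to a beach",
    "Add round glasses to the woman with short black hair",
    "Convert the image to cartoon style",
]
final, history = run_edit_chain("photo.png", steps, fake_edit)
print(final)
```

Keeping the intermediate results means a step that damages the face or pose can be redone from the previous image rather than from the start.
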
Does Kontext support SD syntax?
No. Weights like `(word:1.5)`, `word++`, embeddings, and LoRA references don't work and land in the prompt as literal noise. Set priorities through word order (important content first in the instruction) and explicit phrasing like «with emphasis on», «focus on». The FLUX family reads prompts as natural language through a T5-XXL text encoder, not through the SD stack.
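
When reusing prompts written for Stable Diffusion tools, that syntax can be stripped first. The function below is a minimal sketch covering only three patterns (angle-bracket tags, `(word:1.5)` weights, and trailing `+` emphasis); `strip_sd_syntax` is a hypothetical name, and other SD conventions are not handled.

```python
import re


def strip_sd_syntax(prompt):
    """Remove SD-style weighting and LoRA tags, keeping the plain words."""
    text = re.sub(r"<[^<>]*>", "", prompt)  # drop <lora:name:0.8> style tags
    text = re.sub(r"\(([^():]+):\s*[\d.]+\)", r"\1", text)  # (word:1.5) -> word
    text = re.sub(r"(\w)[+]+", r"\1", text)  # word++ -> word
    text = re.sub(r"\s{2,}", " ", text)  # collapse leftover spaces
    return text.strip(" ,")


print(strip_sd_syntax("(red car:1.5), sunset++ <lora:style:0.8>"))
```
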
Does Opten support FLUX Kontext?
Yes, the Opten extension auto-detects FLUX Kontext and scores prompts against the edit structure outlined above: it checks for a concrete action, an explicit preserve block on complex edits, the right verb (change vs transform), use of quotes for text, and descriptive phrases instead of pronouns. One click delivers a rewrite in the correct structure.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672