Which Wan versions exist?

Current versions are Wan 2.5 and Wan 2.6 from the Wan-AI / Alibaba team. Both are open-source models, available through the fal.ai and Replicate API platforms or for local execution. The prompting approach is consistent across versions: natural language, subject first, specifics over abstractions. Quality and detail improve with each version.

What is the optimal prompt size for Wan?

The recommended range is 30-100 tokens (roughly 25-80 words). That leaves room for subject, environment, lighting, and style without overloading the model. The hard limit is around 500 tokens, but prompts beyond 100 tokens start to lose focus: details conflict and the model mixes up priorities. For very detailed work, consider a different model.
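A quick way to sanity-check length before sending a prompt is sketched below. It is only a rough estimate: the 1.25 tokens-per-word ratio is derived from the "30-100 tokens, roughly 25-80 words" figure above, not from Wan's actual tokenizer, and the function names and category labels are hypothetical.

```python
# Rough length check. The ratio below comes from the guide's own
# "30-100 tokens is roughly 25-80 words"; it is NOT Wan's real tokenizer.
TOKENS_PER_WORD = 1.25
RECOMMENDED = (30, 100)  # recommended token range from the guide
HARD_LIMIT = 500         # approximate hard limit from the guide


def estimate_tokens(prompt: str) -> int:
    """Approximate the token count from a whitespace word count."""
    return round(len(prompt.split()) * TOKENS_PER_WORD)


def classify_length(prompt: str) -> str:
    """Place a prompt relative to the recommended range and the hard limit."""
    tokens = estimate_tokens(prompt)
    if tokens < RECOMMENDED[0]:
        return "too short"
    if tokens <= RECOMMENDED[1]:
        return "recommended"
    if tokens <= HARD_LIMIT:
        return "long"
    return "over limit"
```

For exact counts you would need the tokenizer your platform actually uses; this sketch only tells you which side of the recommended range you are probably on.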

Can I write prompts in Russian or Chinese?

English gives the most stable results, especially for photographic and stylistic terminology. Chinese is also natively supported — Wan was trained on bilingual datasets. Russian technically works, but quality is lower: some descriptive constructions and terms are interpreted less precisely. For production work English is recommended.

How do I use Image-to-Image mode?

Upload a source image and write a prompt that describes the DESIRED RESULT. The main parameter is strength (denoising strength): low (0.2-0.4) — minimal edits, medium (0.5-0.7) — changes to style and color, high (0.8+) — near-complete reinterpretation. The prompt should NOT describe the contents of the source photo — focus on the transformation.

Does Wan support negative prompts?

Yes, through platform settings (fal.ai, Replicate, local execution) — most interfaces expose a separate negative prompt field. Put there what should NOT appear in the image: «watermark, text, blurry, low quality, deformed». Do not use negative phrasing inside the main prompt — it works worse than a dedicated field.
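As an illustration of keeping exclusions out of the main prompt, here is a minimal sketch of a request payload. The key names `prompt` and `negative_prompt` and the helper `build_request` are assumptions made for this example; each platform defines its own schema, so check its documentation for the real field names.

```python
def build_request(prompt: str, negative_terms: list[str]) -> dict:
    """Keep exclusions in their own field instead of the main prompt."""
    cleaned = [term.strip() for term in negative_terms if term.strip()]
    return {
        "prompt": prompt.strip(),
        # Hypothetical key name: the real one depends on the platform.
        "negative_prompt": ", ".join(cleaned),
    }


payload = build_request(
    "A red fox sitting in fresh snow, soft daylight, 85mm lens",
    ["watermark", "text", "blurry", "low quality", "deformed"],
)
```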

Why do results look less polished than Midjourney?

Wan is an open model with a smaller training base and without the specialized post-training optimizations that make Midjourney «pretty by default». Wan gives you more control and flexibility (local execution, LoRA fine-tuning, ControlNet), but demands more precise prompts. Do not lean on abstract «beautiful» — describe concrete light, optics, and palette parameters.

Does Opten support Wan?

Yes, the Opten extension detects Wan on fal.ai and Replicate and scores prompts against the structure outlined above: it checks for a concrete subject up front, natural language instead of tag lists, concrete lighting and optics, and the absence of quality spam and Midjourney syntax. One click gives you a rewrite in the right structure.

Wan: how to write prompts the model actually understands

Wan · Updated: May 19, 2026

Wan is Alibaba's open-source T2I model, available via fal.ai, Replicate, and for local execution. It accepts natural language prompts with concrete subject, environment, lighting, and camera details. English gives the most stable results, Chinese is also supported. Recommended prompt length is 30-100 tokens.

What Wan does

Wan generates images in two modes: Text-to-Image (T2I) and Image-to-Image (I2I). Current versions are Wan 2.5 and Wan 2.6. Maximum resolution depends on the platform, typically up to 1024×1024. The model is open-source — you can run it locally on consumer GPUs with sufficient VRAM, or use it through API platforms like fal.ai and Replicate.

Because the model is open, there are no Midjourney-style proprietary parameters (`--ar`, `--s`, `::weight`). Everything is controlled by the prompt text plus platform settings (resolution, seed, steps, guidance scale, and strength for I2I). The model's prompt limit is around 500 tokens, but the sweet spot for stable results is 30-100 tokens. Beyond that, details start to conflict and the model loses focus.

Text-to-Image and Image-to-Image modes
Wan 2.5 and Wan 2.6 versions, open-source model
Platforms: fal.ai, Replicate, local execution
Optimal prompt: 30-100 tokens, limit ~500
Parameters via platform settings, not flags

Prompt structure

Optimal order: [Subject] + [Subject details] + [Context/Environment] + [Style/Mood] + [Lighting] + [Composition/Camera].

The key principle is natural language with concrete details. Wan understands coherent descriptive prompts well; it handles chaotic comma-separated tag lists less well. The crucial point: the subject always comes first. «A young woman in a flowing white dress standing on a rocky cliff» — the model builds the figure first, then layers the environment. «Beautiful cinematic photo of...» — the model latches onto «cinematic photo» as the style first, and the subject becomes secondary.

Example of a strong prompt: «A young woman in a flowing white dress standing on a rocky cliff overlooking the ocean at sunset, wind blowing her hair, warm golden light, cinematic composition, photorealistic, 85mm lens, shallow depth of field». Subject → environment → lighting → style → optics — each block adds information without conflicting with the others.
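The ordering above can be sketched as a small helper that assembles the blocks subject-first. The `PromptParts` class and `build_prompt` function are hypothetical names for illustration; nothing here is part of Wan or any platform SDK.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """One field per block of the recommended order; only subject is required."""
    subject: str
    details: str = ""
    environment: str = ""
    style: str = ""
    lighting: str = ""
    camera: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty blocks in the guide's order, subject first."""
    if not parts.subject.strip():
        raise ValueError("the subject must come first and cannot be empty")
    # Subject, details and environment read as one descriptive clause.
    head = " ".join(p for p in (parts.subject, parts.details, parts.environment) if p)
    # Style, lighting and camera follow as comma-separated additions.
    tail = [p for p in (parts.style, parts.lighting, parts.camera) if p]
    return ", ".join([head, *tail])


prompt = build_prompt(PromptParts(
    subject="A young woman",
    details="in a flowing white dress",
    environment="standing on a rocky cliff overlooking the ocean at sunset",
    style="cinematic composition, photorealistic",
    lighting="warm golden light",
    camera="85mm lens, shallow depth of field",
))
```

Keeping the subject as the only required field mirrors the rule that it always leads; the other blocks can be dropped without changing the order of what remains.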

Lighting, camera, style

Set lighting through type and direction, not brightness:
• Natural: golden hour, natural sunlight, soft daylight, overcast.
• Studio: studio lighting, softbox, Rembrandt lighting.
• Dramatic: dramatic lighting, rim light, backlight, chiaroscuro.
• Atmospheric: volumetric light, fog, god rays, haze.
• Neon: neon glow, neon reflections, cyberpunk lighting.

Wan understands photographic terms for camera and optics:
• Lenses: 85mm, 35mm lens, wide-angle, macro, telephoto.
• Angle: bird's eye view, low angle, Dutch angle, eye level, worm's eye.
• Shot size: extreme close-up, close-up, medium shot, wide shot, full body.
• Depth: shallow depth of field, bokeh, tilt-shift, deep focus.

Art styles — photorealistic, hyperrealistic, editorial photography, RAW photo, oil painting, watercolor, impressionist, digital painting, vector art, flat design, minimalist, pixel art, 3D render, CGI, unreal engine, octane render, cinematic, film still, anime style, manga, cel shading.

Image-to-Image: control via strength

In I2I mode the model uses the input image as a starting point, and the prompt describes the desired result. The main parameter is strength (or denoising strength) — it controls how much the prompt influences the output versus the source. Low strength (0.2-0.4) — minimal edits, composition and most details are preserved. Medium (0.5-0.7) — noticeable changes to style, lighting, color while keeping the structure. High (0.8+) — near-complete reinterpretation, the prompt becomes the primary source.

Key rule: the prompt describes the DESIRED RESULT, not the input image. «A painted portrait in oil painting style, dramatic side lighting, warm tones» — the model applies those transformations to the input photo. Describing what is already in the photo has no effect. For radical changes raise strength, for subtle correction lower it.
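The strength bands described above can be written down as a small lookup. This is a sketch under two assumptions: that strength is a float between 0 and 1 (typical for denoising strength, but check your platform), and that values between the named bands are left unclassified, since the guide does not say how they behave. The function name `describe_strength` is hypothetical.

```python
def describe_strength(strength: float) -> str:
    """Map an I2I strength value to the band described in the guide."""
    # Assumption: strength is expressed on a 0-1 scale.
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength is assumed to lie between 0 and 1")
    if 0.2 <= strength <= 0.4:
        return "low: minimal edits, composition and most details preserved"
    if 0.5 <= strength <= 0.7:
        return "medium: style, lighting and color change, structure kept"
    if strength >= 0.8:
        return "high: near-complete reinterpretation, prompt dominates"
    # The guide names no band for these values (below 0.2, 0.4-0.5, 0.7-0.8).
    return "unclassified: outside the bands the guide names"
```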

Common mistakes

1. Spamming quality keywords
«Beautiful, stunning, 8k, detailed, masterpiece, best quality, ultra HD, award winning» clutters the prompt without real benefit. These words carry little usable signal for Wan and can pull against each other. Replace them with specifics: «sharp focus, fine detail visible, natural texture, hyperrealistic, editorial photography». Concrete parameters work.
2. Style or adjectives at the start instead of subject
«Beautiful cinematic photo of a woman» — the model latches onto «beautiful cinematic photo» as the style first, and the subject becomes secondary. The right pattern: «A young woman with auburn hair... beautiful cinematic photo style». The most important thing — subject with details — should appear in the first sentence of the prompt.
3. Chaotic comma-separated tag lists
«woman, red dress, sunset, beach, ocean, sand, beautiful, photo, cinematic, 4k, detail» — Wan handles incoherent lists worse than natural descriptive sentences. Replace with coherent text: «A woman in a red dress walking along a sandy beach at sunset, ocean waves behind her. Cinematic photography, warm tones.».
4. Conflicting instructions
«Dark and moody, bright and cheerful, cool blue tones, warm golden light»: the model cannot honor a contradiction, so it either ignores part of the prompt or produces an incoherent mix. Pick one mood direction and stick with it. If you need different moods, generate separate images.
5. Midjourney or SD syntax
Parameters like `--ar 16:9` and `--style raw`, and weights such as `(beautiful:1.5)` or `::weight`, do not work in Wan and end up in the prompt as literal text. Set size in platform settings, prioritize words by order (most important first), and express style through ordinary adjectives in natural language.
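Three of the mistakes above (quality spam, tag lists, foreign syntax) are mechanical enough to check automatically. The sketch below is a hypothetical linter, not how any existing tool works: the word list is copied from mistake 1, the regular expressions cover only the syntax examples named in mistake 5, and the tag-list thresholds (six or more fragments averaging two words or fewer) are arbitrary assumptions.

```python
import re

# Terms taken from the quality-spam example in mistake 1.
QUALITY_SPAM = {
    "beautiful", "stunning", "8k", "detailed", "masterpiece",
    "best quality", "ultra hd", "award winning",
}

# Patterns for the Midjourney / SD syntax examples in mistake 5.
FOREIGN_SYNTAX = [
    r"(?:^|\s)--[a-z]+",             # flags such as --ar or --style
    r"\([^()]+:\d+(?:\.\d+)?\)",     # weights such as (beautiful:1.5)
    r"::",                           # Midjourney-style ::weight
]


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of issue labels; an empty list means nothing was flagged."""
    issues = []
    lowered = prompt.lower()

    spam = sorted(
        term for term in QUALITY_SPAM
        if re.search(rf"\b{re.escape(term)}\b", lowered)
    )
    if spam:
        issues.append("quality spam: " + ", ".join(spam))

    if any(re.search(pattern, lowered) for pattern in FOREIGN_SYNTAX):
        issues.append("Midjourney/SD syntax")

    # Heuristic (assumed thresholds): many very short comma-separated fragments.
    fragments = [f for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 6:
        avg_words = sum(len(f.split()) for f in fragments) / len(fragments)
        if avg_words <= 2:
            issues.append("tag list")

    return issues
```

Conflicting instructions (mistake 4) and subject placement (mistake 2) depend on meaning rather than surface patterns, so this sketch leaves them to the author.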

Before / after examples

Example 1

Before

a beautiful landscape

After

A majestic snow-covered mountain peak under a sky of swirling aurora borealis, deep purple and emerald green light bands above. Foreground: a solitary pine tree on a frozen lake reflecting the colors. Wide-angle landscape composition, low angle looking up. Cinematic lighting, hyperrealistic, sharp focus, fine detail visible in the ice and snow texture. 24mm lens, deep focus.

Concrete subject, explicit foreground, color description instead of «beautiful», camera and optics, style via «hyperrealistic» rather than «8k masterpiece».

Example 2

Before

elderly man portrait

After

An elderly fisherman with deeply weathered skin, a thick white beard, and piercing blue eyes, wearing a worn navy wool sweater. He sits on a wooden bench, hands folded in his lap. Soft window light from screen-left creating Rembrandt lighting on his face, warm golden tones. Background: out-of-focus harbor with fishing boats. Editorial portrait photography, 85mm lens, shallow depth of field, photorealistic.

Concrete appearance details, explicit lighting setup with direction (Rembrandt lighting), optics, stylistic reference «editorial portrait photography».

Example 3

Before

futuristic city at night

After

A neon-soaked cyberpunk Tokyo street at midnight, rain-soaked asphalt reflecting magenta and cyan signs, holographic advertisements floating above traffic. Crowds of people in dark clothing crossing under giant LED screens. Wide-angle low-angle shot looking up between skyscrapers. Cyberpunk lighting with strong neon glow, deep shadows, volumetric haze. Cinematic, film still, sharp focus on the foreground signs, soft bokeh on background lights. 35mm lens, dramatic perspective.

Specific setting and time, color anchors via color names, layered atmosphere, explicit lighting and optics. «Cyberpunk» works as a style without quality spam.
