Wan: how to write prompts the model actually understands
Wan is Alibaba's open-source T2I model, available through fal.ai and Replicate or for local execution. It accepts natural language prompts with concrete subject, environment, lighting, and camera details. English gives the most stable results; Chinese is also supported. Recommended prompt length is 30-100 tokens.
What Wan does
Wan generates images in two modes: Text-to-Image (T2I) and Image-to-Image (I2I). Current versions are Wan 2.5 and Wan 2.6. Maximum resolution depends on the platform, typically up to 1024×1024. The model is open-source — you can run it locally on consumer GPUs with sufficient VRAM, or use it through API platforms like fal.ai and Replicate.
Because the model is open, there are no Midjourney-style proprietary parameters (`--ar`, `--s`, `::weight`). Everything is controlled by prompt text plus platform settings (resolution, seed, steps, guidance scale, strength for I2I). The model's prompt limit is around 500 tokens, but the sweet spot for stable results is 30-100 tokens. Beyond that, details start to conflict and the model loses focus.
- Text-to-Image and Image-to-Image modes
- Wan 2.5 and Wan 2.6 versions, open-source model
- Platforms: fal.ai, Replicate, local execution
- Optimal prompt: 30-100 tokens, limit ~500
- Parameters via platform settings, not flags
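As a rough aid for staying inside the 30-100 token range, a prompt's length can be estimated before it is sent. The sketch below is only an approximation: it does not use Wan's actual tokenizer, the ratio of about 4 tokens per 3 English words is an assumption, and `estimate_tokens` and `length_band` are hypothetical helper names.

```python
def estimate_tokens(prompt: str) -> int:
    """Rough estimate: assumes about 4 tokens per 3 whitespace-separated words."""
    words = len(prompt.split())
    return round(words * 4 / 3)


def length_band(prompt: str, sweet_spot=(30, 100), hard_limit=500) -> str:
    """Classify a prompt against the 30-100 sweet spot and the ~500 token limit."""
    tokens = estimate_tokens(prompt)
    if tokens > hard_limit:
        return "over limit"
    if tokens > sweet_spot[1]:
        return "long"
    if tokens < sweet_spot[0]:
        return "short"
    return "ok"


if __name__ == "__main__":
    print(length_band("a beautiful landscape"))
```

A "short" result suggests adding concrete details; a "long" result suggests trimming before details start to conflict.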
Prompt structure
Optimal order: [Subject] + [Subject details] + [Context/Environment] + [Style/Mood] + [Lighting] + [Composition/Camera].
The key principle is natural language with concrete details. Wan understands coherent descriptive prompts well and handles chaotic comma-separated tag lists less well. The crucial point: the subject always comes first. With «A young woman in a flowing white dress standing on a rocky cliff», the model builds the figure first, then layers the environment. With «Beautiful cinematic photo of...», the model latches onto «cinematic photo» as the style first, and the subject becomes secondary.
Example of a strong prompt: «A young woman in a flowing white dress standing on a rocky cliff overlooking the ocean at sunset, wind blowing her hair, warm golden light, cinematic composition, photorealistic, 85mm lens, shallow depth of field». Subject → environment → lighting → style → optics — each block adds information without conflicting with the others.
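The block order above can be kept consistent with a small helper. This is a minimal sketch, not part of Wan or any platform SDK: `PromptParts` and `build_prompt` are hypothetical names, and joining the trailing blocks with commas is one possible formatting choice.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    subject: str
    details: str = ""
    environment: str = ""
    style: str = ""
    lighting: str = ""
    camera: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Assemble a prompt with the subject first, then the remaining blocks in order."""
    # Subject, its details, and the environment read best as one opening phrase.
    opening = " ".join(
        p.strip() for p in (parts.subject, parts.details, parts.environment) if p.strip()
    )
    # Style, lighting, and camera follow as comma-separated descriptors.
    tail = [p.strip() for p in (parts.style, parts.lighting, parts.camera) if p.strip()]
    return ", ".join([opening] + tail)


if __name__ == "__main__":
    print(build_prompt(PromptParts(
        subject="A young woman",
        details="in a flowing white dress",
        environment="standing on a rocky cliff overlooking the ocean at sunset",
        style="cinematic composition, photorealistic",
        lighting="warm golden light",
        camera="85mm lens, shallow depth of field",
    )))
```

Because the subject is a required field and is always emitted first, style words cannot end up leading the prompt.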
Lighting, camera, style
Set lighting through type and direction, not brightness:
- Natural: golden hour, natural sunlight, soft daylight, overcast.
- Studio: studio lighting, softbox, Rembrandt lighting.
- Dramatic: dramatic lighting, rim light, backlight, chiaroscuro.
- Atmospheric: volumetric light, fog, god rays, haze.
- Neon: neon glow, neon reflections, cyberpunk lighting.
Camera and optics. Wan understands photographic terms:
- Lenses: 85mm, 35mm lens, wide-angle, macro, telephoto.
- Angle: bird's eye view, low angle, Dutch angle, eye level, worm's eye.
- Shot size: extreme close-up, close-up, medium shot, wide shot, full body.
- Depth: shallow depth of field, bokeh, tilt-shift, deep focus.
Art styles — photorealistic, hyperrealistic, editorial photography, RAW photo, oil painting, watercolor, impressionist, digital painting, vector art, flat design, minimalist, pixel art, 3D render, CGI, unreal engine, octane render, cinematic, film still, anime style, manga, cel shading.
Image-to-Image: control via strength
In I2I mode the model uses the input image as a starting point, and the prompt describes the desired result. The main parameter is strength (or denoising strength), which controls how much the prompt influences the output versus the source:
- Low (0.2-0.4): minimal edits; composition and most details are preserved.
- Medium (0.5-0.7): noticeable changes to style, lighting, and color while keeping the structure.
- High (0.8+): near-complete reinterpretation; the prompt becomes the primary source.
Key rule: the prompt describes the DESIRED RESULT, not the input image. With «A painted portrait in oil painting style, dramatic side lighting, warm tones», the model applies those transformations to the input photo. Describing what is already in the photo has no effect. Raise strength for radical changes and lower it for subtle corrections.
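The strength bands can be encoded as a small lookup to sanity-check a value before a request is sent. This is a sketch under stated assumptions: `strength_band` is a hypothetical helper, strength is assumed to lie in the 0.0-1.0 range, and values falling in the gaps between the listed bands (for example 0.45 or 0.75) are assigned to the lower band.

```python
def strength_band(strength: float) -> str:
    """Map an I2I strength value to the low / medium / high bands."""
    if not 0.0 <= strength <= 1.0:
        # Assumption: platforms expose strength on a 0.0-1.0 scale.
        raise ValueError("strength is assumed to be between 0.0 and 1.0")
    if strength >= 0.8:
        return "high"    # near-complete reinterpretation
    if strength >= 0.5:
        return "medium"  # style, lighting, color change; structure kept
    if strength >= 0.2:
        return "low"     # minimal edits
    return "below documented range"


if __name__ == "__main__":
    for value in (0.3, 0.6, 0.85):
        print(value, strength_band(value))
```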
Common mistakes
1. Spamming quality keywords
«Beautiful, stunning, 8k, detailed, masterpiece, best quality, ultra HD, award winning» clutters the prompt without real benefit. These words describe no concrete visual property, so they add little for Wan and can pull against each other. Replace them with specifics: «sharp focus, fine detail visible, natural texture, hyperrealistic, editorial photography». Concrete descriptors work.
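A simple check can flag these filler words before a prompt is submitted. The sketch below is illustrative only: `QUALITY_SPAM` contains just the keywords named in this section, and `find_quality_spam` is a hypothetical helper, not a feature of Wan or any platform.

```python
import re

# Keywords taken from the example above; extend the tuple as needed.
QUALITY_SPAM = (
    "beautiful", "stunning", "8k", "detailed", "masterpiece",
    "best quality", "ultra hd", "award winning",
)


def find_quality_spam(prompt: str) -> list[str]:
    """Return the filler keywords that appear as whole words in the prompt."""
    lowered = prompt.lower()
    return [
        kw for kw in QUALITY_SPAM
        if re.search(rf"\b{re.escape(kw)}\b", lowered)
    ]


if __name__ == "__main__":
    print(find_quality_spam("a stunning 8k masterpiece of a harbor at dawn"))
```

Whole-word matching means «fine detail visible» is not flagged even though «detailed» is on the list.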
2. Style or adjectives at the start instead of subject
«Beautiful cinematic photo of a woman» — the model latches onto «beautiful cinematic photo» as the style first, and the subject becomes secondary. The right pattern: «A young woman with auburn hair... beautiful cinematic photo style». The most important thing — subject with details — should appear in the first sentence of the prompt.
3. Chaotic comma-separated tag lists
«woman, red dress, sunset, beach, ocean, sand, beautiful, photo, cinematic, 4k, detail»: Wan handles incoherent lists worse than natural descriptive sentences. Replace with coherent text: «A woman in a red dress walking along a sandy beach at sunset, ocean waves behind her. Cinematic photography, warm tones.»
4. Conflicting instructions
«Dark and moody, bright and cheerful, cool blue tones, warm golden light»: the model cannot honor a contradiction, so it either ignores part of the prompt or produces an incoherent mix. Pick one mood direction and stick with it. If you need different moods, generate separate images.
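Obvious contradictions of this kind can be caught with a basic word-pair check. This is a deliberately naive sketch: the `OPPOSITES` pairs are an illustrative assumption drawn from the example above, `find_conflicts` is a hypothetical helper, and it will miss conflicts phrased with other words.

```python
import re

# Illustrative opposing pairs; not an exhaustive or official list.
OPPOSITES = (
    ("dark", "bright"),
    ("moody", "cheerful"),
    ("cool", "warm"),
)


def find_conflicts(prompt: str) -> list[tuple[str, str]]:
    """Return every opposing pair whose two words both occur in the prompt."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return [pair for pair in OPPOSITES if pair[0] in words and pair[1] in words]


if __name__ == "__main__":
    print(find_conflicts("Dark and moody, bright and cheerful, cool blue tones, warm golden light"))
```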
5. Midjourney or SD syntax
Parameters like `--ar 16:9`, `--style raw`, weights `(beautiful:1.5)`, and `::weight` do not work in Wan and end up in the prompt as literal text. Set size in platform settings, weight words by order (important first), and set styles with ordinary adjectives in natural language.
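When prompts are migrated from Midjourney or Stable Diffusion, this syntax can be stripped mechanically before the text reaches Wan. The sketch below is an assumption-laden illustration: `strip_foreign_syntax` is a hypothetical helper, the regular expressions cover only the three patterns named above, and a flag is assumed to take at most one value.

```python
import re

# `--flag` optionally followed by one value, e.g. `--ar 16:9` or `--style raw`.
FLAG = re.compile(r"(?<!\S)--[A-Za-z]\w*(?:\s+(?!--)\S+)?")
# SD-style weights such as `(beautiful:1.5)`; the word itself is kept.
PAREN_WEIGHT = re.compile(r"\(([^():]+):\s*\d+(?:\.\d+)?\)")
# Midjourney multi-prompt separators such as `::2` or a bare `::`.
DOUBLE_COLON = re.compile(r"::-?\d*\.?\d*")


def strip_foreign_syntax(prompt: str) -> str:
    """Remove Midjourney flags and weight syntax, keeping the descriptive words."""
    text = FLAG.sub(" ", prompt)
    text = PAREN_WEIGHT.sub(r"\1", text)
    text = DOUBLE_COLON.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


if __name__ == "__main__":
    print(strip_foreign_syntax("a (beautiful:1.5) castle on a hill --ar 16:9 --style raw"))
```

The removed aspect ratio and style still need to be set again: size in the platform settings, style as plain adjectives in the prompt.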
Before / after examples
Example 1
Before
a beautiful landscape
After
A majestic snow-covered mountain peak under a sky of swirling aurora borealis, deep purple and emerald green light bands above. Foreground: a solitary pine tree on a frozen lake reflecting the colors. Wide-angle landscape composition, low angle looking up. Cinematic lighting, hyperrealistic, sharp focus, fine detail visible in the ice and snow texture. 24mm lens, deep focus.
Concrete subject, explicit foreground, color description instead of «beautiful», camera and optics, style via «hyperrealistic» rather than «8k masterpiece».
Example 2
Before
elderly man portrait
After
An elderly fisherman with deeply weathered skin, a thick white beard, and piercing blue eyes, wearing a worn navy wool sweater. He sits on a wooden bench, hands folded in his lap. Soft window light from screen-left creating Rembrandt lighting on his face, warm golden tones. Background: out-of-focus harbor with fishing boats. Editorial portrait photography, 85mm lens, shallow depth of field, photorealistic.
Concrete appearance details, explicit lighting setup with direction (Rembrandt lighting), optics, stylistic reference «editorial portrait photography».
Example 3
Before
futuristic city at night
After
A neon-soaked cyberpunk Tokyo street at midnight, rain-soaked asphalt reflecting magenta and cyan signs, holographic advertisements floating above traffic. Crowds of people in dark clothing crossing under giant LED screens. Wide-angle low-angle shot looking up between skyscrapers. Cyberpunk lighting with strong neon glow, deep shadows, volumetric haze. Cinematic, film still, sharp focus on the foreground signs, soft bokeh on background lights. 35mm lens, dramatic perspective.
Specific setting and time, color anchors via color names, layered atmosphere, explicit lighting and optics. «Cyberpunk» works as a style without quality spam.