Seedance 2.0: how to write prompts the model actually understands
ByteDance
Seedance 2.0 is ByteDance's flagship video model on the 即梦 (Jimeng) platform. It generates 4–15 second clips at up to 2K and accepts up to 9 images, 3 videos, and 3 audio inputs per request. It understands @-references, second-by-second storyboarding, and named TRY CGI blocks. On syntx.ai, English is the standard prompt language; on the native platform, Chinese performs better.
What is new in Seedance 2.0
Compared to 1.0 Pro and 1.5 Pro, this is a generational jump. Duration is no longer fixed at 5 or 10 seconds: anything from 4 to 15 is allowed. Full multimodality has arrived, with up to 12 files in total per request across images, videos, and audio. Consistency Control via @-references, voice cloning and sound design, video extension through @video, and second-precise storyboarding are all now available.
The key architectural shift: the model internally routes named blocks (LOCATION, STYLE, STORY, CHARACTERS, SHOT STRUCTURE) into different subsystems — environment, identity, temporal planner. A wall of text in a single paragraph yields noticeably worse output than the same text split across explicit blocks.
- Duration 4–15 seconds (vs 5/10 in 1.0/1.5)
- Up to 9 images + 3 videos + 3 audio per request
- Full Consistency Control via @image, @video, @audio
- Second-by-second storyboarding (0–4s / 4–10s / 10–15s)
- Voice cloning and sound design
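The limits listed above can be checked before a request is sent. The following is a minimal sketch, assuming the numbers quoted in this article (9 images, 3 videos, 3 audio, 12 files in total, 4–15 seconds); the function name `check_request` is hypothetical and is not part of any Seedance API.

```python
# Limits as quoted in this article; treat them as assumptions, not an official spec.
MAX_PER_TYPE = {"image": 9, "video": 3, "audio": 3}
MAX_TOTAL_FILES = 12
MIN_SECONDS, MAX_SECONDS = 4, 15


def check_request(images: int, videos: int, audio: int, seconds: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    counts = {"image": images, "video": videos, "audio": audio}
    for kind, count in counts.items():
        if count > MAX_PER_TYPE[kind]:
            problems.append(f"too many {kind} inputs: {count} > {MAX_PER_TYPE[kind]}")
    total = sum(counts.values())
    if total > MAX_TOTAL_FILES:
        problems.append(f"too many files in total: {total} > {MAX_TOTAL_FILES}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    return problems
```

Note that a request can respect every per-type limit and still exceed the total: 9 images, 3 videos, and 3 audio files add up to 15, which is over the 12-file cap.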
TRY CGI prompt structure
Canonical block order for cinematic results: [TITLE & ACT] → LOCATION → REFERENCE ASSIGNMENT → STYLE → STORY → CHARACTERS → SHOT STRUCTURE. One blank line between blocks, a space after every colon.
LOCATION — environment, lighting, weather, key background details. STYLE — visual preset ("Ultra-photorealistic 4K live-action cinema", "Gritty film grain"). STORY — what happens in this specific generation, in one or two sentences. CHARACTERS — participants, their current mood, current appearance. SHOT STRUCTURE — the act-based breakdown.
This works better than the basic 6-step formula because named blocks route into the right generation layers.
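The block order and spacing rules above are easy to enforce mechanically. Here is a minimal sketch that assembles a prompt from a dictionary of blocks; `build_prompt` and `BLOCK_ORDER` are hypothetical helper names for illustration, and the only rules encoded are the ones stated in this section (canonical order, one blank line between blocks, a space after every colon).

```python
# Canonical order as given in this article; the optional title line goes first.
BLOCK_ORDER = [
    "LOCATION",
    "REFERENCE ASSIGNMENT",
    "STYLE",
    "STORY",
    "CHARACTERS",
    "SHOT STRUCTURE",
]


def build_prompt(blocks: dict[str, str], title: str = "") -> str:
    """Join the given blocks in canonical order, separated by one blank line."""
    unknown = set(blocks) - set(BLOCK_ORDER)
    if unknown:
        raise ValueError(f"unknown block names: {sorted(unknown)}")
    parts = [title.strip()] if title.strip() else []
    for name in BLOCK_ORDER:
        text = blocks.get(name, "").strip()
        if text:  # skip blocks that were not provided
            parts.append(f"{name}: {text}")
    return "\n\n".join(parts)
```

Because the order comes from `BLOCK_ORDER` rather than from the dictionary, blocks can be written in any order and still land in the canonical sequence.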
@-references and identity preservation
Seedance 2.0 accepts references via an `@`-prefix bound to a role: @image1/@image2/@image3 for characters and scenes, @video1/@video2/@video3 for camera and rhythm cloning, @audio1/@audio2/@audio3 for voice and SFX.
The critical phrase for any character reference is **Strict identity preservation. No morphing or style changes.** Without it, the model will «improve» the face between seconds, and character consistency breaks by second four. This is TRY CGI's number-one consistency tip.
Reference assignment template: «Protagonist (@image1): Strict identity preservation. Use this image for exact facial features and wardrobe. No morphing or style changes.» For audio: «Audio (@audio1): Reference for realistic electrical buzzing and low machinery hum.»
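The templates above can be generated rather than retyped, which keeps the identity phrase from being dropped by accident. This is a minimal sketch: the helper names are hypothetical, and the tag pattern is inferred from the examples in this article (`@image1`, `@video2`, `@audio3`), not from an official grammar.

```python
import re

# Tag shape inferred from the article's examples; the platform's exact grammar
# is an assumption here.
TAG = re.compile(r"@(image|video|audio)([1-9])")


def protagonist_line(tag: str, label: str = "Protagonist") -> str:
    """Build a character reference line carrying both identity-preservation sentences."""
    match = TAG.fullmatch(tag)
    if match is None or match.group(1) != "image":
        raise ValueError(f"expected an @imageN tag, got {tag!r}")
    return (
        f"{label} ({tag}): Strict identity preservation. "
        "Use this image for exact facial features and wardrobe. "
        "No morphing or style changes."
    )


def audio_line(tag: str, description: str) -> str:
    """Build an audio reference line of the form 'Audio (@audioN): Reference for ...'."""
    match = TAG.fullmatch(tag)
    if match is None or match.group(1) != "audio":
        raise ValueError(f"expected an @audioN tag, got {tag!r}")
    return f"Audio ({tag}): Reference for {description.rstrip('.')}."
```

For example, `audio_line("@audio1", "realistic electrical buzzing and low machinery hum")` reproduces the audio template quoted above.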
Timestamp storyboarding 0–15s
For 10–15 second videos, TRY CGI recommends three named acts with a fixed field skeleton: Action / Emotional Acting / Camera / Lighting / VFX / Audio Rule.
The canonical 15s template: 0–4s [THE ENTRY] (setup), 4–10s [THE REVELATION] (turn, often dolly-in or crash-zoom), 10–15s [ACTION RESPONSE] (resolution, handheld, motion blur). An 8s clip uses two acts, ENTRY → PAYOFF; a 10s clip uses SETUP → CLIMAX.
The biggest trap is describing emotion in general terms («he is scared»). The right path is micro-acting: «jaw clenches, nostrils flare, pupils dilate, micro-tremor in the eyelids». Without these micro-signals the face renders as a mask with the correct overall emotion, but dead.
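The act templates and the field skeleton described in this section can be turned into a fill-in-the-blanks scaffold. The sketch below is illustrative only: `shot_structure_skeleton` is a hypothetical helper, and the 8s and 10s act boundaries (0–4 / 4–8 and 0–4 / 4–10) are taken from the examples later in this article.

```python
# Field skeleton and act names as given in this section.
FIELDS = ["Action", "Emotional Acting", "Camera", "Lighting", "VFX", "Audio Rule"]

# (start, end, act name) per supported duration; 8s and 10s boundaries follow
# the article's before/after examples.
ACTS = {
    8: [(0, 4, "ENTRY"), (4, 8, "PAYOFF")],
    10: [(0, 4, "SETUP"), (4, 10, "CLIMAX")],
    15: [(0, 4, "THE ENTRY"), (4, 10, "THE REVELATION"), (10, 15, "ACTION RESPONSE")],
}


def shot_structure_skeleton(seconds: int) -> str:
    """Return an empty SHOT STRUCTURE block with every field ready to fill in."""
    if seconds not in ACTS:
        raise ValueError(f"no act template for {seconds}s; known: {sorted(ACTS)}")
    lines = [f"SHOT STRUCTURE ({seconds} SEC TOTAL):"]
    for start, end, name in ACTS[seconds]:
        lines.append(f"{start}-{end}s [{name}]")
        lines.extend(f"{field}: ..." for field in FIELDS)
    return "\n".join(lines)
```

Each `...` placeholder is where concrete content goes; for Emotional Acting that means micro-signals such as «jaw clenches, nostrils flare», not a general label.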
Common mistakes
1. Wall of text instead of TRY CGI blocks
A single-paragraph dump loses 30–40% of quality compared to the same text split across LOCATION / STYLE / STORY / CHARACTERS / SHOT STRUCTURE. The model internally routes named blocks into different generation layers, and without explicit headings the routing becomes noisy.
2. @-reference on a character without identity preservation
If you have @image1 of a face but no «Strict identity preservation. No morphing or style changes.» phrase, the model «improves» the face between seconds. By second four the character no longer matches the reference. This phrase is TRY CGI's number-one consistency tip.
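A quick pre-flight check can catch this mistake before generation. The following is a rough heuristic sketch, not a platform feature: it assumes each reference assignment sits on its own line, and it flags every `@imageN` line that lacks either identity sentence, even though some image references may be scenes rather than characters.

```python
import re

# Both sentences from the article's reference assignment template.
REQUIRED = ("Strict identity preservation.", "No morphing or style changes.")


def unprotected_image_refs(prompt: str) -> list[str]:
    """Return @imageN tags found on lines that lack either identity sentence."""
    flagged = []
    for line in prompt.splitlines():
        tags = re.findall(r"@image[1-9]", line)
        if tags and not all(phrase in line for phrase in REQUIRED):
            flagged.extend(tags)
    return flagged
```

An empty result means every image reference line carries the full phrase; anything returned is worth a manual look.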
3. Emotion described in general terms
«He is scared», «she is happy», «surprise» render as a dead mask. Seedance 2.0 renders micro-acting only when you describe it explicitly: «jaw clenches, pupils dilate, micro-tremor in the eyelids, nostrils flare». Aim for at least 2–3 micro-signals per act.
4. Epic music by default
Adding «epic orchestral soundtrack» or «dramatic background score» when the user did not ask for music turns a cinematic scene into a trailer. The default is «No music. Diegetic sound design only.» Add music only on an explicit request.
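The default described here can be applied automatically when prompts are generated by a tool. This is a small sketch under stated assumptions: `enforce_audio_default` is a hypothetical helper, and the only music cues it recognizes are the two phrases quoted in this article.

```python
import re

DEFAULT_AUDIO_LINE = "No music. Diegetic sound design only."

# Only the two phrases named in the article; a real checker would need a
# broader list, which is left open here.
MUSIC_CUES = re.compile(
    r"epic orchestral soundtrack|dramatic background score", re.IGNORECASE
)


def enforce_audio_default(prompt: str, user_wants_music: bool = False):
    """Return (prompt, warnings); append the default audio line when it is missing."""
    if user_wants_music:
        return prompt, []
    warnings = [
        f"music cue without a user request: {m.group(0)!r}"
        for m in MUSIC_CUES.finditer(prompt)
    ]
    if DEFAULT_AUDIO_LINE not in prompt:
        prompt = prompt.rstrip() + "\n\n" + DEFAULT_AUDIO_LINE
    return prompt, warnings
```

The sketch only warns about music cues and leaves their removal to the author, since deleting text from a prompt automatically could drop wanted detail from the same sentence.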
5. Asking for more than 15 seconds in one prompt
The platform cap is 15 seconds per run. «Make a 30-second clip» will either get truncated or break the pacing. The right path is segmentation: first prompt up to 15s, second segment via @video1 with «Extend @video1 by 15 seconds» plus new content description.
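The segmentation arithmetic can be sketched as follows. The helper names are hypothetical. The sketch splits the requested length into the fewest segments that fit the 15-second cap and balances them, on the assumption (not confirmed by this article) that the 4-second minimum also applies to extensions, so a 17-second request becomes 9 + 8 and not 15 + 2.

```python
import math

MIN_SECONDS, MAX_SECONDS = 4, 15  # per-run limits as quoted in this article


def split_duration(total_seconds: int) -> list[int]:
    """Split a target length into the fewest balanced segments of at most 15s."""
    if total_seconds < MIN_SECONDS:
        raise ValueError(f"minimum clip length is {MIN_SECONDS}s")
    n = math.ceil(total_seconds / MAX_SECONDS)
    base, extra = divmod(total_seconds, n)
    # The first `extra` segments get one additional second each.
    return [base + 1] * extra + [base] * (n - extra)


def extension_instructions(total_seconds: int) -> list[str]:
    """Instruction lines for every segment after the first one.

    Each extension run is assumed to receive the previous output as @video1.
    """
    _first, *rest = split_duration(total_seconds)
    return [f"Extend @video1 by {seconds} seconds." for seconds in rest]
```

For a 30-second target this yields a 15-second first prompt followed by a single «Extend @video1 by 15 seconds.» run, matching the approach described above; each instruction still needs its own description of the new content.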
Before / after examples
Example 1
Before
video of a dog running across a square, make it dynamic
After
LOCATION: Urban square at midday. Concrete pavement, wooden benches, food stalls, crowd in background. Bright natural sunlight with strong side rim light.

STYLE: Ultra-photorealistic 4K cinematic action, gritty film grain, realistic muscle and fur, practical lighting. Gritty realism like a Nolan action film. No cartoonish AI look.

STORY: A dog in a black collar makes a high-speed escape across the square, demonstrating athletic jumps between people and obstacles.

CHARACTERS: Dog, muscular and energetic, in a black collar. Focused eyes, pupils dilated with adrenaline, fur swept back by speed.

SHOT STRUCTURE (10 SEC TOTAL):
0-4s — [SETUP]
Action: Dog launches from a crouch, muscles tensed.
Emotional Acting: Concentration. Pupils dilated, ears pinned back, nostrils flared.
Camera: Low angle dolly tracking behind the dog.
Audio Rule: Diegetic only — claws on concrete, panting.
4-10s — [CLIMAX]
Action: Dog vaults a bench, weaves through a crowd.
Camera: Side tracking, motion blur on background.
Audio Rule: Footsteps, breath, distant crowd. No music.

No music. Diegetic sound design only. Grounded realism, gritty film grain, no AI aesthetic.
Key change: TRY CGI block structure instead of one sentence. Each block routes into its own generation layer (environment, identity, temporal planner), and the final anti-AI anchor saves the shot from plastic render.
Example 2
Before
hero looks at a computer screen and realizes what happened
After
LOCATION: Dim server room, racks of blinking LEDs, smoke from a fried capacitor, single overhead bulb.

REFERENCE ASSIGNMENT:
- Protagonist (@image1): Strict identity preservation. Use this image for exact facial features and wardrobe. No morphing or style changes.

STYLE: Ultra-photorealistic 4K cinematic, anamorphic lens flare, gritty film grain. The look of a modern tech thriller, not animation.

STORY: Character realizes the breach is live, in real time, on the terminal in front of them.

SHOT STRUCTURE (8 SEC TOTAL):
0-4s — [ENTRY]
Action: Character types rapidly, leans toward the monitor.
Emotional Acting: Focused tension. Jaw clenches, brow furrowed, breathing shallow.
Camera: Medium shot, slight handheld sway.
Lighting: Cold monitor glow on face, warm overhead bulb behind.
4-8s — [PAYOFF]
Action: Character freezes mid-keystroke, eyes locked on the screen.
Emotional Acting: Shock. Pupils dilate, nostrils flare, micro-tremor in the eyelids. Screen light reflects in the pupils.
Camera: Slow dolly-in to an extreme close-up of the eyes.
Lighting: Screen reflection sharp in the pupils.
Audio Rule: Only the soft hum of servers and slow inhale. No music.

No music. Diegetic sound design only.
Identity preservation on @image1 plus explicit micro-acting in Emotional Acting instead of «he is surprised» = a living face with a real shock beat, not an AI mask.
Example 3
Before
extend my video by 5 seconds and add a sprint
After
REFERENCE ASSIGNMENT:
- Base footage (@video1): Use this as the source. Match exact lighting, color grading, lens, and frame composition of the last frame. Continue camera motion seamlessly.

Extend @video1 by 5 seconds. New content begins exactly from the last frame of @video1 — preserve framing, exposure, and color grade.

New action: The character in frame breaks into a sprint, accelerating from a walk over the first 2 seconds, then full sprint for the remaining 3. Camera transitions from medium tracking to a tighter handheld follow.

Audio Rule: Diegetic only — footsteps accelerating, breath quickening. No music.

No music. Diegetic sound design only. Grounded realism, no AI gloss.
Video Extension works only if you explicitly tag @video1 as the source and describe the seam — «continue camera motion seamlessly», «preserve framing, exposure, color grade». Without this the join breaks in the first second of the new segment.