Seedance 2.0: how to write prompts the model actually understands

ByteDance

Seedance 2.0 is ByteDance's flagship video model on the 即梦 (Jimeng) platform. It generates 4–15 second clips at up to 2K resolution and accepts up to 9 images, 3 videos, and 3 audio inputs per request, within a combined limit of 12 files. It understands @-references, second-by-second storyboarding, and named TRY CGI blocks. On syntx.ai the standard is English; on the native platform Chinese performs better.

What is new in Seedance 2.0

Compared to 1.0 Pro and 1.5 Pro this is a generational jump. Duration is no longer fixed at 5 or 10 seconds: any length from 4 to 15 seconds is available. Full multimodality landed: up to 12 files per request across images, videos, and audio. Consistency Control via @-references, voice cloning and sound design, video extension through @video, and second-precise storyboarding all became available.

The key architectural shift: the model internally routes named blocks (LOCATION, STYLE, STORY, CHARACTERS, SHOT STRUCTURE) into different subsystems — environment, identity, temporal planner. A wall of text in a single paragraph yields noticeably worse output than the same text split across explicit blocks.

  • Duration 4–15 seconds (vs 5/10 in 1.0/1.5)
  • Up to 9 images + 3 videos + 3 audio per request (12 files in total)
  • Full Consistency Control via @image, @video, @audio
  • Second-by-second storyboarding (0–4s / 4–10s / 10–15s)
  • Voice cloning and sound design
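
The limits above are easy to check before a request is submitted. The sketch below is a minimal pre-flight check, not part of any Seedance or syntx.ai API: `RequestPlan` and `check_request` are hypothetical names introduced for illustration, and the numbers are simply the ones quoted in this guide.

```python
from dataclasses import dataclass

# Limits quoted in this guide. The constant and class names are illustrative.
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIO = 9, 3, 3
MAX_FILES_TOTAL = 12
MIN_DURATION_S, MAX_DURATION_S = 4, 15


@dataclass
class RequestPlan:
    """Hypothetical description of one generation request."""
    duration_s: int
    images: int = 0
    videos: int = 0
    audio: int = 0


def check_request(plan: RequestPlan) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits."""
    problems = []
    if not MIN_DURATION_S <= plan.duration_s <= MAX_DURATION_S:
        problems.append(
            f"duration {plan.duration_s}s is outside {MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    per_type = [
        ("images", plan.images, MAX_IMAGES),
        ("videos", plan.videos, MAX_VIDEOS),
        ("audio", plan.audio, MAX_AUDIO),
    ]
    for name, count, limit in per_type:
        if count > limit:
            problems.append(f"{count} {name} exceeds the limit of {limit}")
    total = plan.images + plan.videos + plan.audio
    if total > MAX_FILES_TOTAL:
        problems.append(f"{total} files exceeds the combined limit of {MAX_FILES_TOTAL}")
    return problems


# Every per-type cap is respected here, yet the combined cap is not.
print(check_request(RequestPlan(duration_s=10, images=9, videos=3, audio=3)))
```

The example shows why the combined cap matters: maxing out every input type at once adds up to 15 files, which is over the 12-file limit.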

TRY CGI prompt structure

Canonical block order for cinematic results: [TITLE & ACT] → LOCATION → REFERENCE ASSIGNMENT → STYLE → STORY → CHARACTERS → SHOT STRUCTURE. One blank line between blocks, a space after every colon.

  • LOCATION: environment, lighting, weather, key background details.
  • STYLE: visual preset ("Ultra-photorealistic 4K live-action cinema", "Gritty film grain").
  • STORY: what happens in this specific generation, in one or two sentences.
  • CHARACTERS: participants, their current mood, current appearance.
  • SHOT STRUCTURE: the act-based breakdown.

This works better than the basic 6-step formula because named blocks are routed to the right generation layers.
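
The block order and spacing rules can be enforced with a small helper. This is a sketch under stated assumptions: `build_prompt` is a hypothetical function, and rendering the title as a bracketed first line is a guess at how [TITLE & ACT] is meant to be written, since the guide does not show it.

```python
from typing import Optional

# Canonical order from the guide. The title is handled separately because the
# guide writes it in brackets rather than as a "NAME: text" block.
BLOCK_ORDER = [
    "LOCATION",
    "REFERENCE ASSIGNMENT",
    "STYLE",
    "STORY",
    "CHARACTERS",
    "SHOT STRUCTURE",
]


def build_prompt(blocks: dict[str, str], title: Optional[str] = None) -> str:
    """Join the given blocks in canonical order, one blank line between them."""
    unknown = set(blocks) - set(BLOCK_ORDER)
    if unknown:
        raise ValueError(f"unknown block names: {sorted(unknown)}")
    parts = []
    if title:
        parts.append(f"[{title}]")  # assumption: title rendered as a bracketed line
    for name in BLOCK_ORDER:
        text = blocks.get(name, "").strip()
        if text:
            parts.append(f"{name}: {text}")  # a space after every colon
    return "\n\n".join(parts)


# Blocks may be supplied in any order; the output is always canonical.
print(build_prompt({
    "STYLE": "Gritty film grain.",
    "LOCATION": "Urban square at midday.",
}))
```

Keeping the order in one list means a prompt written in a hurry still comes out in the sequence the guide recommends.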

@-references and identity preservation

Seedance 2.0 accepts references via an `@`-prefix bound to a role: @image1/@image2/@image3 for characters and scenes, @video1/@video2/@video3 for camera and rhythm cloning, @audio1/@audio2/@audio3 for voice and SFX.

The critical phrase for any character reference is **Strict identity preservation. No morphing or style changes.** Without it, the model will «improve» the face between seconds, and character consistency breaks by second four. This is TRY CGI's number-one consistency tip.

Reference assignment template: «Protagonist (@image1): Strict identity preservation. Use this image for exact facial features and wardrobe. No morphing or style changes.» For audio: «Audio (@audio1): Reference for realistic electrical buzzing and low machinery hum.»
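
If you write many prompts, the reference lines can be generated so the identity phrase is never forgotten. The helpers below are hypothetical and only reproduce the template wording above; allowing image slots 1 through 9 is an assumption based on the 9-image limit.

```python
def character_reference(role: str, slot: int) -> str:
    """Build a character line for @imageN using the guide's template wording."""
    if not 1 <= slot <= 9:  # assumption: slots follow the 9-image limit
        raise ValueError("image slot must be between 1 and 9")
    return (
        f"{role} (@image{slot}): Strict identity preservation. "
        "Use this image for exact facial features and wardrobe. "
        "No morphing or style changes."
    )


def audio_reference(slot: int, description: str) -> str:
    """Build an audio line for @audioN; the description is free text."""
    if not 1 <= slot <= 3:
        raise ValueError("audio slot must be between 1 and 3")
    return f"Audio (@audio{slot}): Reference for {description}."


print(character_reference("Protagonist", 1))
print(audio_reference(1, "realistic electrical buzzing and low machinery hum"))
```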

Timestamp storyboarding 0–15s

For 10–15 second videos, TRY CGI recommends three named acts with a fixed field skeleton: Action / Emotional Acting / Camera / Lighting / VFX / Audio Rule.

The canonical 15s template has three acts: 0–4s [THE ENTRY] (setup), 4–10s [THE REVELATION] (the turn, often a dolly-in or crash-zoom), and 10–15s [ACTION RESPONSE] (resolution, handheld, motion blur). An 8s clip uses two acts, ENTRY → PAYOFF; a 10s clip uses SETUP → CLIMAX.
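
The act templates and the field skeleton can be turned into an empty form to fill in. This is an illustrative sketch: `render_skeleton` is a hypothetical helper, and the 8s and 10s timestamps are taken from the before/after examples later in this guide.

```python
# (start, end, act name) per supported duration, as described in this guide.
ACT_TEMPLATES = {
    8: [(0, 4, "ENTRY"), (4, 8, "PAYOFF")],
    10: [(0, 4, "SETUP"), (4, 10, "CLIMAX")],
    15: [(0, 4, "THE ENTRY"), (4, 10, "THE REVELATION"), (10, 15, "ACTION RESPONSE")],
}

# Fixed field skeleton recommended for every act.
FIELDS = ["Action", "Emotional Acting", "Camera", "Lighting", "VFX", "Audio Rule"]


def render_skeleton(duration_s: int) -> str:
    """Return an empty SHOT STRUCTURE form for an 8, 10, or 15 second clip."""
    if duration_s not in ACT_TEMPLATES:
        raise ValueError(f"no act template for {duration_s}s; use 8, 10, or 15")
    lines = [f"SHOT STRUCTURE ({duration_s} SEC TOTAL):"]
    for start, end, name in ACT_TEMPLATES[duration_s]:
        lines.append("")
        lines.append(f"{start}-{end}s [{name}]")
        lines.extend(f"  {field}:" for field in FIELDS)
    return "\n".join(lines)


print(render_skeleton(8))
```

Fields that do not apply to an act can simply be deleted from the form, as the examples in this guide do.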

The biggest trap is describing emotion in general terms («he is scared»). The right path is micro-acting: «jaw clenches, nostrils flare, pupils dilate, micro-tremor in the eyelids». Without these micro-signals the face renders as a mask with the correct overall emotion, but dead.
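
A rough way to catch a dead-mask description is to count concrete micro-signals in the Emotional Acting text. The check below is a naive substring sketch, not how the model or any tool evaluates prompts: the keyword list is a small hypothetical sample drawn from the descriptors named in this guide, and the minimum of two follows the guide's advice of at least 2–3 micro-signals per act.

```python
# Lowercase stems so that "clenches" and "clenching" both match.
MICRO_SIGNALS = [
    "jaw clench",
    "nostrils flar",
    "pupils dilat",
    "micro-tremor",
    "eyes dart",
    "brow",
    "lips tighten",
    "breathing",
    "hand tremor",
    "sweat",
]


def count_micro_signals(text: str) -> int:
    """Count how many known micro-signal stems appear in the text."""
    lowered = text.lower()
    return sum(1 for stem in MICRO_SIGNALS if stem in lowered)


def has_enough_micro_acting(text: str, minimum: int = 2) -> bool:
    """True when the description names at least `minimum` micro-signals."""
    return count_micro_signals(text) >= minimum


print(has_enough_micro_acting("He is scared."))
print(has_enough_micro_acting("Shock. Pupils dilate, nostrils flare, micro-tremor in the eyelids."))
```

A keyword list will miss valid phrasings, so treat a failed check as a prompt to reread the line rather than as a verdict.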

Common mistakes

  1. Wall of text instead of TRY CGI blocks

    A single-paragraph dump loses 30–40% of quality compared to the same text split across LOCATION / STYLE / STORY / CHARACTERS / SHOT STRUCTURE. The model internally routes named blocks into different generation layers, and without explicit headings the routing becomes noisy.

  2. @-reference on a character without identity preservation

    If you have @image1 of a face but no «Strict identity preservation. No morphing or style changes.» phrase, the model «improves» the face between seconds. By second four the character no longer matches the reference. This phrase is TRY CGI's number-one consistency tip.

  3. Emotion described in general terms

    «He is scared», «she is happy», «surprise» render as a dead mask. Seedance 2.0 renders micro-acting only when you describe it explicitly: «jaw clenches, pupils dilate, micro-tremor in the eyelids, nostrils flare». Aim for at least 2–3 micro-signals per act.

  4. Epic music by default

    Adding «epic orchestral soundtrack» or «dramatic background score» when nobody asked for it turns a cinematic scene into a trailer. The default is «No music. Diegetic sound design only.» Add music only when it is explicitly requested.

  5. Asking for more than 15 seconds in one prompt

    The platform cap is 15 seconds per run. «Make a 30-second clip» will either get truncated or break the pacing. The right path is segmentation: first prompt up to 15s, second segment via @video1 with «Extend @video1 by 15 seconds» plus new content description.
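
The segmentation in the last item is simple arithmetic. The sketch below splits a target length into runs of at most 15 seconds and produces the extension instruction for each follow-up run. Splitting evenly is a design choice made here for steadier pacing, not something the guide prescribes, and `plan_segments` and `extension_line` are hypothetical names.

```python
import math

MAX_SEGMENT_S = 15  # per-run cap stated in this guide
MIN_SEGMENT_S = 4   # assumption: the 4 second minimum also applies to the total


def plan_segments(total_s: int) -> list[int]:
    """Split a target duration into near-equal segments of at most 15 seconds."""
    if total_s < MIN_SEGMENT_S:
        raise ValueError(f"total duration must be at least {MIN_SEGMENT_S}s")
    count = math.ceil(total_s / MAX_SEGMENT_S)
    base, extra = divmod(total_s, count)
    # The first `extra` segments take one additional second each.
    return [base + 1 if i < extra else base for i in range(count)]


def extension_line(seconds: int) -> str:
    """Instruction for a follow-up run that uses the previous clip as @video1."""
    return f"Extend @video1 by {seconds} seconds."


segments = plan_segments(30)
print(segments)
for seconds in segments[1:]:
    print(extension_line(seconds))
```

Each follow-up prompt still needs its own description of the new content and of the seam; the helper only covers the duration bookkeeping.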

Before / after examples

Example 1

Before

video of a dog running across a square, make it dynamic

After

LOCATION: Urban square at midday. Concrete pavement, wooden benches, food stalls, crowd in background. Bright natural sunlight with strong side rim light.

STYLE: Ultra-photorealistic 4K cinematic action, gritty film grain, realistic muscle and fur, practical lighting. Gritty realism like a Nolan action film. No cartoonish AI look.

STORY: A dog in a black collar makes a high-speed escape across the square, demonstrating athletic jumps between people and obstacles.

CHARACTERS: Dog, muscular and energetic, in a black collar. Focused eyes, pupils dilated with adrenaline, fur swept back by speed.

SHOT STRUCTURE (10 SEC TOTAL):

0-4s — [SETUP]
  Action: Dog launches from a crouch, muscles tensed.
  Emotional Acting: Concentration. Pupils dilated, ears pinned back, nostrils flared.
  Camera: Low angle dolly tracking behind the dog.
  Audio Rule: Diegetic only — claws on concrete, panting.

4-10s — [CLIMAX]
  Action: Dog vaults a bench, weaves through a crowd.
  Camera: Side tracking, motion blur on background.
  Audio Rule: Footsteps, breath, distant crowd. No music.

No music. Diegetic sound design only. Grounded realism, gritty film grain, no AI aesthetic.

Key change: TRY CGI block structure instead of one sentence. Each block routes into its own generation layer (environment, identity, temporal planner), and the final anti-AI anchor saves the shot from plastic render.

Example 2

Before

hero looks at a computer screen and realizes what happened

After

LOCATION: Dim server room, racks of blinking LEDs, smoke from a fried capacitor, single overhead bulb.

REFERENCE ASSIGNMENT:
- Protagonist (@image1): Strict identity preservation. Use this image for exact facial features and wardrobe. No morphing or style changes.

STYLE: Ultra-photorealistic 4K cinematic, anamorphic lens flare, gritty film grain. The look of a modern tech thriller, not animation.

STORY: Character realizes the breach is live, in real time, on the terminal in front of them.

SHOT STRUCTURE (8 SEC TOTAL):

0-4s — [ENTRY]
  Action: Character types rapidly, leans toward the monitor.
  Emotional Acting: Focused tension. Jaw clenches, brow furrowed, breathing shallow.
  Camera: Medium shot, slight handheld sway.
  Lighting: Cold monitor glow on face, warm overhead bulb behind.

4-8s — [PAYOFF]
  Action: Character freezes mid-keystroke, eyes locked on the screen.
  Emotional Acting: Shock. Pupils dilate, nostrils flare, micro-tremor in the eyelids. Screen light reflects in the pupils.
  Camera: Slow dolly-in to an extreme close-up of the eyes.
  Lighting: Screen reflection sharp in the pupils.
  Audio Rule: Only the soft hum of servers and slow inhale. No music.

No music. Diegetic sound design only.

Identity preservation on @image1, plus explicit micro-acting in Emotional Acting instead of «he is surprised», gives a living face with a real shock beat rather than an AI mask.

Example 3

Before

extend my video by 5 seconds and add a sprint

After

REFERENCE ASSIGNMENT:
- Base footage (@video1): Use this as the source. Match exact lighting, color grading, lens, and frame composition of the last frame. Continue camera motion seamlessly.

Extend @video1 by 5 seconds. New content begins exactly from the last frame of @video1 — preserve framing, exposure, and color grade.

New action: The character in frame breaks into a sprint, accelerating from a walk over the first 2 seconds, then full sprint for the remaining 3. Camera transitions from medium tracking to a tighter handheld follow.

Audio Rule: Diegetic only — footsteps accelerating, breath quickening. No music.

No music. Diegetic sound design only. Grounded realism, no AI gloss.

Video Extension works only if you explicitly tag @video1 as the source and describe the seam — «continue camera motion seamlessly», «preserve framing, exposure, color grade». Without this the join breaks in the first second of the new segment.

Frequently asked

How is Seedance 2.0 different from 1.5 Pro?
Five key upgrades: flexible 4–15 second duration instead of fixed 5/10, full multimodality with up to 12 files per request, Consistency Control via @-references, voice cloning and sound design, and second-by-second storyboarding. For serious cinematic content this is a clear upgrade; for simple 5-second shots 1.5 Pro is faster.

Which language should I write the prompt in?
On the native 即梦 (Jimeng) platform Chinese performs best, since the model was trained on Chinese data. On syntx.ai the standard is English and quality is stable. Russian also works but produces slightly less predictable results in complex scenes. Technical anchors (4K, dolly-in, handheld, film grain) always stay in English inside any other language.

What about realistic human faces in uploaded references?
The platform blocks realistic human faces in @image and @video references. This is a ByteDance policy and cannot be bypassed. Options: use stylized portraits (concept art, illustration), shoot scenes without recognizable faces (silhouettes, back-shots, long shots), or generate the character fully from text without any reference.

Can I make a video longer than 15 seconds?
Not in a single run: 15 seconds is a hard cap. Long videos are assembled in segments: the first prompt, up to 15s, ends on a «clean» frame, and the second segment uses the first as @video1 with the instruction «Extend @video1 by Xs» plus a description of the new content. Between segments you must describe the seam: what is on the last frame of segment one and how it continues in segment two.

Why is «No music. Diegetic sound design only.» needed?
Without explicit guidance Seedance 2.0 often adds background music, which turns a cinematic scene into a trailer. Serious film sounds like real life: footsteps, breath, ambient noise, no score during hard beats. Diegetic means «sounds whose source is in the frame». This final anchor acts as a filter that switches off the default «trailer mode».

Which micro-acting descriptors should go in each act?
TRY CGI recommends at least 2–3 micro-signals across three groups. The «gaze» group: eyes darting, pupils dilating, focused gaze, micro-tremor in the eyelids. The «facial muscles» group: jaw clenching, nostrils flaring, brow tension, lips tightening. The «breathing and body» group: visible heavy breathing, shoulder movement on inhale, hand tremor, visible sweat. Without these descriptors the model renders a mask with the right overall emotion but no life.

Does Opten support Seedance 2.0?
Yes, the Opten extension detects Seedance 2.0 inside syntx.ai and scores prompts against the TRY CGI structure: it checks for named LOCATION/STYLE/STORY/CHARACTERS blocks, identity preservation on @-references, micro-acting inside Emotional Acting, an Audio Rule with a diegetic anchor, and anti-AI phrases in the closing line. One click gives you a rewrite in the canonical format.
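
Several of the checks named in the last answer can be approximated by hand with a few string tests. The sketch below is not Opten's implementation and does not reflect how it scores prompts; `lint_prompt` is a hypothetical function that covers only three of the listed checks, using plain text matching.

```python
import re

REQUIRED_BLOCKS = ["LOCATION", "STYLE", "STORY", "CHARACTERS"]
IDENTITY_PHRASE = "Strict identity preservation"
DIEGETIC_ANCHOR = "No music. Diegetic sound design only."


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of findings; an empty list means all three checks pass."""
    findings = []
    # 1. Each named block must start a line, followed by a colon.
    for block in REQUIRED_BLOCKS:
        if not re.search(rf"^{re.escape(block)}:", prompt, flags=re.MULTILINE):
            findings.append(f"missing block: {block}")
    # 2. Any @imageN reference needs the identity phrase somewhere in the prompt.
    if re.search(r"@image\d", prompt) and IDENTITY_PHRASE not in prompt:
        findings.append("@image reference without identity preservation phrase")
    # 3. The diegetic anchor should appear verbatim.
    if DIEGETIC_ANCHOR not in prompt:
        findings.append(f"missing closing anchor: {DIEGETIC_ANCHOR}")
    return findings


for finding in lint_prompt("video of a dog running across a square, make it dynamic"):
    print(finding)
```

Run against the first «Before» example, this reports all four blocks and the closing anchor as missing, which matches what the «After» version adds.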

Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672