
Seedance 2.0: how to write prompts the model actually understands

ByteDance

Seedance 2.0 is a video model from ByteDance (Jimeng platform) that generates 4–15 seconds of video per pass at up to 2K resolution. It accepts multimodal input (up to 9 images, 3 videos, and 3 audio files per request), offers 10 generation types, and supports timestamp storyboarding for longer clips and native voice control. Prompts can be up to 2,000 characters long.

What Seedance 2.0 does well

Seedance 2.0 is one of the most feature-rich public video models. Ten generation types in one product: T2V, Consistency Control with @-references, copying camera from a reference video, copying VFX, story completion, video extension, voice cloning, one-take long shot, video editing, beat sync to music.

Multimodal input: up to 9 images (jpeg/png/webp/bmp/tiff/gif, <30MB each), up to 3 videos (mp4/mov, 2–15s, <50MB, 480p–720p), and up to 3 audio files (mp3/wav, ≤15s combined, <15MB). The per-type caps add up to 15, but a single request accepts at most 12 files in total. Each pass produces 4–15 seconds of video; for longer content, extend sequentially via @Video.
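The limits listed above are easy to check before uploading. The following is a minimal sketch, not platform code: `RefFile` and `check_inputs` are hypothetical names, and it assumes the size caps apply per file, which the text only states explicitly for images.

```python
from dataclasses import dataclass

# Limits as quoted in the text. Reading the size caps as per-file is an assumption.
LIMITS = {
    "image": {"max_count": 9, "max_mb": 30, "formats": {"jpeg", "png", "webp", "bmp", "tiff", "gif"}},
    "video": {"max_count": 3, "max_mb": 50, "formats": {"mp4", "mov"}},
    "audio": {"max_count": 3, "max_mb": 15, "formats": {"mp3", "wav"}},
}
MAX_FILES_TOTAL = 12
MAX_AUDIO_SECONDS_COMBINED = 15


@dataclass
class RefFile:  # hypothetical container, not a platform type
    kind: str             # "image", "video", or "audio"
    fmt: str              # file extension without the dot
    size_mb: float
    seconds: float = 0.0  # duration; ignored for images


def check_inputs(files: list[RefFile]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the set fits."""
    problems = []
    if len(files) > MAX_FILES_TOTAL:
        problems.append(f"{len(files)} files exceeds the {MAX_FILES_TOTAL}-file cap")
    for f in files:
        if f.kind not in LIMITS:
            problems.append(f"unknown file kind: {f.kind}")
    for kind, lim in LIMITS.items():
        group = [f for f in files if f.kind == kind]
        if len(group) > lim["max_count"]:
            problems.append(f"too many {kind} files: {len(group)} > {lim['max_count']}")
        for f in group:
            if f.fmt.lower() not in lim["formats"]:
                problems.append(f"{kind} format not listed: {f.fmt}")
            if f.size_mb >= lim["max_mb"]:
                problems.append(f"{kind} file is {f.size_mb}MB, limit is <{lim['max_mb']}MB")
            if kind == "video" and not 2 <= f.seconds <= 15:
                problems.append(f"video length {f.seconds}s is outside 2-15s")
    audio_total = sum(f.seconds for f in files if f.kind == "audio")
    if audio_total > MAX_AUDIO_SECONDS_COMBINED:
        problems.append(f"audio totals {audio_total}s, limit is {MAX_AUDIO_SECONDS_COMBINED}s combined")
    return problems
```

Nine images plus three videos pass, while adding one audio file on top trips the 12-file total even though every per-type cap is respected.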

  • 10 generation types including voice cloning and beat sync
  • Multimodal input: 9 images + 3 videos + 3 audio files
  • Duration 4–15 seconds, resolution up to 2K
  • @-references for character and scene consistency control
  • Timestamp storyboarding for 13–15 second narratives

Basic prompt structure

Optimal formula: [Subject/Character] + [Scene/Environment] + [Action/Motion] + [Camera Movement] + [Timing Breakdown] + [Audio/Sound] + [Style/Mood]. You don't have to use every element — composition depends on video type.

The more specific, the better. Active verbs over abstractions («walks, turns, picks up» beats «something happens»). At least one shot-size or camera-movement directive per prompt. Concrete physical description of the scene and environment.

Prompts can be up to 2,000 characters long. On syntx.ai (an English-language platform), English is recommended; on native Jimeng, Chinese yields slightly better results. English is fine either way, since the model is trained bilingually.
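One way to apply the formula consistently is to assemble prompts from named parts. This is a sketch under assumptions: `build_prompt` and the element names are invented for illustration, and joining the parts as separate sentences is just one reasonable layout, not a format the platform requires.

```python
MAX_PROMPT_CHARS = 2000  # limit quoted in the text

# Order follows the formula above; the field names themselves are hypothetical.
ELEMENT_ORDER = ("subject", "scene", "action", "camera", "timing", "audio", "style")


def build_prompt(**elements: str) -> str:
    """Join the supplied elements in formula order, skipping any that are left out."""
    unknown = set(elements) - set(ELEMENT_ORDER)
    if unknown:
        raise ValueError(f"unknown elements: {sorted(unknown)}")
    parts = []
    for name in ELEMENT_ORDER:
        text = elements.get(name, "").strip().rstrip(".")
        if text:
            parts.append(text)
    prompt = ". ".join(parts) + "." if parts else ""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters, limit is {MAX_PROMPT_CHARS}")
    return prompt


print(build_prompt(
    scene="Narrow alley at dusk, wet pavement reflecting neon signs",
    action="A man in a black hoodie sprints past a fruit stall",
    camera="Side tracking shot at chest height",
    audio="Panicked footsteps, heavy breathing",
    style="Cinematic noir tone",
))
```

Each element works best as a self-contained phrase, because the helper only orders and joins them; it does not rewrite the wording.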

The 10 generation types

  • T2V: text-only generation.
  • Consistency Control: lock a character, product, or scene via @-references.
  • Copy Camera: upload a reference video to copy camera moves and choreography.
  • Copy VFX: replicate transitions and effects from a reference video.
  • Story Completion: the model continues a narrative from a storyboard or image sequence.
  • Video Extension: smooth continuation of an existing video.
  • Voice Control: voice cloning, dialogue generation, sound design.
  • One-Take Long Shot: a continuous shot without cuts.
  • Video Editing: character swaps, plot changes.
  • Beat Sync: visual rhythm synced to music via reference audio.

Each type has its own prompt formula (see platform documentation).

Timestamp storyboarding

The most powerful technique for 13–15 second videos is per-second breakdown. It gives precise control over narrative pacing:

0-3s: [scene + camera + sound]
4-8s: [scene + camera + sound]
9-12s: [scene + camera + sound]
13-15s: [scene + camera + sound]

Key rule: use realistic timecodes. A full action needs 2–3 seconds, a short gesture 1 second. Don't try to cram «walking across a room» into 0.5 seconds. For 4–8 second videos, timestamps aren't required; one or two key moments are enough. For 9–12 seconds, timestamps are recommended. For 13–15 seconds, they are mandatory for a good result.
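The timing rules above can be enforced mechanically before a storyboard is pasted into the prompt field. The sketch below is illustrative only: `format_storyboard` is a hypothetical helper, and treating 2 seconds as the minimum length of a beat is my reading of "a full action needs 2–3 seconds".

```python
MAX_SECONDS = 15      # per-generation cap from the text
MIN_BEAT_SECONDS = 2  # assumed floor, based on "a full action needs 2-3 seconds"


def format_storyboard(beats: list[tuple[int, int, str]]) -> str:
    """Render (start, end, description) beats as 'start-ends: description' lines."""
    lines = []
    prev_end = None
    for start, end, text in beats:
        if start < 0 or end > MAX_SECONDS:
            raise ValueError(f"{start}-{end}s falls outside 0-{MAX_SECONDS} seconds")
        if end - start < MIN_BEAT_SECONDS:
            raise ValueError(f"{start}-{end}s is too short for a full action")
        if prev_end is not None and start < prev_end:
            raise ValueError(f"{start}-{end}s overlaps the previous beat")
        lines.append(f"{start}-{end}s: {text}")
        prev_end = end
    return "\n".join(lines)


print(format_storyboard([
    (0, 3, "Wide shot, slow forward dolly, crunching snow."),
    (4, 8, "Medium shot, camera slowly pushes in."),
    (9, 12, "Interior close-up, crackling fireplace."),
    (13, 15, "Wide interior shot, camera pulls back."),
]))
```

A beat that is too short, overlaps its predecessor, or runs past 15 seconds raises a `ValueError` instead of producing a prompt the model is unlikely to follow.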

Common mistakes

  1. Prompt too short or too long

    Under 15 words, the model invents too much and results are unpredictable. Over 2,000 characters exceeds the prompt limit, and even close to it detail overload sets in: the model starts ignoring parts of the prompt. The sweet spot for most scenes is 50–200 words; for timestamp storyboards, 300–500 words with explicit scenes, as long as the total stays within the 2,000-character limit.

  2. Conflicting camera moves at once

    «Zoom in while panning left and orbiting around» — the model can't fit three simultaneous moves into 5–10 seconds of screen time. Pick one main move per scene plus an optional speed modifier. If you need different moves, split them across timestamp segments.

  3. Asking for more than 15 seconds in one pass

    15 seconds is a hard platform limit per generation. A «30-second video» request either truncates to 15 or errors out. For longer content, use the multi-segment approach via Video Extension: segment by segment with smooth handoffs.

  4. Abstract phrasing instead of physical actions

    «Something beautiful happens», «emotional moment», «mood shifts» — the model doesn't understand abstractions. Describe concrete physical actions: «she slowly turns her head», «light fades from warm to cool», «petals fall onto the table». This delivers predictable, controllable results.

  5. Realistic human faces in uploaded references

    The Jimeng platform blocks uploading realistic human faces as references — it's a ByteDance policy, not a bypassable limit. For I2V with humans, use stylized references (illustration, painting, cartoon) or generate the human-containing scene via T2V without a reference image.
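The first four mistakes in this list can be caught with a rough text check before generating. The following is a heuristic sketch, not a real validator: `lint_prompt` is a hypothetical name, the keyword lists are short samples I chose for illustration, and the face restriction in item 5 cannot be detected from prompt text at all.

```python
import re

MAX_PROMPT_CHARS = 2000
MIN_WORDS = 15
MAX_SECONDS = 15

# Hypothetical keyword lists, for illustration only; a real checker would need more.
CAMERA_MOVES = {
    "zoom": r"\bzoom(s|ing)?\b",
    "pan": r"\bpan(s|ning)?\b",
    "orbit": r"\borbit(s|ing)?\b",
    "dolly": r"\bdolly\b",
    "tilt": r"\btilt(s|ing)?\b",
}
ABSTRACT_PHRASES = ("something beautiful happens", "emotional moment", "mood shifts")


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for the length, camera, duration, and phrasing mistakes."""
    text = prompt.lower()
    warnings = []
    if len(prompt.split()) < MIN_WORDS:
        warnings.append("under 15 words")
    if len(prompt) > MAX_PROMPT_CHARS:
        warnings.append("over 2,000 characters")
    # Timestamped prompts may use a different move per segment, so skip them here.
    if not re.search(r"\b\d+-\d+s:", text):
        moves = [name for name, pattern in CAMERA_MOVES.items() if re.search(pattern, text)]
        if len(moves) > 1:
            warnings.append("multiple camera moves in one scene: " + ", ".join(moves))
    for seconds in re.findall(r"\b(\d+)[- ]?seconds?\b", text):
        if int(seconds) > MAX_SECONDS:
            warnings.append(f"asks for {seconds}s, over the 15-second cap")
    for phrase in ABSTRACT_PHRASES:
        if phrase in text:
            warnings.append(f"abstract phrasing: {phrase!r}")
    return warnings
```

Keyword matching will miss paraphrases and can misfire on words such as «pan» used as a noun, so the output is a prompt for a second look rather than a verdict.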

Before / after examples

Example 1

Before

video where a man runs down the street

After

Wide tracking shot of a man in a black hoodie sprinting down a narrow alley at dusk. Side tracking, camera moves at chest height parallel to him. He knocks over a fruit stall, stumbles, gets back up, keeps running. Wet pavement reflects neon signs. Loud panicked footsteps, distant crowd murmur, heavy breathing. 16:9, 24fps, 8 seconds, cinematic noir tone.

Concrete scene (alley at dusk), physical detail (knocks over stall, wet pavement), camera parameters (chest height parallel), sound (footsteps, crowd, breathing), technical parameters at the end. A working T2V prompt.

Example 2

Before

long 15-second video with a hero's story

After

0-3s: Wide shot, a woman in a red coat walks toward a wooden cabin in a snowy forest. Slow forward dolly, soft ambient wind, crunching snow.
4-8s: Medium shot, she opens the cabin door, warm orange light spills onto the snow. Camera slowly pushes in.
9-12s: Interior close-up, she sets a lantern on a wooden table, takes off her gloves. Soft crackling fireplace ambient.
13-15s: Wide interior shot, she sits by the fire, exhales. Camera pulls back to reveal the warm, intimate room. Soft piano music begins.

Timestamp storyboarding with realistic pacing (3–4 seconds per beat), consistent character (the woman in red coat), smooth narrative across 4 scenes. Sound varies by scene. The Seedance 2.0 sweet spot.

Example 3

Before

product spot with my brand using 3 shots

After

Use @Image1 (product hero shot) and @Image2 (lifestyle context). Beat sync to @Audio1 (brand music track).
0-2s: Close-up of @Image1 product rotating slowly on a marble surface. Soft side light, shallow DoF.
3-5s: Cut to @Image2 lifestyle scene, person holds the product naturally, smiles slightly. Hand-held camera, warm afternoon light.
6-8s: Wide editorial shot, product centered with brand color palette around it. Smooth dolly out. Beat hit at 8s. 16:9, 24fps.

Multimodal prompt with @-references (Image1, Image2, Audio1), beat sync to music, timestamp breakdown for 3 shots, technical parameters. This is the production scenario Seedance 2.0 is designed for.
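With several uploads in play, it helps to confirm that every @-reference in the prompt points at a file that was actually attached, and that no attached file goes unused. This sketch assumes references are numbered from 1 within each type, as the @Image1, @Image2, and @Audio1 examples suggest; `check_references` is a hypothetical helper, not part of any platform API.

```python
import re

# Matches references such as @Image1, @Video2, @Audio1.
REF_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def check_references(prompt: str, uploaded: dict[str, int]) -> list[str]:
    """`uploaded` maps "Image"/"Video"/"Audio" to the number of attached files."""
    refs = {(kind, int(num)) for kind, num in REF_PATTERN.findall(prompt)}
    problems = []
    for kind, n in sorted(refs):
        # Assumed numbering: 1..count within each type.
        if n < 1 or n > uploaded.get(kind, 0):
            problems.append(f"@{kind}{n} has no matching upload")
    for kind, count in uploaded.items():
        for n in range(1, count + 1):
            if (kind, n) not in refs:
                problems.append(f"uploaded {kind} #{n} is never referenced")
    return problems


prompt = "Use @Image1 (product hero shot) and @Image2 (lifestyle context). Beat sync to @Audio1."
print(check_references(prompt, {"Image": 2, "Audio": 1}))
```

An empty list means the prompt and the upload set agree; a missing or unused reference is reported by name.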

Frequently asked

What language should I write the prompt in?
Seedance 2.0 is a Chinese model — on native Jimeng, Chinese yields slightly better results. But English is also well supported, especially on syntx.ai (the English-language platform). For most production scenarios English is the standard, not a penalty. If you know Chinese, write in it — that delivers a marginal lift.
How long can the video be?
From 4 to 15 seconds per generation, in 1-second steps. For longer content, use multi-segment generation via Video Extension: upload the previous video as @Video1, write «Extend @Video1 by Xs» + a description of the new part. That's how 30+ second narratives are built from sequential segments.
When is timestamp storyboarding mandatory?
For 13–15 second videos — mandatory for a good result; otherwise the model can't handle the long narrative. For 9–12 seconds — recommended. For 4–8 seconds — optional, one or two key moments are enough. Format: «0-3s: …», «4-8s: …», with realistic timecodes (2–3 seconds per full action).
How do I keep a character consistent across multiple scenes?
Use Consistency Control with @-references. Upload 1–3 photos of the character, reference @Image1 in the prompt: «@Image1 walks across the room», «@Image1 sits down». The model holds the appearance throughout the generation. For a series of videos, the same @-reference yields a consistent character across multiple clips.
Can I clone a voice?
Yes, via Voice Control. Upload reference audio (mp3/wav, ≤15s, <15MB) and reference @Audio1 in the prompt as the voice source for dialogue. Dialogue in quotes with character markers: «The woman calmly says: "I told you."». This delivers lip-sync with the cloned voice — a powerful tool for dubbing and virtual characters.
What are the input file limits?
Up to 9 images (jpeg/png/webp/bmp/tiff/gif, <30MB each), up to 3 videos (mp4/mov, 2–15s, <50MB, 480p–720p), up to 3 audio files (mp3/wav, ≤15s combined, <15MB). Max 12 files per request. The platform blocks realistic human faces in uploaded references — for I2V with humans, use stylized references.
Does Opten support Seedance 2.0?
Yes, the Opten extension recognizes Seedance inside syntx.ai and scores prompts against the model-specific structure: it checks for subject, action, and camera, correct timestamp storyboarding for long videos, realistic timecodes, @-reference use for consistency, and audio description in the prompt. One click yields a rewrite in the right structure.
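For the multi-segment approach described under «How long can the video be?», the split into passes can be planned up front. This is a sketch under one notable assumption: that each extension pass is subject to the same 4–15 second range as a first generation, which the text does not state outright. `plan_segments` and `extension_prompt` are hypothetical helpers.

```python
MIN_SECONDS, MAX_SECONDS = 4, 15  # per-generation range quoted in the text


def plan_segments(total_seconds: int) -> list[int]:
    """Split a target length into per-pass durations of 4-15 seconds each."""
    if total_seconds < MIN_SECONDS:
        raise ValueError(f"minimum length is {MIN_SECONDS} seconds")
    segments = []
    remaining = total_seconds
    while remaining > 0:
        take = min(MAX_SECONDS, remaining)
        # Shorten this pass if it would leave a tail below the 4-second minimum.
        if 0 < remaining - take < MIN_SECONDS:
            take = remaining - MIN_SECONDS
        segments.append(take)
        remaining -= take
    return segments


def extension_prompt(seconds: int, description: str) -> str:
    """Build the «Extend @Video1 by Xs» line followed by the new part's description."""
    return f"Extend @Video1 by {seconds}s. {description}"


print(plan_segments(30))
print(extension_prompt(15, "She sits by the fire, exhales."))
```

The first duration is for the initial generation; each later duration feeds an extension prompt, with the previous clip uploaded as @Video1.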

Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672