Kling Motion Control: how to write prompts the model actually understands
Kuaishou
Kling Motion Control is a mode in Kuaishou's Kling that transfers motion from a reference video onto a character taken from an image. Output runs 5–10 seconds at up to 4K resolution (Kling 3.0), and Motion Brush supports up to 6 elements per frame. The golden rule: the prompt is art direction (how it looks), NOT motion direction (how it moves).
What Kling Motion Control is
Motion Control is NOT Text-to-Video. Here the prompt plays a fundamentally different role: motion, gestures, expression, and timing come from a reference video or are painted with a brush; the prompt describes only the character's appearance and the setting.
There are two primary modes. In Reference Video Motion Transfer, the user uploads a reference video (3–30 seconds, clear subject, minimal camera shake) and an image of the character. The system extracts skeletal animation, timing, and contact dynamics and applies them to the character. In Motion Brush, the user paints motion trajectories directly on the image, up to 6 separate elements with individual trajectories; the prompt describes the overall scene, not the motion.
- Motion transfer from a video reference onto a character from an image
- Motion Brush: up to 6 elements with individual trajectories
- 6-axis camera in Kling 3.0: pan, tilt, roll, dolly, truck, pedestal
- Duration 5–10 seconds, up to 4K (Kling 3.0)
- Element Binding: locks facial features and character identity
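The limits listed above can be checked before a job is submitted. The following is a minimal sketch, not a Kling API: `MotionControlJob` and `check_job` are hypothetical names introduced for illustration, and the numeric limits are simply the ones quoted in this article.

```python
from dataclasses import dataclass
from typing import List

# Limits quoted in this article; they are not read from any Kling API.
REFERENCE_MIN_S, REFERENCE_MAX_S = 3.0, 30.0
OUTPUT_MIN_S, OUTPUT_MAX_S = 5.0, 10.0
MAX_BRUSH_ELEMENTS = 6


@dataclass
class MotionControlJob:
    """Hypothetical description of a job, for illustration only."""
    mode: str                       # "reference_video" or "motion_brush"
    output_seconds: float
    reference_seconds: float = 0.0  # used only in reference_video mode
    brush_elements: int = 0         # used only in motion_brush mode


def check_job(job: MotionControlJob) -> List[str]:
    """Return a list of problems; an empty list means the job fits the limits."""
    problems = []
    if not OUTPUT_MIN_S <= job.output_seconds <= OUTPUT_MAX_S:
        problems.append("output duration must be 5-10 seconds")
    if job.mode == "reference_video":
        if not REFERENCE_MIN_S <= job.reference_seconds <= REFERENCE_MAX_S:
            problems.append("reference video must be 3-30 seconds")
    elif job.mode == "motion_brush":
        if job.brush_elements > MAX_BRUSH_ELEMENTS:
            problems.append("Motion Brush supports at most 6 elements")
    else:
        problems.append(f"unknown mode: {job.mode}")
    return problems
```

For example, `check_job(MotionControlJob("motion_brush", 5, brush_elements=7))` reports the element limit, while an 8-second job with a 12-second reference returns an empty list.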
The three pillars of the prompt
A Motion Control prompt is built from three blocks. Character appearance: clothing and style (formal, casual, cinematic), age range, mood, facial details (skin texture, expressive eyes, lighting), overall visual tone (realistic, stylized, polished).
Setting: type of environment (studio, office, city, nature), depth and lighting (soft blur, shallow depth of field), atmosphere (professional, cozy, dramatic, minimalist).
Visual style and quality: camera style (cinematic, documentary, social media), color grading (neutral, warm, contrasty), level of realism (photorealistic, commercial, editorial). Formula: [Character appearance] + [Setting/background] + [Visual style]. The sweet-spot length is 30–80 words.
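The formula can be sketched as a small helper that joins the three blocks and checks the result against the 30–80 word range. This is an illustrative sketch only: `build_prompt`, `word_count`, and `in_sweet_spot` are hypothetical names, and counting whitespace-separated words is an assumption about how length is measured.

```python
def build_prompt(character: str, setting: str, style: str) -> str:
    """Join the three blocks in the order given by the formula."""
    parts = [p.strip().rstrip(".") for p in (character, setting, style)]
    if not all(parts):
        raise ValueError("all three blocks are required")
    return ". ".join(parts) + "."


def word_count(prompt: str) -> int:
    """Count whitespace-separated words (an assumed measure of length)."""
    return len(prompt.split())


def in_sweet_spot(prompt: str, low: int = 30, high: int = 80) -> bool:
    """True when the prompt length falls in the article's 30-80 word range."""
    return low <= word_count(prompt) <= high
```

Keeping the blocks as separate arguments makes it harder to forget one of the pillars, and the length check flags prompts that have drifted toward a Text-to-Video style description.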
The golden rule: art direction, not motion direction
Reference Video Transfer takes motion, gestures, expression, and timing AUTOMATICALLY from the reference video. Describing motion in the prompt is the mode's main anti-pattern. Phrases like «character dances», «waves hand», «walks forward», «smiles then frowns», «at 3 seconds raises hand» are either ignored or conflict with motion from the video.
Motion Brush works similarly: the user PAINTS trajectories on the image; motion is set by the brush. The prompt describes the overall scene and atmosphere, not specific element motions. General dynamic phrasing («gentle breeze», «flowing water») is acceptable, but specific animation instructions aren't.
There is one golden rule: the prompt describes how it looks, NOT how it moves.
Common mistakes
1. Describing motion in the prompt
This is the mode's main anti-pattern: «The character dances», «waves hand», «walks forward», «turns head left». In Reference Video Transfer, motion comes from the reference video automatically; in Motion Brush, it is painted with the brush. The prompt describes appearance and setting, NOT body motion.
2. Describing expression and emotion as actions
«Character smiles, then frowns», «expression changes from happy to sad» — expression and emotional transitions also come from the reference video. The prompt can state an overall emotional tone («friendly expression», «serious demeanor») as part of the character description, but not a sequence of expression changes.
3. Describing performance timing
«At 3 seconds character raises hand», «after 5 seconds turns to camera» — timing comes entirely from the reference video. Any temporal markers in a Motion Control prompt conflict with the actual performance and can cause artifacts. Timing = reference, not prompt.
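The first three mistakes can be caught with a rough text check before the prompt is submitted. The sketch below is a heuristic, not part of Kling: the function name and the word lists are assumptions chosen for illustration, they are far from complete, and they can produce false positives (for example, «waves» in a seascape description).

```python
import re

# Illustrative, incomplete patterns; extend them for your own prompts.
MOTION_PATTERN = re.compile(
    r"\b(dances?|waves?|walks?|runs?|jumps?|turns?|raises?|nods?)\b",
    re.IGNORECASE,
)
SEQUENCE_PATTERN = re.compile(r"\b(then|changes? from)\b", re.IGNORECASE)
TIMING_PATTERN = re.compile(
    r"\b(at|after|before)\s+\d+(\.\d+)?\s*(s|sec|secs|seconds?)\b",
    re.IGNORECASE,
)


def find_motion_direction(prompt: str) -> dict:
    """Return matched phrases grouped by the mistake they suggest."""
    return {
        "motion": [m.group(0) for m in MOTION_PATTERN.finditer(prompt)],
        "sequence": [m.group(0) for m in SEQUENCE_PATTERN.finditer(prompt)],
        "timing": [m.group(0) for m in TIMING_PATTERN.finditer(prompt)],
    }
```

Any non-empty list is a hint to reread the prompt, not a verdict: a static emotional tone such as «friendly expression» passes, while «smiles, then frowns» is flagged through the word «then».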
4. A prompt written like regular T2V
Writing a prompt with actions, camera moves, and style (as you would for Text-to-Video) is the wrong approach for this mode. A T2V prompt here yields worse results than concise art direction. The sweet-spot length for Motion Control is 30–80 words; longer prompts usually contain extra motion instructions.
5. Prompts that are too short or too abstract
A prompt under 10 words leaves too little visual information about the character and setting. Abstract phrasing — «make it look cool», «something dynamic», «professional vibe» — gives no anchor. Concrete appearance details (clothing, materials) and setting details (location type, lighting) are required.
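A quick specificity check along these lines might look as follows. It is a sketch under stated assumptions: `check_specificity` is a hypothetical helper, the vague phrases are only the three quoted above, and words are counted by splitting on whitespace.

```python
# Vague phrases quoted in the article; the list is illustrative, not exhaustive.
VAGUE_PHRASES = ("make it look cool", "something dynamic", "professional vibe")


def check_specificity(prompt: str, min_words: int = 10) -> list:
    """Flag prompts that are too short or that rely on vague phrasing."""
    problems = []
    n = len(prompt.split())
    if n < min_words:
        problems.append(f"too short: {n} words, fewer than {min_words}")
    lowered = prompt.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            problems.append(f"vague phrase: {phrase!r}")
    return problems
```

A bare «corporate presenter» is reported as too short; the fix is to add concrete clothing, material, location, and lighting details rather than more adjectives.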
Before / after examples
Example 1
Before
corporate presenter
After
Style the character as a polished corporate presenter in a tailored navy suit with a crisp white shirt and minimal silver watch, realistic skin texture with subtle facial details, professional grooming, neat short haircut. Placed in a modern office environment with floor-to-ceiling glass walls overlooking a city skyline, soft natural daylight from above, clean minimalist interior. Cinematic realism with shallow depth of field, professional commercial quality, neutral warm color grading.
All three pillars: character appearance with concrete details, setting with depth and lighting, visual quality. No motion instructions — those come from the reference video.
Example 2
Before
girl in UGC style
After
Turn the character into a casual lifestyle creator wearing a soft cream knit sweater and neutral makeup with natural facial details, mid-20s age range, friendly expression. Set against a warm home interior with bookshelf and indoor plants in soft bokeh background, golden afternoon window light, cozy atmosphere. Clean commercial lighting with slightly elevated saturation, social-media UGC aesthetic, smartphone camera feel.
The character's entire context changes without touching the performance. Style (cream knit, neutral makeup), setting (home interior, bookshelf, window light), and quality (UGC aesthetic) are all pure art direction.
Example 3
Before
Motion Brush: «golden wheat field at sunset»
After
A golden wheat field under warm sunset light, gentle atmosphere with soft golden hour glow, photorealistic rendering with shallow depth of field, layered composition with distant tree line and pastel orange sky in the background, cinematic color grading with rich amber and ochre tones.
For Motion Brush the prompt describes the scene and atmosphere, not motion. The swaying wheat is painted by the user with the brush directly on the image. General phrases like «gentle atmosphere» are fine; «wheat sways from left to right» is not.