Kling 2.6 Pro: how to write prompts the model actually understands
Kuaishou
Kling 2.6 Pro is Kuaishou's video model on klingai.com. It generates clips up to 10 seconds at 1080p and supports text-to-video (T2V), image-to-video (I2V), Elements (up to 4 references), and Motion Control. The optimal prompt length is 50–150 words; the model performs best in English and accepts a negative prompt as a separate field.
What Kling 2.6 Pro does well
Kling 2.6 Pro is a production tool for short video: product shots, landscape time-lapses, corporate presenters, UGC-style content. Clips run up to 10 seconds at up to 1080p. There are four modes: Text-to-Video for from-scratch generation, Image-to-Video for animating still frames, Elements for character consistency through 2–4 references, and Motion Control for transferring motion from a video reference.
The negative prompt is a separate field, and that is where artifacts and unwanted elements go. This gives cleaner control than models that lack a negative prompt, such as Imagen.
- Duration up to 10 seconds, resolution up to 1080p
- Four modes: T2V, I2V, Elements, Motion Control
- Elements — 2–4 references for character and object consistency
- Negative prompt as a separate field
- Emphasis via ++keyword++ to amplify elements
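The `++keyword++` emphasis marker can be applied programmatically when prompts are generated from templates. The following is a minimal sketch, assuming the marker simply wraps a literal phrase; the helper name `emphasize` is hypothetical and not part of any Kling tooling.

```python
def emphasize(prompt: str, phrase: str) -> str:
    """Wrap the first occurrence of `phrase` in ++...++ emphasis markers.

    Hypothetical helper: only the ++keyword++ syntax comes from the article.
    """
    if not phrase:
        raise ValueError("phrase must be non-empty")
    marked = f"++{phrase}++"
    if marked in prompt:
        return prompt  # already emphasized, leave the prompt untouched
    if phrase not in prompt:
        raise ValueError(f"phrase not found in prompt: {phrase!r}")
    return prompt.replace(phrase, marked, 1)


print(emphasize("A sleek red convertible sports car with chrome wheels",
                "sleek red convertible sports car"))
```

Raising on a missing phrase is a design choice here: a silently unemphasized prompt is harder to notice than an error.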
The four-component prompt structure
Optimal structure for Kling 2.6 Pro: [Scene Setting] + [Subject Description] + [Motion Directives] + [Stylistic Guidance].
Scene Setting — environment and lighting. «A sunlit coastal highway with dramatic cliffs on one side and sparkling ocean on the other, golden hour lighting with long shadows».
Subject Description — detailed description of main objects. «A sleek red convertible sports car with chrome wheels and leather interior».
Motion Directives — clear articulation of motion. «Camera tracks alongside the car as it drives at moderate speed, then gradually pulls back to reveal the expansive coastline».
Stylistic Guidance — visual aesthetic. «Cinematic 4K quality, shallow depth of field, vibrant color grading».
The key rule: the model weights the start of the prompt more heavily, so the most important elements go first.
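The four components can be assembled in a fixed order so that the scene and subject always land at the start of the prompt. The following is a minimal sketch; the class `FourPartPrompt` and its field names are hypothetical, and only the component order and the example texts come from the article.

```python
from dataclasses import dataclass


@dataclass
class FourPartPrompt:
    """Hypothetical container for the four components, in the article's order."""
    scene: str    # Scene Setting: environment and lighting
    subject: str  # Subject Description: main objects
    motion: str   # Motion Directives: camera and subject movement
    style: str    # Stylistic Guidance: visual aesthetic

    def render(self) -> str:
        """Join the non-empty components into one prompt, scene first."""
        parts = (self.scene, self.subject, self.motion, self.style)
        # Normalize each component so it ends with exactly one period.
        sentences = [p.strip().rstrip(".") + "." for p in parts if p.strip()]
        return " ".join(sentences)


prompt = FourPartPrompt(
    scene=("A sunlit coastal highway with dramatic cliffs on one side and "
           "sparkling ocean on the other, golden hour lighting with long shadows"),
    subject="A sleek red convertible sports car with chrome wheels and leather interior",
    motion=("Camera tracks alongside the car as it drives at moderate speed, "
            "then gradually pulls back to reveal the expansive coastline"),
    style="Cinematic 4K quality, shallow depth of field, vibrant color grading",
).render()
print(prompt)
```

Keeping the components as separate fields makes it easy to swap one part, for example the style, without touching the order of the rest.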
I2V and Motion Control: different strategies
An I2V (Image-to-Video) prompt describes ONLY motion, not the scene, because the model already sees the image. Keep it to 20–40 words and focus on how the scene comes alive: «Camera slowly tracks right while maintaining focus on the central figure, subtle wind animation affecting the subject's hair and clothing, leaves in background sway gently, warm lighting gradually intensifies».
Motion Control transfers motion from a reference video onto a character from an image. The prompt describes APPEARANCE and SETTING, not motion. Formula: [Character style/appearance] + [Setting/background] + [Visual quality]. Example: «Make the character appear as a polished corporate presenter in a tailored navy suit, realistic skin texture, professional grooming. Place in a modern office environment with glass walls, soft daylight, and shallow depth of field».
Common mistakes
1. Describing the scene in an I2V prompt
In Image-to-Video the model already sees the source image. Describing appearance, clothing, or setting wastes tokens and conflicts with the actual picture. An I2V prompt should be 20–40 words and describe ONLY motion and scene evolution — what moves, how, and at what tempo.
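The length guidance can be checked before a prompt is submitted. The sketch below assumes that words are counted by splitting on whitespace and that the general 50–150 word range applies to T2V prompts; the names `LENGTH_RANGES` and `length_report` are hypothetical.

```python
# Word ranges taken from the article; the mode keys are hypothetical labels.
LENGTH_RANGES = {
    "t2v": (50, 150),  # assumption: the general optimum applies to text-to-video
    "i2v": (20, 40),   # image-to-video: motion only
}


def length_report(prompt: str, mode: str) -> str:
    """Return "ok" or a short message saying how the prompt misses the range."""
    lo, hi = LENGTH_RANGES[mode]
    n = len(prompt.split())  # whitespace word count, an approximation
    if n < lo:
        return f"too short: {n} words, expected {lo}-{hi}"
    if n > hi:
        return f"too long: {n} words, expected {lo}-{hi}"
    return "ok"


print(length_report("person drinks coffee", "i2v"))
```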
2. Motion instructions in Motion Control
Motion Control transfers motion from the reference video automatically. Putting instructions such as «Character dances», «waves hand», or «walks energetically» in the prompt is the mode's main anti-pattern. The prompt describes art direction (how it looks, where it is, what quality), not motion direction.
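A simple word-list check can catch motion instructions that slip into a Motion Control prompt. This is a minimal sketch: the word list is illustrative and far from exhaustive, and both `MOTION_WORDS` and `find_motion_words` are hypothetical names rather than anything Kling provides.

```python
import re

# Illustrative, non-exhaustive list of motion words (an assumption).
MOTION_WORDS = frozenset({
    "dance", "dances", "dancing",
    "wave", "waves", "waving",
    "walk", "walks", "walking",
    "run", "runs", "running",
    "jump", "jumps", "jumping",
    "gesture", "gestures", "gesturing",
})


def find_motion_words(prompt: str) -> list:
    """Return the motion words found in the prompt, in order of appearance."""
    tokens = re.findall(r"[a-z]+", prompt.lower())
    return [t for t in tokens if t in MOTION_WORDS]


print(find_motion_words("Character dances and waves hand"))
```

An empty result does not prove the prompt is free of motion direction; it only means none of the listed words appeared.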
3. Conflicting camera moves and styles
Stacking simultaneous transforms, as in «360° rotation + zoom in», causes geometry distortion. Combining «golden hour» with «studio lighting» in one prompt confuses the model's style interpretation. Use one primary camera move and keep a consistent lighting scheme throughout the prompt.
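Conflicts of this kind can be flagged with a phrase lookup. The sketch below assumes small, hand-picked phrase lists and plain substring matching, so it cannot tell simultaneous moves from sequential ones and should be treated as a warning only; `CONFLICT_GROUPS` and `find_conflicts` are hypothetical names.

```python
# Illustrative phrase groups (an assumption); extend them for real use.
CONFLICT_GROUPS = {
    "camera": ("360° rotation", "zoom in", "zoom out", "dolly in"),
    "lighting": ("golden hour", "studio lighting", "moonlight", "neon lighting"),
}


def find_conflicts(prompt: str) -> dict:
    """Map each group to its matched phrases when more than one phrase matches."""
    text = prompt.lower()
    conflicts = {}
    for group, phrases in CONFLICT_GROUPS.items():
        hits = [p for p in phrases if p in text]
        if len(hits) > 1:
            conflicts[group] = hits
    return conflicts


print(find_conflicts("360° rotation + zoom in, golden hour, studio lighting"))
```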
4. Overloading the environment with details
More than 10 environmental elements in one prompt cause a loss of focus: the model tries to fit everything into 10 seconds and simplifies or confuses elements. Aim for 3–4 key elements plus a description of lighting and atmosphere. Anything you want excluded belongs in the negative prompt, and recurring characters and objects belong in Elements.
5. Important information at the end of the prompt
Kling 2.6 Pro weights the start of the prompt more heavily. If the main subject or key action is buried in the last sentence, the model prioritizes whatever it read first. Put the main subject and scene in the first 30–50 words; add style and technical details after.
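Whether the main subject is front-loaded can be estimated by counting the words that precede it. This is a minimal sketch, assuming whitespace word counting and a literal, case-insensitive match on the subject phrase; the default limit of 50 is the upper end of the 30–50 word guidance, and the function names are hypothetical.

```python
from typing import Optional


def words_before(prompt: str, phrase: str) -> Optional[int]:
    """Count the words preceding the first occurrence of `phrase`, or None if absent."""
    lowered = prompt.lower()
    idx = lowered.find(phrase.lower())
    if idx < 0:
        return None
    return len(lowered[:idx].split())


def is_front_loaded(prompt: str, phrase: str, limit: int = 50) -> bool:
    """True when `phrase` starts within the first `limit` words of the prompt."""
    n = words_before(prompt, phrase)
    return n is not None and n < limit


prompt = "A red convertible drives along a coastal highway. Cinematic style."
print(words_before(prompt, "red convertible"), is_front_loaded(prompt, "red convertible"))
```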
Before / after examples
Example 1
Before
red car drives on a road
After
A sunlit coastal highway with dramatic cliffs on one side and sparkling ocean on the other, golden hour lighting with long shadows. A ++sleek red convertible sports car++ with chrome wheels and tan leather interior. Camera tracks alongside the car as it drives at moderate speed along the winding road, then gradually pulls back to reveal the expansive coastline. Cinematic 4K quality, shallow depth of field, vibrant color grading, shot on virtual anamorphic lens, 24mm, f/2.8.
The full four-component structure: scene setting, subject description with ++ emphasis, motion directives with tempo, stylistic guidance with technical markers.
Example 2
Before
I2V from a café photo: «person drinks coffee»
After
Camera slowly tracks right while the woman raises the cup to her lips and takes a slow sip, steam rising gently from the espresso, leaves in background sway in light breeze, warm afternoon light gradually intensifies
I2V is short (20–40 words), describing only motion and scene evolution. No appearance or clothing description — the model already sees the image.
Example 3
Before
Motion Control: «presenter pitches a product»
After
Make the character appear as a polished corporate presenter in a tailored navy suit with a crisp white shirt, realistic skin texture, professional grooming, neat short haircut. Place in a modern office environment with floor-to-ceiling glass walls overlooking a city skyline, soft daylight from above, clean minimalist interior. Cinematic realism with shallow depth of field, professional commercial quality.
Motion Control describes appearance and setting only. Gestures, expressions, and presentation poses come from the reference video. Instructions like «gestures with hands» are an anti-pattern here.