Kling: how to write prompts the model actually understands

Kuaishou · Updated:

Kling is Kuaishou's video model family, available at klingai.com. It generates up to 10-second clips (up to 15 seconds in Kling 3.0) and supports T2V, I2V, and Motion Control. Prompts accept up to ~2500 characters; the sweet spot is 50–150 words. English yields the most stable results, and a negative prompt field is supported.

What Kling does well

Kling is a text-to-video and image-to-video model aimed at cinematic scenes and product content. Standard duration is 5–10 seconds (15 seconds in Kling 3.0), resolution up to 1080p, with Elements support — up to 4 reference images for character and object consistency.

Motion Control transfers motion from a reference video onto a new character from an image — the foundation for AI influencers, virtual presenters, and dance performances. A negative prompt is supported as a separate field — a key difference from Imagen and many other models. Keyframes (exactly 2 anchor frames) are also supported.

  • T2V up to 10 seconds (15 in Kling 3.0), resolution up to 1080p
  • Image-to-Video for animating still images
  • Motion Control: transferring motion from a reference video
  • Elements — up to 4 references for consistency
  • Negative prompt as a separate field

Prompt structure

Optimal T2V structure: [Subject/Character] + [Action/Motion] + [Scene/Environment] + [Camera Movement] + [Style/Mood/Lighting]. Order matters: the model weights elements at the start of the prompt more heavily, so put the most important element first.

Each block needs concrete detail: «35-year-old woman with shoulder-length auburn hair wearing an emerald green coat» instead of «a person»; «walking purposefully through fallen leaves» instead of «moving around»; «smooth tracking shot following from the side» instead of no camera at all. Limit the environment to 3–4 elements — more than ten causes overload and loss of focus. Sweet-spot length is 50–150 words.
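The five-block structure and the length guidance can be sketched as a small helper. This is only an illustration: the class name `T2VPrompt`, the function `length_report`, and the scene and style wording in the usage are hypothetical, and nothing here calls Kling itself. The limits are the approximate figures quoted in this guide.

```python
from dataclasses import dataclass

MAX_CHARS = 2500        # approximate prompt limit quoted in this guide
WORD_RANGE = (50, 150)  # sweet-spot length quoted in this guide


@dataclass
class T2VPrompt:
    """Hypothetical container for the five T2V blocks, in priority order."""
    subject: str
    action: str
    scene: str
    camera: str
    style: str

    def render(self) -> str:
        # Join the blocks in the recommended order, skipping empty ones.
        parts = [self.subject, self.action, self.scene, self.camera, self.style]
        return ", ".join(p.strip() for p in parts if p.strip())


def length_report(prompt: str) -> dict:
    """Compare a prompt against the character limit and the word sweet spot."""
    words = len(prompt.split())
    return {
        "words": words,
        "chars": len(prompt),
        "within_char_limit": len(prompt) <= MAX_CHARS,
        "in_sweet_spot": WORD_RANGE[0] <= words <= WORD_RANGE[1],
    }


prompt = T2VPrompt(
    subject="35-year-old woman with shoulder-length auburn hair wearing an emerald green coat",
    action="walking purposefully through fallen leaves",
    scene="quiet city park in late autumn, low afternoon sun",  # illustrative wording
    camera="smooth tracking shot following from the side",
    style="cinematic, warm color grading, shallow depth of field",  # illustrative wording
).render()
report = length_report(prompt)
print(prompt)
print(report)
```

In this usage the rendered prompt is 40 words, so the report marks it as below the sweet spot; adding detail to the scene or style block would bring it into range.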

T2V, I2V, and Motion Control modes

Each mode needs its own strategy. T2V — describe EVERYTHING: subject, action, environment, camera, style. Formula: (Subject + details) + (Action + tempo) + (Environment + lighting) + (Camera) + (Style).

I2V — describe ONLY motion, not the scene. The model already sees the image. Formula: (Subject motion) + (Environmental motion) + (Camera). Length 20–40 words. Describing what's already in the picture is an anti-pattern.

Motion Control — describe ONLY the character's appearance and setting. Motion is taken from the reference video automatically. Formula: [Character style + clothing] + [Setting/background] + [Visual quality]. Motion, gesture, and expression instructions in Motion Control are the main anti-pattern.
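The per-mode rules can be sketched as a simple checker. The function name `check_prompt` and the keyword list are assumptions made for illustration, and the keyword match is deliberately crude. The word ranges are the ones given in this guide; the 30-80 range for Motion Control comes from the FAQ.

```python
import re

# Word-count ranges quoted in this guide, keyed by hypothetical mode names.
WORD_RANGES = {"t2v": (50, 150), "i2v": (20, 40), "motion_control": (30, 80)}

# Illustrative, non-exhaustive motion words. Plain keyword matching can
# misfire (for example, "waves" may be a noun), so treat hits as hints.
MOTION_WORDS = {"dances", "dancing", "walks", "walking", "waves", "waving",
                "runs", "running", "jumps", "jumping", "gestures", "smiles"}


def check_prompt(mode: str, prompt: str) -> list[str]:
    """Return a list of warnings for a prompt written for the given mode."""
    warnings = []
    words = re.findall(r"[A-Za-z0-9'-]+", prompt.lower())
    low, high = WORD_RANGES[mode]
    if not low <= len(words) <= high:
        warnings.append(
            f"{len(words)} words; suggested range for {mode} is {low}-{high}"
        )
    if mode == "motion_control":
        # Motion comes from the reference video, so flag motion vocabulary.
        found = sorted(MOTION_WORDS.intersection(words))
        if found:
            warnings.append(
                f"motion words in Motion Control prompt: {', '.join(found)}"
            )
    return warnings


mc_warnings = check_prompt("motion_control", "character dances")
i2v_warnings = check_prompt(
    "i2v",
    "Walks slowly toward the ocean, hair and clothing moving gently in the "
    "breeze, waves rolling onto shore in the background, camera slowly pushes in",
)
print(mc_warnings)
print(i2v_warnings)
```

The first call produces two warnings (too short, and a motion word in a Motion Control prompt); the second returns an empty list because the 24-word I2V prompt sits inside the 20-40 range.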

Common mistakes

  1. Describing the scene in an I2V prompt

    In Image-to-Video the model already sees the source image. Describing appearance, clothing, or setting wastes tokens and either gets ignored or conflicts with the actual picture. An I2V prompt should be 20–40 words and describe ONLY motion and scene evolution.

  2. Motion instructions in Motion Control

    Motion Control transfers motion from the reference video automatically. Phrases like «character dances», «waves hand», «walks forward» in the prompt are either ignored or conflict with motion from the video. The prompt is art direction (how it looks), not motion direction (how it moves).

  3. Conflicting camera moves

    «360-degree rotation around subject while zooming in and panning left» — three simultaneous transforms almost guarantee geometry distortion. Use one primary camera move at a time: either orbit, or zoom, or pan. For complex transitions, use Multi-shot in Kling 3.0.

  4. Prompts that are too short or too abstract

    A prompt under 15 words leaves too much freedom — the model fills the scene on its own. Abstract phrases like «something beautiful happens», «make it look dynamic», «cool vibes» give no visual anchors. Concrete details and physical actions give the model something to grip.

  5. Negative phrasing in the main prompt

    Kling supports a negative prompt as a separate field — but not inside the main prompt. «No people, no text, not blurry» inside the main prompt is either ignored or causes the opposite effect. Move unwanted elements to the dedicated negative prompt field.
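Two of the mistakes above, conflicting camera moves and negative phrasing in the main prompt, lend themselves to a quick automated check. The sketch below is a heuristic under stated assumptions: the keyword groups in `CAMERA_MOVES` are illustrative rather than an official Kling vocabulary, and `camera_moves` and `split_negatives` are hypothetical helper names. The output of `split_negatives` is meant to be pasted into the two separate fields by hand.

```python
import re

# Illustrative keyword groups, one per kind of camera move.
CAMERA_MOVES = {
    "orbit": r"\b(orbit\w*|360-degree rotation|rotat\w+ around)\b",
    "zoom": r"\b(zoom\w*|push\w* in|pull\w* out)\b",
    "pan": r"\b(pan|pans|panning)\b",
    "tracking": r"\b(tracking|dolly)\b",
}

# A clause counts as negative when it starts with one of these words.
NEGATION = re.compile(r"^(no|not|without|avoid)\b", re.IGNORECASE)


def camera_moves(prompt: str) -> list[str]:
    """Return the kinds of camera move mentioned in the prompt."""
    return [name for name, pattern in CAMERA_MOVES.items()
            if re.search(pattern, prompt, re.IGNORECASE)]


def split_negatives(prompt: str) -> tuple[str, str]:
    """Split comma-separated clauses into (main prompt, negative prompt)."""
    positive, negative = [], []
    for clause in (c.strip() for c in prompt.split(",")):
        if not clause:
            continue
        (negative if NEGATION.match(clause) else positive).append(clause)
    return ", ".join(positive), ", ".join(negative)


moves = camera_moves(
    "360-degree rotation around subject while zooming in and panning left"
)
main, negative = split_negatives(
    "A ceramic mug on an oak table, no people, no text, not blurry, soft window light"
)
print(moves)     # more than one entry suggests conflicting camera moves
print(main)
print(negative)
```

For the conflicting example from this section, `moves` contains three kinds of move, which is the signal to keep only one. The second call leaves the description in `main` and collects the three negative clauses in `negative`.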

Before / after examples

Example 1

Before

car drives through a city at sunset

After

A sleek silver sports car with chrome wheels accelerates through a rain-slicked downtown street as golden sunset light breaks through storm clouds, camera tracking alongside at street level, smooth dolly motion, cinematic lighting with volumetric light rays reflecting off wet asphalt, photorealistic rendering, shot on virtual anamorphic lens, 24mm, f/2.8, warm color grading with deep contrast.

Key changes: concrete car details, the street's state, camera behavior described separately from the subject, the cinematic stack (lens, aperture, lighting, color grading), a temporal marker for rhythm.

Example 2

Before

I2V from a photo of a woman on the beach: «woman walks to the sea»

After

Walks slowly toward the ocean, hair and clothing moving gently in the breeze, waves rolling onto shore in the background, camera slowly pushes in

I2V is short (20–40 words) and describes ONLY motion: what the subject does, what's happening in the environment, how the camera moves. Describing appearance or scene would be an anti-pattern — the model already sees the image.

Example 3

Before

Motion Control for a dance video: «character dances»

After

Style the character as a confident urban dancer wearing oversized black streetwear and white sneakers, placed in a moody underground parking lot with flickering fluorescent lights and concrete walls, cinematic realism with grainy 35mm film aesthetic, high contrast color grading, shallow depth of field with bokeh on background lights.

Motion Control describes APPEARANCE and SETTING, not motion. The dance and timing come from the reference video. Instructions like «dances energetically» here are the main anti-pattern.

Frequently asked

What clip durations does Kling offer?
Standard duration is 5–10 seconds for most versions (Kling 1.6, Kling 2.0, Kling 2.6 Pro, Kling O1). Kling 3.0 pushes the ceiling to 15 seconds and adds Multi-shot mode — up to 6 shots in one generation with narrative development. For longer videos, use Kling 3.0 or stitch several generations together in editing.
How does Image-to-Video work in Kling?
In I2V the model receives a still image and brings it to life. The key rule is to describe ONLY motion and scene evolution, not what's already in the picture. Length 20–40 words. Formula: (subject motion + tempo) + (environmental motion) + (camera behavior). Describing appearance or setting inside an I2V prompt is an anti-pattern that causes conflicts with the image.
How is Motion Control different from regular T2V?
In T2V the prompt describes EVERYTHING — subject, action, motion, camera, environment, style. In Motion Control, motion, gestures, expressions, and timing come from the reference video automatically; the prompt describes ONLY the character's appearance and setting. A fundamentally different strategy: prompt as art direction, not motion direction. Sweet-spot length 30–80 words.
Why use Elements, and how many references should I include?
Elements is a mode with reference images for character and object consistency in video. The sweet spot is 2–4 high-quality references from different angles. Beyond 4, the model gets confused about priorities and starts mixing features. Use cases: recurring characters across a series, branded content, narratives with a consistent protagonist.
Can I write prompts in languages other than English?
You can, but quality drops. Kling was trained on multilingual data, but English produces the most stable results — especially for cinematic vocabulary and camera-move descriptions. For production work, translate the prompt to English; for experiments and quick tests other languages are acceptable but suboptimal.
How do I use the negative prompt?
Kling's negative prompt is a separate field, not part of the main prompt. Move unwanted elements there: «No people, no text overlays, no distortion in vehicle proportions», «No watermark, no logos, no extra limbs». It's insurance against common artifacts: extra people in product shots, watermarks, geometry issues. Don't duplicate negative phrasing in the main prompt — it doesn't work there.
Does Opten support Kling?
Yes, the Opten extension auto-detects Kling and its modes (T2V, I2V, Motion Control, Elements) inside klingai.com. Each mode uses its own scoring strategy: T2V — the full 5-component checklist; I2V — focus on motion and a short prompt; Motion Control — checking for the absence of motion instructions. One click delivers a rewrite in the correct structure.
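
For the duration question earlier in this FAQ, the number of generations to stitch for a longer video is simple arithmetic. A minimal sketch, assuming the per-clip ceilings quoted in this guide; the dictionary keys and the function name `clips_needed` are hypothetical, and the result ignores any overlap or trimming done in editing.

```python
import math

# Per-clip duration ceilings in seconds, as quoted in this guide.
MAX_SECONDS = {"standard": 10, "kling_3_0": 15}


def clips_needed(target_seconds: float, version: str = "standard") -> int:
    """Minimum number of generations needed to cover the target duration."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / MAX_SECONDS[version])


standard_count = clips_needed(45)
kling3_count = clips_needed(45, "kling_3_0")
print(standard_count, kling3_count)
```

For a 45-second video this gives 5 clips at the standard 10-second ceiling and 3 clips at the 15-second ceiling of Kling 3.0.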

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672