How is Kling 3.0 different from Kling 2.6 Pro?

Three key upgrades: maximum duration grew from 10 to 15 seconds, Multi-shot was introduced (up to 6 shots in one generation), and native dialogue and audio generation with voice-tone control was added. For short product clips without speech, 2.6 Pro remains the better choice. For dialogue-driven narratives, multi-shot compositions, and branded content with audio, choose 3.0.

How does Multi-shot work?

Multi-shot lets you create up to 6 shots in one generation with narrative continuity. Structure: the Master Prompt sets the overall scene context, then numbered blocks (Multi shot Prompt 1, Multi shot Prompt 2, and so on) describe the framing, action, and duration of each shot. The model automatically preserves characters and setting across shots and varies the angles. Without clear shot markers, the model tends to treat the text as one long shot and stumbles on transitions.

How do I generate dialogue in Kling 3.0?

Follow four required principles: unique character labels `[Character A: description]`, a visual action BEFORE each line, tonal voice descriptors in brackets before the text («[Agent, raspy deep voice]:»), and linking words between lines («Immediately», «After a pause»). Skipping any of them can cause merged lines or flat, emotionless delivery.

Are languages other than English supported in dialogue?

Yes. Kling 3.0 supports multilingual dialogue, including Russian, accents, and code-switching within a single scene: a character can switch between two languages. Write the prompt itself (the instructions to the model) in English for best results; character lines can be in any supported language, enclosed in quotes.

How many shots are optimal for Multi-shot?

The technical ceiling is 6 shots. In practice, 2–4 shots give the best results: the model maintains narrative continuity more easily and keeps characters and setting consistent. 5–6 shots work for more complex storyboards but require a detailed Master Prompt and consistent character descriptions across all shots. Give each shot its own duration in the (Duration: Xs) format.

How is I2V in Kling 3.0 better than previous versions?

The main improvement is preservation of identity and legible text from the source image. Logos, product labels, captions, and key typography don't «drift» during animation, which is critical for branded advertising. The prompt should still describe ONLY motion and scene evolution (20–40 words), without repeating what's already visible in the image.

Does Opten support Kling 3.0?

Yes. The Opten extension auto-detects Kling 3.0 and its specific modes (Multi-shot, dialogue scenes, I2V) on klingai.com. For dialogue, it checks for character labels, visual anchors, tonal descriptors, and linking words; for Multi-shot, it checks shot markers and durations. One click rewrites the prompt into the correct structure.

Kling 3.0: how to write prompts the model actually understands

Kuaishou · Updated: May 19, 2026

Kling 3.0 is Kuaishou's flagship video model on klingai.com. It generates clips up to 15 seconds long and adds Multi-shot with up to 6 shots in one generation, native dialogue and audio generation with voice-tone control, and multilingual dialogue with accents and code-switching. The model is built to understand directorial intent, not just a list of objects.

What's new in Kling 3.0

Kling 3.0 is a major upgrade to Kuaishou's video model. Single-generation length grew from 10 to 15 seconds, enough for real narrative progression. The headline feature is Multi-shot: up to 6 shots in one generation with automatic angle variation and preserved narrative continuity.

Native audio generation appears for the first time in the family: dialogue with a unique voice tone per character, ambient sound, music, and SFX. Multilingual dialogue is supported, with accents and code-switching inside a single scene. In I2V, the model reliably preserves identity, layout, and even text from the source image, which is critical for branded content.

Duration up to 15 seconds (vs 10 in Kling 2.6 Pro)
Multi-shot: up to 6 shots with narrative continuity
Native audio: dialogue, ambient, music, SFX
Multilingual dialogue with accents and code-switching
Text and logos from the source image preserved in I2V

Prompt structure as a director's script

Kling 3.0 is built to understand cinematic intent, not just visual descriptions. Prompts should be written as directorial instructions, not as an object list.

Optimal structure: [Scene Setup + Atmosphere] + [Character Introduction] + [Action/Dialogue Sequence] + [Camera Direction] + [Audio/Sound Design]. Anchor characters at the start of the prompt and keep their descriptions consistent across all shots, because the model locks in character, object, and environment traits.

Describe camera motion explicitly: «tracking», «following», «freezing», «panning», «moving in sync». Long takes work better when the camera's relationship to the subject is clearly described. The sweet-spot prompt length is 50–200 words (longer for Multi-shot).
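To make the five-part order concrete, here is a minimal Python sketch that assembles the components in the recommended sequence and checks the 50–200-word sweet spot. It is an illustration only: the `DirectorPrompt` class and its field names are hypothetical, not part of any Kling or klingai.com API, and the word count is a rough whitespace split.

```python
from dataclasses import dataclass


@dataclass
class DirectorPrompt:
    """Hypothetical container for the five-part Kling 3.0 prompt structure."""

    scene: str       # [Scene Setup + Atmosphere]
    characters: str  # [Character Introduction], anchored up front
    action: str      # [Action/Dialogue Sequence]
    camera: str      # [Camera Direction]
    audio: str       # [Audio/Sound Design]

    def render(self) -> str:
        # Join the parts in the recommended order, skipping empty ones.
        parts = [self.scene, self.characters, self.action, self.camera, self.audio]
        return " ".join(p.strip() for p in parts if p.strip())

    def word_count(self) -> int:
        # Rough count: whitespace-separated tokens.
        return len(self.render().split())

    def in_sweet_spot(self, low: int = 50, high: int = 200) -> bool:
        # 50-200 words is the suggested range for a single-shot prompt.
        return low <= self.word_count() <= high


prompt = DirectorPrompt(
    scene="A dim kitchen late at night, warm tungsten light casting long shadows.",
    characters="[Character A: Sarah, mid-30s woman in a worn flannel bathrobe].",
    action="Sarah sets a ceramic plate down too hard on the granite counter.",
    camera="Camera holds a steady medium shot at eye level.",
    audio="Quiet ambient sound of a ticking wall clock.",
)
print(prompt.render())
print(prompt.word_count(), prompt.in_sweet_spot())
```

The helper only enforces order and length; the cinematic quality still comes from what you write in each field.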

Multi-shot: up to 6 shots in one generation

Multi-shot is the key feature of Kling 3.0: it produces a multi-shot storyboard in a single generation while preserving narrative continuity.

Formula:

Master Prompt: [overall scene description]
Multi shot Prompt 1: [shot 1 description] (Duration: Xs)
Multi shot Prompt 2: [shot 2 description] (Duration: Xs)

Each shot must have its own framing, action, and duration. The Master Prompt sets the overall context. The model automatically varies angles and composition while preserving narrative continuity between shots. Supported types: profile shots, macro close-ups, tracking shots, POV, shot-reverse-shot for dialogue. Multi-shot without clear shot markers is the mode's main anti-pattern.
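The formula can be generated mechanically, which helps avoid the missing-marker anti-pattern. The sketch below assumes Python 3.9+; `Shot` and `build_multishot` are hypothetical names. The 6-shot ceiling comes from this guide; treating 15 seconds as a cap on the sum of shot durations is our assumption, exposed as a parameter so you can change it.

```python
from dataclasses import dataclass

MAX_SHOTS = 6  # technical ceiling stated in this guide


@dataclass
class Shot:
    description: str  # framing + action for this shot
    seconds: int      # duration of this shot


def build_multishot(master: str, shots: list[Shot], max_total_seconds: int = 15) -> str:
    """Assemble a Multi-shot prompt with explicit, numbered shot markers.

    ASSUMPTION: max_total_seconds treats the 15 s generation limit as a cap
    on the sum of all shot durations.
    """
    if not shots or len(shots) > MAX_SHOTS:
        raise ValueError(f"Multi-shot needs 1 to {MAX_SHOTS} shots, got {len(shots)}")
    total = sum(shot.seconds for shot in shots)
    if total > max_total_seconds:
        raise ValueError(f"shot durations sum to {total}s, above {max_total_seconds}s")
    lines = [f"Master Prompt: {master.strip()}"]
    for i, shot in enumerate(shots, start=1):
        # Each shot is its own block with its own marker and duration tag.
        lines.append(
            f"Multi shot Prompt {i}: {shot.description.strip()} (Duration: {shot.seconds}s)"
        )
    return "\n".join(lines)


print(build_multishot(
    "A theatrical figure dances down a long flight of concrete stairs at dusk.",
    [
        Shot("Wide shot from below: a man in a red suit starts dancing at the top.", 5),
        Shot("Medium tracking shot from the side as he spins down to the bottom step.", 5),
    ],
))
```

Raising an error instead of silently trimming shots keeps the decision about what to cut with the person writing the storyboard.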

Dialogue with native audio

Kling 3.0 supports character-anchored dialogue generation. Four required principles:

Structural naming: unique character labels throughout the prompt — `[Character A: Black-suited Agent]` and `[Character B: Female Assistant]`. Pronouns instead of labels («he says…») are an anti-pattern.

Visual anchoring: physical action BEFORE the line. «The agent slams his hand on the table. [Agent, angrily]: "Where is the truth?"»

Audio details: a unique tone and emotion per character. «[Agent, raspy, deep voice]: "Don't move." [Assistant, clear, fearful voice]: "I'm scared."»

Temporal control: linking words between lines. «[Agent]: "Why?" Immediately, [Assistant]: "Because it's time."» Without a connector, the model can merge lines.
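The four principles map naturally onto a small data structure. This hypothetical Python sketch (the `Line` class and `render_dialogue` function are our own names, assuming Python 3.9+) places each visual action before its line, requires a tone descriptor, and refuses any line after the first that lacks a linking word. Character introductions such as `[Character A: ...]` are assumed to be written separately at the start of the prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    speaker: str                     # structural label such as "Agent", never a pronoun
    tone: str                        # voice descriptor, e.g. "raspy deep voice, cold"
    text: str                        # the spoken line itself
    action: str = ""                 # visual action placed BEFORE the line
    connector: Optional[str] = None  # linking word, e.g. "Immediately"


def render_dialogue(lines: list[Line]) -> str:
    """Render dialogue lines following the four principles (hypothetical helper)."""
    parts = []
    for i, line in enumerate(lines):
        if not line.tone.strip():
            raise ValueError(f"line {i + 1} has no tonal descriptor")
        if i > 0 and not line.connector:
            raise ValueError(f"line {i + 1} has no linking word and may merge with the previous one")
        if line.action:
            parts.append(line.action.strip())  # visual anchor first
        spoken = f'[{line.speaker}, {line.tone}]: "{line.text}"'
        if line.connector:
            spoken = f"{line.connector}, {spoken}"
        parts.append(spoken)
    return " ".join(parts)


print(render_dialogue([
    Line("Agent", "raspy deep voice, cold", "Where is the truth?",
         action="The agent slams his hand on the metal table."),
    Line("Assistant", "clear fearful voice", "I told you everything I know.",
         action="The assistant flinches.", connector="Immediately"),
]))
```

The rendered text follows the same action, label, line, connector pattern used in Example 3.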

Common mistakes

1. Pronouns instead of character labels in dialogue
«He says…», «Then she replies…» — the model doesn't know who's speaking and merges lines or changes voice between them. Use unique structural labels `[Character A: description]` and `[Character B: description]` throughout the prompt. Every line gets an explicit character label.
2. Multi-shot without marking individual shots
Describing several scenes in a row without `Multi shot Prompt 1:`, `Multi shot Prompt 2:` markers makes the model treat it as one long shot and stumble on transitions. Each shot needs to be a separate block with its own framing, action, and duration.
3. Dialogue without visual anchoring
If the line comes first and the action after — «[Agent]: "Where is the truth?" The agent slams the table» — the model often desyncs sound and motion. Correct order: physical action BEFORE the line. This gives the model a clear audio-visual anchor.
4. Dialogue without tonal descriptors
«[Agent]: "Don't move"» without tonal info delivers a flat TTS-like voice. Add voice characteristics: «[Agent, raspy deep voice, threatening]: "Don't move"». This unlocks Kling 3.0's native-audio advantage — emotional and tonal control.
5. Describing the scene in an I2V prompt
As with other Kling models, in Image-to-Video (I2V) the model already sees the image and preserves its layout, text, and identity. Describing appearance or setting in an I2V prompt conflicts with the actual picture. Keep the prompt to 20–40 words and describe ONLY motion and scene evolution.
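Several of these mistakes can be caught with simple text checks before you submit a prompt. The following Python sketch is a heuristic linter under stated assumptions: `lint_prompt`, its mode labels, and its regular expressions are hypothetical, they only recognize English phrasing, and mistake 3 (action after the line) is not checked because word order alone can't reliably detect it.

```python
import re

# Heuristic patterns; English phrasing only.
PRONOUN_SPEECH = re.compile(
    r"\b(?:he|she|they)\s+(?:says|said|replies|replied|asks|asked)\b", re.IGNORECASE
)
UNTONED_LABEL = re.compile(r"\[[A-Za-z ]+\]:")  # e.g. "[Agent]:" with no voice descriptor
SHOT_MARKER = re.compile(r"Multi shot Prompt \d+:")
DURATION_TAG = re.compile(r"\(Duration: \d+\s*(?:s|seconds)\)")


def lint_prompt(prompt: str, mode: str = "t2v") -> list[str]:
    """Return warnings for common Kling 3.0 prompt mistakes (hypothetical helper).

    mode is one of "t2v", "multishot", or "i2v" (our own labels).
    """
    issues = []
    if PRONOUN_SPEECH.search(prompt):  # mistake 1
        issues.append("pronoun used as speaker; use a label like [Character A: ...]")
    if UNTONED_LABEL.search(prompt):  # mistake 4
        issues.append("dialogue label without a tonal descriptor")
    if mode == "multishot":  # mistake 2
        markers = len(SHOT_MARKER.findall(prompt))
        if markers == 0:
            issues.append("no 'Multi shot Prompt N:' markers")
        elif len(DURATION_TAG.findall(prompt)) < markers:
            issues.append("at least one shot has no (Duration: Xs) tag")
    if mode == "i2v":  # mistake 5 (length only; content is not checked)
        words = len(prompt.split())
        if not 20 <= words <= 40:
            issues.append(f"I2V prompt has {words} words; aim for 20-40")
    return issues


print(lint_prompt('He says "Where is the truth?" Then she replies "I am scared."'))
```

A clean result means only that none of these patterns matched; it is not a guarantee the prompt will render well.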

Before / after examples

Example 1

Before

woman in a kitchen at night

After

A dim kitchen late at night, warm tungsten light from an overhead fixture casting long shadows. [Character A: Sarah, mid-30s woman in a worn flannel bathrobe, exhausted eyes, hair in a messy ponytail]. Sarah sets a ceramic plate down too hard on the granite counter. Sharp ceramic clink. She turns slowly from the sink, exhaustion evident in her posture. Camera holds a steady medium shot at eye level, shallow depth of field with soft bokeh from the kitchen window behind her. Quiet ambient sound of a ticking wall clock, distant traffic muffled through the window.

Full structure: scene setup + atmosphere, an anchored character up front, visual anchoring before sound, camera described separately from subject, audio design (clink, ticking clock, traffic).

Example 2

Before

Multi-shot: «joker dances down stairs»

After

Master Prompt: A theatrical figure begins his iconic dance descent down a long flight of concrete stairs in a moody urban setting at dusk, cinematic gritty atmosphere.

Multi shot Prompt 1: A man in a vibrant red suit starts dancing at the top of the stairs, taking first exaggerated steps down, arms spreading wide, head tilting back in laughter, wide shot from below capturing the full silhouette against amber sky (Duration: 5 seconds).

Multi shot Prompt 2: Continuing wild dance down concrete steps, spinning and kicking, coat tails flapping dramatically, reaching the bottom step with triumphant arms-raised pose, medium tracking shot following from the side with smooth dolly motion (Duration: 5 seconds).

Clean Multi-shot structure: the Master Prompt sets the overall context, followed by two marked shots, each with its own framing, action, and duration.

Example 3

Before

dialogue between an agent and an assistant

After

A tense interrogation room with a single overhead bulb casting harsh shadows. [Character A: Agent, black-suited man in his 40s with closely-cropped grey hair]. [Character B: Assistant, young woman in a beige sweater, nervous expression]. The agent slams his hand on the metal table, the impact echoes sharply. [Agent, raspy deep voice, cold]: "Where is the truth?" The assistant flinches, looks down at her trembling hands. Immediately, [Assistant, clear fearful voice]: "I told you everything I know." Camera holds tight medium shot, shallow depth of field, dim tungsten lighting with hard shadows.

Dialogue by all the rules: unique character labels, visual action before each line, tonal voice descriptors, the linking word «Immediately» between lines.

Related models

Google Veo 3.1 (incl. Veo 3.1 Fast and Veo 3.1 Fast Relax)

Google Veo 3

Google Veo (General)
