Frequently asked

Why should the prompt be long?

In LTX 2, prompt length should scale with video duration. A short prompt for a 10-second clip causes «rushing»: the model crams everything into the first seconds. For 10 seconds you need ~200 words of chronological description with a clear progression. This is a fundamental difference from Kling and other video models, where 50–150 words is optimal.

How do I use lens and aperture language?

Lens and aperture language («50mm, f/2.8», «35mm at f/2.0», «100mm macro at f/4») reduces LTX 2-specific artifacts: edge flicker and temporal jitter. Specify a concrete focal length and aperture; this gives the model a clear visual anchor and improves frame stability. Explicit camera paths («dolly forward», «orbit», «crane up») additionally reduce camera jitter.

How does Audio-to-Video work in Pro?

A2V is an exclusive Pro mode: the user uploads an audio track (music, speech, effects), and the model syncs the visuals to the rhythm, tone, and dynamics of the sound. It is useful for music videos, ASMR visualizations, and product demos with voice-over. The prompt describes the visual content; timing and rhythm come from the audio. This capability is unusual among video models: most others can only generate audio, not take it as input.

Why add «no high-frequency patterns» to the negative?

When generating 4K (2160p), thin stripes, dense grids, and small repeating textures can cause moiré artifacts — characteristic «wavy» distortions. The negative prompt «no high-frequency patterns, no moiré, no aliasing» is insurance specific to high resolutions. At 1080p this guardrail isn't required, but it doesn't hurt.

Can I run LTX 2 locally?

Yes. LTX 2 is open source, with full weights on HuggingFace and ComfyUI support. The license is Apache-style (free for projects with revenue under $10M). On consumer hardware, the Fast version is the better fit thanks to its 1/10 compute cost. LoRA fine-tuning is supported for custom styles and motion and typically takes under 1 hour. The alternative is cloud execution via ltx.io or LTX Studio.

LTX 2: how to write prompts the model actually understands

Name: LTX 2 (Fast / Pro)
Brand: LTX

LTX · Updated: May 19, 2026

LTX 2 is Lightricks' open-source video model, available at ltx.io. It comes in two versions: Fast (up to 20 seconds, 2x faster) and Pro (up to 10 seconds, plus Audio-to-Video and Retake). It offers native 4K at up to 50 FPS, audio generation, and an Apache license. Write the prompt as a cinematographer's shot list; the optimal length is around 200 words in English.

What LTX 2 does well

LTX 2 is an open-source video model built on a Diffusion Transformer (DiT) architecture. Key technical advantages: native 4K (2160p) up to 50 FPS — the highest resolution among surveyed models; native audio generation (dialogue, music, ambient, SFX) in sync with video; full weights on HuggingFace under an Apache license; LoRA fine-tuning support for custom styles and motion.

Two versions solve different problems. LTX 2 Fast — up to 20 seconds, 2x faster, 1/10 the compute cost, optimal for prototyping and long tests. LTX 2 Pro — up to 10 seconds with exclusive modes: Audio-to-Video (video generation from an audio track), Retake (regenerate a segment without restarting), Extend. Negative prompts are supported in both versions.

Native 4K (2160p) up to 50 FPS — a record among models
LTX 2 Fast: up to 20 seconds, 2x faster, 1/10 compute
LTX 2 Pro: up to 10 seconds, A2V, Retake, Extend
Native audio in sync with video
Open source, Apache license, LoRA fine-tuning

The 6-element prompt structure

The official Lightricks structure: write the prompt like a cinematographer's shot list, as a detailed chronological description in paragraph form. It has six elements:

1. Shot type / camera position: cinematic terms (wide shot, medium close-up, low-angle establishing).
2. Environment: lighting, color palette, textures, atmosphere.
3. Action: natural sequence in present tense, start to finish.
4. Character details: age, hair, clothing, distinctive features.
5. Camera movement: how and when; describing post-movement helps.
6. Audio description: ambient, music, dialogue, vocals.

Simple scenes do not need all six, but the full 6-element structure is the target for production work.
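The six elements can be kept as separate fields and joined into one paragraph at the end. The following is a minimal Python sketch under stated assumptions: `ShotPrompt` and `render` are hypothetical names for illustration and are not part of any LTX tooling. The field order follows the numbered list above, and empty elements are simply skipped.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the six prompt elements (not an LTX API type)."""
    shot: str
    environment: str
    action: str
    character: str = ""
    camera: str = ""
    audio: str = ""

    def render(self) -> str:
        # Join the non-empty elements, in the listed order, into one paragraph.
        parts = [self.shot, self.environment, self.action,
                 self.character, self.camera, self.audio]
        sentences = []
        for part in parts:
            text = part.strip()
            if not text:
                continue  # optional element left out for a simple scene
            if text[-1] not in ".!?":
                text += "."  # make sure every element ends as a sentence
            sentences.append(text)
        return " ".join(sentences)
```

Usage: fill in the fields for a scene and pass `ShotPrompt(...).render()` as the prompt text. Keeping the elements separate makes it easy to see which one is still missing.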

The key principle: prompt length = video length

A unique LTX 2 feature is the correlation between prompt length and video duration. A short prompt for a long video causes «rushing»: the model crams everything into the start and then has nothing left to do. For a 10-second video you need ~200 words of chronological description.
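The rule of thumb can be turned into a quick length check. This is a sketch only: the rate of 20 words per second is an assumption, extrapolated linearly from the single «~200 words for 10 seconds» figure, and the helper names `target_words` and `is_long_enough` are hypothetical. Whether the ratio holds for 20-second Fast clips is not confirmed here.

```python
# Assumption: ~200 words per 10 seconds, scaled linearly to other durations.
WORDS_PER_SECOND = 20


def target_words(duration_s: float) -> int:
    """Rough word budget for a clip of the given duration."""
    return round(duration_s * WORDS_PER_SECOND)


def is_long_enough(prompt: str, duration_s: float, tolerance: float = 0.25) -> bool:
    """True if the prompt reaches the budget minus the given tolerance."""
    word_count = len(prompt.split())
    return word_count >= target_words(duration_s) * (1 - tolerance)
```

With the default tolerance, a 10-second clip passes at 150 words or more, so a prompt like «girl walks along the beach at sunset» fails the check.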

Lens/aperture language reduces artifacts: «50mm, f/2.8» cuts edge flicker. Explicit camera paths (dolly, crane, orbit) reduce temporal jitter — specify a concrete camera trajectory, not a generic «cinematic camera». When generating 4K, add «no high-frequency patterns» to the negative prompt — otherwise moiré artifacts can appear on textures.
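The 4K guardrail is easy to automate when negative prompts are assembled in code. A minimal sketch, assuming the negative prompt is a comma-separated string and that the target resolution is known as a pixel height; `negative_prompt` is a hypothetical helper, not an LTX function. The guard terms are the ones recommended in this guide.

```python
# Guard terms recommended in this guide for 4K output.
MOIRE_GUARD = "no high-frequency patterns, no moiré, no aliasing"


def negative_prompt(height_px: int, base: str = "") -> str:
    """Return the base negative prompt, adding the moiré guard at 2160p and above."""
    terms = [t.strip() for t in base.split(",") if t.strip()]
    if height_px >= 2160:
        for term in MOIRE_GUARD.split(", "):
            if term not in terms:  # avoid duplicating terms already present
                terms.append(term)
    return ", ".join(terms)
```

For example, `negative_prompt(1080, "blurry")` leaves the base prompt unchanged, while `negative_prompt(2160, "blurry")` appends the three guard terms.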

For automatic prompt enhancement, the `enhance_prompt=True` flag is available — the model expands the description to optimal length on its own.

Common mistakes

1. Short prompt for a long video
A unique LTX 2 anti-pattern: prompt length should match video duration. A 10-word prompt for a 10-second clip causes «rushing» — the model crams everything into the start. For 10 seconds you need ~200 words of chronological description with progression from start to finish.
2. Conflicting descriptions
«Still peaceful lake with dramatic waves crashing» and «bright sunny day with dark moody shadows» are internally contradictory. LTX 2 tries to reconcile the irreconcilable and produces unpredictable results. Keep the description stylistically consistent, or state the temporal progression explicitly.
3. No audio description
LTX 2 generates audio natively, and describing the audio landscape significantly improves the result. Without an explicit description the model picks an «average» audio variant, which is often less expressive. Add an audio block such as «Ambient sound of…», «Soft piano in the background…», or «Character speaks in…»; it is the sixth element of the 6-element structure in its own right.
4. High-frequency patterns in 4K without a negative guardrail
When generating 4K, high-frequency patterns (thin stripes, fine grids, dense textures) can cause moiré artifacts. Add «no high-frequency patterns, no moiré, no aliasing» to the negative prompt — insurance specific to 2K and higher resolutions.
5. Describing the image in I2V instead of motion
As in Kling, in Image-to-Video the model already sees the source image. Describing appearance, clothing, or setting inside an I2V prompt conflicts with the actual picture. Keep the prompt to 20–40 words and describe ONLY motion and scene evolution: what moves, how, and at what tempo.
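The five mistakes above can be caught with a rough pre-flight check before a generation is submitted. The sketch below is illustrative only and makes several assumptions: `lint_prompt` is a hypothetical helper, the floor of 15 words per second is a loose reading of the «~200 words for 10 seconds» guidance, the conflict pairs and audio cue words are small hand-picked lists, and the audio check is skipped for I2V prompts. Keyword matching like this is a heuristic and will miss many real contradictions.

```python
import re

# Assumptions for illustration: a loose floor below the ~20 words/second guidance,
# and tiny hand-picked keyword lists that a real project would extend.
MIN_WORDS_PER_SECOND = 15
CONFLICT_PAIRS = [("still", "crashing"), ("sunny", "moody")]
AUDIO_CUES = {"sound", "music", "piano", "speaks", "ambient", "dialogue"}


def lint_prompt(prompt: str, duration_s: float, *, height_px: int = 1080,
                negative: str = "", i2v: bool = False) -> list[str]:
    """Return warning codes for the common mistakes described above."""
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    n_words = len(prompt.split())
    warnings = []
    if i2v:
        # Mistake 5: I2V prompts should stay within 20-40 words of motion only.
        if not 20 <= n_words <= 40:
            warnings.append("i2v-length")
    elif n_words < duration_s * MIN_WORDS_PER_SECOND:
        warnings.append("too-short")  # Mistake 1
    for a, b in CONFLICT_PAIRS:
        if a in tokens and b in tokens:
            warnings.append(f"conflict:{a}/{b}")  # Mistake 2
    if not i2v and not tokens & AUDIO_CUES:
        warnings.append("no-audio")  # Mistake 3
    if height_px >= 2160 and "high-frequency" not in negative.lower():
        warnings.append("no-moire-guard")  # Mistake 4
    return warnings
```

An empty list means none of these heuristics fired; it does not guarantee a good prompt.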

Before / after examples

Example 1

Before

girl walks along the beach at sunset

After

Wide establishing shot at golden hour. A long stretch of empty Pacific coast with warm amber sunlight bathing the wet sand, soft pastel pink and orange sky reflecting on shallow waves, low rolling fog at the horizon. A young woman in her late twenties with long auburn hair tied loosely, wearing a cream linen sundress and bare feet, walks slowly from the right side of the frame toward the receding waves. She pauses, lifts her face to the sun, then continues walking parallel to the shoreline. Camera follows her with a smooth tracking dolly from a medium distance, gradually pulling back to reveal the vastness of the coast by the end of the clip. Shot on 50mm lens at f/2.8, shallow depth of field with soft bokeh on the background. Gentle ambient sound of waves rolling in and seagulls in the distance, soft acoustic guitar melody fades in around the 4-second mark.

Full 6-element structure: shot type, environment, character, action, camera movement, audio. Length ~150 words for a 10-second video, lens language (50mm, f/2.8), chronological progression from start to finish.

Example 2

Before

foggy street scene

After

Medium low-angle tracking shot at pre-dawn blue hour. A narrow cobblestone alley in a European old town, dense morning fog drifts at ankle level, wet cobblestones reflecting muted blue light from antique street lamps, brick walls covered in ivy, deep shadows between buildings. A man in his forties wearing a long charcoal wool coat and grey fedora walks deliberately away from the camera into the fog, hands in pockets. Camera dollies forward at the same pace as the subject, maintaining constant distance for the first 5 seconds, then gradually slows as he disappears into the fog. 35mm lens at f/2.0, anamorphic flares from street lamps, film grain texture. Ambient sound of distant church bells and faint footsteps on wet stone, a low cello drone gradually builds tension throughout the clip.

Lens/aperture language (35mm, f/2.0), explicit camera path (dolly forward, gradually slows), chronological rhythm with timestamps («for the first 5 seconds», «throughout the clip»), full audio design.

Example 3

Before

watch product shot

After

Macro close-up product shot in studio. A premium stainless-steel automatic watch with sapphire crystal face, exposed mechanical movement visible through the case, dark navy leather strap with white stitching, placed on a black slate surface with soft directional rim lighting from the right. Camera orbits slowly around the watch at the same elevation, completing a quarter turn over the duration of the clip, revealing different angles of the case and dial. Shot on 100mm macro lens at f/4, razor-sharp focus on the mechanical movement, soft falloff into the background. Subtle ambient sound of the mechanical tick-tock of the watch movement clearly audible, distant soft piano in the background. No high-frequency patterns.

4K product scene with a negative guardrail («no high-frequency patterns» against moiré), explicit camera path (orbit, quarter turn), lens language (100mm macro, f/4), audio description to emphasize mechanics.
