Runway Gen-4.5: how to write prompts the model actually understands

Runway Gen-4.5 is Runway's first model with full text-to-video alongside image-to-video. The Autoregressive-to-Diffusion architecture improves physics (water, fabric, momentum), supports flexible 2–10 second duration, and adds a unique timestamp syntax for sequential beats. Negative prompts and JSON are not supported.

What's new in Gen-4.5

The main difference from Gen-4 is native T2V mode. Scenes are built straight from text, no reference required. I2V is still there and follows the same rules as in Gen-4: describe movement only.

Second, duration is flexible from 2 to 10 seconds (not just 5 or 10), with a choice of 24 or 25 fps. Third, the timestamp syntax `[00:01]`, `[00:03]` lets you direct sequential actions with second-level precision. And fourth, physics for liquids, fabric, and momentum is markedly better — water splashes, flowing fabric, and settling dust look more convincing than in Gen-4.

  • T2V + I2V in one model — pick by scenario
  • Flexible duration 2–10 seconds (24 or 25 fps)
  • Timestamp syntax: `[00:01]`, `[00:03]` for second-level beats
  • Better physics for water, fabric, and particles
  • T2V aspect: only 1280:720 and 720:1280; I2V has many options
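
The limits in the list above can be checked before a job is submitted. The following is a minimal sketch, not Runway's API: `ClipSettings`, `check_settings`, and `frame_count` are hypothetical names, and whole-second durations are an assumption. It encodes only the numbers quoted in this guide (2–10 seconds, 24 or 25 fps, two T2V ratios).

```python
from dataclasses import dataclass

# Limits quoted in this guide; all names here are hypothetical, not part of any Runway SDK.
MIN_SECONDS, MAX_SECONDS = 2, 10
ALLOWED_FPS = {24, 25}
T2V_RATIOS = {"1280:720", "720:1280"}


@dataclass
class ClipSettings:
    mode: str      # "t2v" or "i2v"
    seconds: int   # assumed to be whole seconds
    fps: int
    ratio: str     # for example "1280:720"


def check_settings(s: ClipSettings) -> list[str]:
    """Return human-readable problems; an empty list means the settings fit the limits."""
    problems = []
    if s.mode not in ("t2v", "i2v"):
        problems.append(f"unknown mode {s.mode!r}")
    if not MIN_SECONDS <= s.seconds <= MAX_SECONDS:
        problems.append(f"duration {s.seconds}s is outside {MIN_SECONDS}-{MAX_SECONDS}s")
    if s.fps not in ALLOWED_FPS:
        problems.append(f"fps {s.fps} is not 24 or 25")
    # Only T2V is restricted to two ratios; the I2V options are not enumerated in this guide.
    if s.mode == "t2v" and s.ratio not in T2V_RATIOS:
        problems.append(f"T2V ratio {s.ratio!r} is not 1280:720 or 720:1280")
    return problems


def frame_count(s: ClipSettings) -> int:
    """Total frames = duration in seconds times frames per second."""
    return s.seconds * s.fps
```

For example, `check_settings(ClipSettings("t2v", 12, 30, "1:1"))` reports three problems, while a 10-second clip at 25 fps works out to 250 frames.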

Prompt structure for T2V

In T2V mode you describe the full scene: camera, subject, action, environment. The base formula is `[Camera] shot of [subject] [action] in [environment]. [Supporting descriptions].`

Lead with the shot type and camera move: «Wide tracking shot of a runner sprinting across a misty beach at sunrise.» That locks composition immediately. The environment gives the model the mood — lighting, atmosphere, textures.

Max prompt length is unofficially around 1,800 characters. Natural language beats tags or JSON. Use active verbs in the present tense and concrete physical detail such as «water splashing», «fabric draping», or «dust settling».
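
The base formula can be expressed as a small template function. This is only a sketch: `build_t2v_prompt` and `MAX_PROMPT_CHARS` are hypothetical names, the 1,800-character ceiling is the unofficial figure quoted above, and the sample scene is invented for illustration.

```python
MAX_PROMPT_CHARS = 1800  # unofficial ceiling quoted in this guide; treat it as an assumption


def build_t2v_prompt(camera: str, subject: str, action: str,
                     environment: str, supporting: tuple[str, ...] = ()) -> str:
    """Assemble '[Camera] shot of [subject] [action] in [environment]. [Supporting...]'."""
    sentences = [f"{camera} shot of {subject} {action} in {environment}."]
    for sentence in supporting:
        sentence = sentence.strip()
        if sentence:
            # Make sure every supporting description ends as a full sentence.
            sentences.append(sentence if sentence.endswith(".") else sentence + ".")
    prompt = " ".join(sentences)
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} characters, over {MAX_PROMPT_CHARS}")
    return prompt


prompt = build_t2v_prompt(
    camera="Wide tracking",
    subject="a runner",
    action="sprinting",
    environment="a misty pine forest at sunrise",
    supporting=("Dust settling behind each stride",),
)
print(prompt)
```

Keeping the camera first in the template mirrors the advice to lead with the shot type, and the length check fails loudly instead of silently truncating.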

Prompt structure for I2V

In I2V mode the rules are the same as in Gen-4: the input image sets the visual, the prompt describes ONLY movement. No need to describe the dress, the park, or the lighting — that's already in the reference. Just say what should move and how the camera should travel.

Optimal I2V prompt length is 10–30 words. Describing the reference content wastes tokens and sometimes conflicts with the actual image. Use active verbs, one main camera move, and an optional speed modifier such as «slowly», «gradually», or «suddenly».
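
The 10–30 word guideline is easy to check mechanically. The sketch below uses a hypothetical helper, `i2v_word_check`, and counts whitespace-separated words, which is an assumption about what counts as a word.

```python
I2V_MIN_WORDS, I2V_MAX_WORDS = 10, 30  # range quoted in this guide


def i2v_word_check(prompt: str) -> tuple[int, str]:
    """Return (word_count, verdict) where verdict is 'too short', 'ok', or 'too long'."""
    words = len(prompt.split())  # whitespace-separated tokens; hyphenated terms count once
    if words < I2V_MIN_WORDS:
        return words, "too short"
    if words > I2V_MAX_WORDS:
        return words, "too long"
    return words, "ok"


motion_only = (
    "Slow push-in toward the subject. Wind picks up gradually, lifting her hair "
    "and the edges of her coat. Camera stays steady, shallow depth maintained on the eyes."
)
print(i2v_word_check(motion_only))
```

A "too long" verdict is usually a hint that the prompt has started describing the reference image instead of the motion.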

Timestamp syntax for sequential scenes

The unique Gen-4.5 feature is timecode direction. Format: `[00:01] action`, `[00:03] next action`. This is the best way to build a mini-narrative from several actions inside a 5–10 second clip.

The key rule — timecodes must be realistic. «Walking across the room in 0.5 seconds» is physically impossible, and the model will either break it or ruin the dynamics. Give beats room to breathe: 2–3 seconds for a full action, 1 second for a short gesture or cut.

Example: `[00:01] A bird takes off from a branch. [00:03] It soars over a misty valley. [00:06] Camera pulls back to reveal the full mountain range.`
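
The pacing rule can be approximated in code by measuring how many seconds each beat has before the next one starts. This is a rough sketch under stated assumptions: `parse_beats` and `pacing_warnings` are hypothetical helpers, timecodes are read as `[MM:SS]`, and the 2-second default threshold is the lower bound of the "full action" range above (pass `min_gap=1` for gesture-only beats).

```python
import re

# Matches "[MM:SS] text" and captures the text up to the next "[".
BEAT_RE = re.compile(r"\[(\d{2}):(\d{2})\]\s*([^\[]+)")


def parse_beats(prompt: str) -> list[tuple[int, str]]:
    """Return (start_second, text) pairs for every timestamped beat in the prompt."""
    return [(int(mm) * 60 + int(ss), text.strip())
            for mm, ss, text in BEAT_RE.findall(prompt)]


def pacing_warnings(beats: list[tuple[int, str]], clip_seconds: int,
                    min_gap: int = 2) -> list[str]:
    """Flag beats with fewer than `min_gap` seconds before the next beat or the clip end."""
    warnings = []
    # Each beat ends where the next one starts; the last beat ends with the clip.
    ends = [start for start, _ in beats[1:]] + [clip_seconds]
    for (start, text), end in zip(beats, ends):
        if end - start < min_gap:
            warnings.append(f"[{start // 60:02d}:{start % 60:02d}] has {end - start}s: {text!r}")
    return warnings


beats = parse_beats(
    "[00:01] A bird takes off from a branch. [00:03] It soars over a misty valley. "
    "[00:06] Camera pulls back to reveal the full mountain range."
)
print(pacing_warnings(beats, clip_seconds=10))
```

With the bird example and a 10-second clip the beats get 2, 3, and 4 seconds, so nothing is flagged. The check is a heuristic about spacing only; it cannot tell whether an action is physically plausible.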

Common mistakes

  1. Mixing T2V and I2V logic

    In T2V you describe the whole scene; in I2V you describe only movement. If you describe «a woman in red in a park» in I2V, the model will try to match the text to the reference and sometimes drift. If you forget the environment and subject in T2V, you get «just a camera move» with no content. Know which mode you're in and write accordingly.

  2. Unrealistic timecodes

    `[00:01] walks across the room [00:02] sits down [00:03] picks up the cup` is unrealistic for 3 seconds of screen time. The model will either speed motion to an unnatural pace or skip some beats. Give each full action 2–3 seconds of breathing room and each short gesture about 1 second.

  3. Negative prompts

    Gen-4.5 doesn't support negative prompts — it's a documented limitation. «No clouds» can produce clouds, «without text» can add text. Describe what you want positively: instead of «no fog» write «clear visibility», instead of «no jitter» write «smooth steadicam motion».

  4. Aspect ratio mismatch in T2V

    T2V in Gen-4.5 supports only two aspect ratios: 1280:720 (landscape) and 720:1280 (portrait). Requests like «square 1:1» or «21:9 ultrawide» won't render in T2V. I2V is more flexible: the aspect ratio comes from the input image, so many landscape, portrait, and square options are available.

  5. JSON and command style

    Structures like `{"camera": "dolly", "action": "walk"}` or commands like «add rain», «remove the hat», «pretend you are a director» don't work in Gen-4.5. Write in natural language, full sentences. Good: «Light rain begins to fall as the camera pulls back.» Bad: «add: rain. camera: pull back.»
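
Several of the mistakes above (negative phrasing, JSON, command style) can be caught with a simple pre-flight lint. The sketch below is a heuristic only: `lint_prompt` is a hypothetical helper, and the word lists are assumptions chosen to match the examples in this section, not an official Runway rule set.

```python
import json
import re

# Heuristic patterns; these word lists are illustrative assumptions.
NEGATION_RE = re.compile(r"\b(no|not|without|don't|never|avoid)\b", re.IGNORECASE)
COMMAND_RE = re.compile(r"^\s*(add|remove|pretend)\b", re.IGNORECASE)


def looks_like_json(prompt: str) -> bool:
    """True if the whole prompt parses as a JSON object or array."""
    try:
        return isinstance(json.loads(prompt), (dict, list))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def lint_prompt(prompt: str) -> list[str]:
    """Return the names of the detected problems, in a fixed order."""
    issues = []
    if looks_like_json(prompt):
        issues.append("json")
    if NEGATION_RE.search(prompt):
        issues.append("negative phrasing")
    if COMMAND_RE.search(prompt):
        issues.append("command style")
    return issues


print(lint_prompt("add: rain. camera: pull back."))
print(lint_prompt("Light rain begins to fall as the camera pulls back."))
```

The lint only flags candidates for a rewrite. Turning «no fog» into «clear visibility» still has to be done by hand, because the positive phrasing depends on what the scene should show.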

Before / after examples

Example 1

Before

beautiful cinematic ocean sunset video with waves

After

Wide cinematic shot of waves rolling onto a black sand beach at sunset. Slow dolly-in toward the foam line. Warm orange and deep purple sky reflected on the wet sand. Soft, deliberate pacing, natural light, 24fps.

The T2V prompt builds the whole scene: shot type, environment, camera move, color palette, and frame rate. It uses active verbs in the present tense and no metaphors.

Example 2

Before

bring this photo to life and make something dramatic

After

Slow push-in toward the subject. Wind picks up gradually, lifting her hair and the edges of her coat. Camera stays steady, shallow depth maintained on the eyes.

In I2V mode the prompt describes only movement and atmosphere, not the reference content. Physical detail (wind lifting hair and coat edges) yields a convincing 5-second mini-story.

Example 3

Before

video where a man walks into a cafe and sits at a table

After

[00:01] Wide shot, a man pushes open the cafe door, late afternoon light streaming in. [00:04] He walks across the wooden floor toward a corner table. [00:07] He pulls out the chair and sits down, exhales slowly. Camera follows at chest height, smooth steadicam.

The timestamp syntax splits a 10-second narrative into three beats with realistic pacing, 3 seconds per action. This is the Gen-4.5 sweet spot.

Frequently asked

How is Gen-4.5 different from Gen-4?
Four major differences: native T2V mode (Gen-4 is I2V only), flexible 2–10 second duration versus fixed 5/10, timestamp syntax for second-level beats, and markedly better physics for water, fabric, and momentum. The architecture is also new — Autoregressive-to-Diffusion instead of pure diffusion. For most tasks it's an upgrade, except when you need minimal cost on simple I2V.
When should I use T2V versus I2V?
Use T2V when there's no scene yet and you want to generate everything from text: concept videos, mini-narratives, prototypes. Use I2V when you have a concrete reference (product, portrait, location) and want to animate it. I2V gives more visual control; T2V gives more creative freedom. Because one model handles both modes, you can combine them in the same project.
How do I use the timestamp syntax correctly?
Use the format `[00:01] action`, `[00:03] next action` with realistic timecodes. Give each full action 2–3 seconds and each short gesture about 1 second. Don't stack more than 3–4 beats into a 10-second clip. This is a tool for sequential narrative, not for cramming maximum events into minimum time.
Which aspect ratios are supported?
T2V supports only two: 1280:720 (landscape) and 720:1280 (portrait) — other ratios won't render. I2V is more flexible: aspect comes from the input image, and there are many landscape, portrait, and square options. FPS choice is 24 or 25 for both modes.
Can I exceed 10 seconds in one generation?
No, the single-generation cap in Gen-4.5 is 10 seconds (minimum 2). For longer narratives you stitch multiple generations, using the last frame of one as the input image for the next. That's manual work, but it gets you production-grade narratives of 30+ seconds.
How long should the prompt be?
For I2V — 10–30 words, same as Gen-4. For T2V — longer, up to ~1,800 characters unofficially, because you need to describe the scene in full. Timestamp prompts naturally grow due to multiple beats. The rule is concentration of meaning, not volume: every sentence should carry a physical or visual action.
Does Opten support Runway Gen-4.5?
Yes, the Opten extension auto-detects Runway inside runwayml.com and scores prompts against the structure specific to Gen-4.5: it checks alignment with T2V or I2V mode, realism of timecodes in timestamp prompts, and the absence of negative constructions and JSON. One click gives you a rewrite restructured for the selected mode.
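
For the stitching approach described in the answer about clips longer than 10 seconds, the arithmetic of splitting a target runtime into valid clip lengths can be sketched as follows. `plan_segments` is a hypothetical helper that assumes whole-second durations; it only plans lengths, and passing the last frame of one clip as the input image for the next remains a manual step.

```python
MIN_SECONDS, MAX_SECONDS = 2, 10  # single-generation limits quoted in this guide


def plan_segments(total_seconds: int) -> list[int]:
    """Split a target runtime into clip lengths that each fit the 2-10 second window."""
    if total_seconds < MIN_SECONDS:
        raise ValueError("target runtime is shorter than the 2-second minimum")
    full, rest = divmod(total_seconds, MAX_SECONDS)
    segments = [MAX_SECONDS] * full
    if rest >= MIN_SECONDS:
        segments.append(rest)
    elif rest == 1:
        # A 1-second tail is below the minimum, so borrow one second from the last full clip.
        segments[-1] -= 1
        segments.append(2)
    return segments


print(plan_segments(31))
```

A 31-second target becomes four generations of 10, 10, 9, and 2 seconds. In practice it may be better to cut on narrative beats than on fixed lengths, so treat the plan as a starting point.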


Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672