Runway Gen-4.5: how to write prompts the model actually understands
Runway Gen-4.5 is Runway's first model with full text-to-video alongside image-to-video. The Autoregressive-to-Diffusion architecture improves physics (water, fabric, momentum), supports a flexible duration of 2–10 seconds, and adds a unique timestamp syntax for sequential beats. Negative prompts and JSON are not supported.
What's new in Gen-4.5
The main difference from Gen-4 is native T2V mode. Scenes are built straight from text, no reference required. I2V is still there and follows the same rules as in Gen-4: describe movement only.
Second, duration is flexible from 2 to 10 seconds (not just 5 or 10), with a choice of 24 or 25 fps. Third, the timestamp syntax `[00:01]`, `[00:03]` lets you direct sequential actions with second-level precision. And fourth, physics for liquids, fabric, and momentum is markedly better — water splashes, flowing fabric, and settling dust look more convincing than in Gen-4.
- T2V + I2V in one model — pick by scenario
- Flexible duration 2–10 seconds (24 or 25 fps)
- Timestamp syntax: `[00:01]`, `[00:03]` for second-level beats
- Better physics for water, fabric, and particles
- T2V aspect: only 1280:720 and 720:1280; I2V has many options
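The duration and frame-rate limits in the list above can be checked before a request is sent. This is a minimal sketch: `ClipSettings` and its field names are hypothetical, the limits are the ones quoted in this article rather than values read from an official API schema, and whole-second durations are an assumption.

```python
from dataclasses import dataclass

# Limits as stated in this article, not taken from an official API schema.
MIN_SECONDS, MAX_SECONDS = 2, 10
ALLOWED_FPS = (24, 25)


@dataclass(frozen=True)
class ClipSettings:
    """Hypothetical container for clip duration and frame rate."""
    duration_s: int
    fps: int

    def __post_init__(self):
        if not MIN_SECONDS <= self.duration_s <= MAX_SECONDS:
            raise ValueError(
                f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, got {self.duration_s}"
            )
        if self.fps not in ALLOWED_FPS:
            raise ValueError(f"fps must be one of {ALLOWED_FPS}, got {self.fps}")

    @property
    def frame_count(self):
        # Total frames in the clip: seconds times frames per second.
        return self.duration_s * self.fps


settings = ClipSettings(duration_s=7, fps=24)
print(settings.frame_count)  # 168
```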
Prompt structure for T2V
In T2V mode you describe the full scene: camera, subject, action, environment. The base formula is [Camera] shot of [subject] [action] in [environment]. [Supporting descriptions].
Lead with the shot type and camera move: «Wide tracking shot of a runner sprinting across a misty beach at sunrise.» That locks composition immediately. The environment gives the model the mood — lighting, atmosphere, textures.
The maximum prompt length is unofficially around 1,800 characters. Natural language works better than tags or JSON. Use active verbs in the present tense and concrete physical detail such as «water splashing», «fabric draping», «dust settling».
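The base formula can be treated as a small template. The sketch below assembles a T2V prompt from its parts and rejects anything over the unofficial 1,800-character ceiling. `build_t2v_prompt` and its parameter names are made up for illustration and are not part of any Runway SDK.

```python
MAX_PROMPT_CHARS = 1800  # unofficial ceiling quoted in this article


def build_t2v_prompt(camera, subject, action, environment, details=()):
    """Assemble '[Camera] shot of [subject] [action] in [environment]. [details]'."""
    lead = f"{camera} shot of {subject} {action} in {environment}."
    # Normalize each supporting description into its own sentence.
    extras = [d.strip().rstrip(".") + "." for d in details if d.strip()]
    prompt = " ".join([lead, *extras])
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"prompt is {len(prompt)} chars, over {MAX_PROMPT_CHARS}")
    return prompt


prompt = build_t2v_prompt(
    camera="Wide tracking",
    subject="a runner",
    action="sprinting",
    environment="a foggy pine forest at dawn",
    details=["Water splashing underfoot", "Soft golden light"],
)
print(prompt)
```

Keeping the shot type first in the template mirrors the advice above: composition is stated before anything else.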
Prompt structure for I2V
In I2V mode the rules are the same as in Gen-4: the input image sets the visual, the prompt describes ONLY movement. No need to describe the dress, the park, or the lighting — that's already in the reference. Just say what should move and how the camera should travel.
Optimal I2V prompt length is 10–30 words. Describing the reference content wastes tokens and sometimes conflicts with the actual image. Use active verbs, one main camera move, and an optional speed modifier such as «slowly», «gradually», or «suddenly».
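A quick way to stay inside the 10–30 word range is to count words before submitting. This is a rough sketch: it splits on whitespace, which only approximates how a reader counts words, and `i2v_length_check` is a hypothetical helper name.

```python
def i2v_length_check(prompt, low=10, high=30):
    """Return (is_within_range, word_count) for an I2V motion prompt."""
    count = len(prompt.split())
    return low <= count <= high, count


ok, count = i2v_length_check(
    "Slow push-in toward the subject as wind gradually lifts her hair."
)
print(ok, count)  # True 11

ok, count = i2v_length_check("Camera pans")
print(ok, count)  # False 2
```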
Timestamp syntax for sequential scenes
The unique Gen-4.5 feature is timecode direction. Format: `[00:01] action`, `[00:03] next action`. This is the best way to build a mini-narrative from several actions inside a 5–10 second clip.
The key rule — timecodes must be realistic. «Walking across the room in 0.5 seconds» is physically impossible, and the model will either break it or ruin the dynamics. Give beats room to breathe: 2–3 seconds for a full action, 1 second for a short gesture or cut.
Example: `[00:01] A bird takes off from a branch. [00:03] It soars over a misty valley. [00:06] Camera pulls back to reveal the full mountain range.`
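If beats are kept as data, the timestamped prompt can be generated instead of typed by hand. The sketch below formats whole seconds as `[MM:SS]` and checks that the beats are in order and start before the clip ends. `build_timed_prompt` is an illustrative helper, and whole-second beat times are an assumption based on the examples in this article.

```python
def timecode(second):
    """Format a whole second as [MM:SS], e.g. 3 -> [00:03]."""
    minutes, secs = divmod(second, 60)
    return f"[{minutes:02d}:{secs:02d}]"


def build_timed_prompt(beats, duration_s=10):
    """beats: list of (start_second, sentence) pairs in playback order."""
    starts = [start for start, _ in beats]
    if starts != sorted(set(starts)):
        raise ValueError("beat times must be strictly increasing")
    if starts and starts[-1] >= duration_s:
        raise ValueError("last beat starts at or after the end of the clip")
    return " ".join(f"{timecode(start)} {text}" for start, text in beats)


timed = build_timed_prompt([
    (1, "A bird takes off from a branch."),
    (3, "It soars over a misty valley."),
    (6, "Camera pulls back to reveal the full mountain range."),
])
print(timed)
```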
Common mistakes
1. Mixing T2V and I2V logic
In T2V you describe the whole scene; in I2V you describe only movement. If you describe «a woman in red in a park» in I2V, the model will try to match the text to the reference and sometimes drift. If you forget the environment and subject in T2V, you get «just a camera move» with no content. Know which mode you're in and write accordingly.
2. Unrealistic timecodes
`[00:01] walks across the room [00:02] sits down [00:03] picks up the cup` packs three full actions into 3 seconds of screen time, which is unrealistic. The model will either speed motion up to an unnatural pace or skip some beats. Give each full action 2–3 seconds of breathing room and each short gesture about 1 second.
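Crowded timecodes are easy to spot mechanically. The sketch below pulls the `[MM:SS]` markers out of a prompt and lists beats that get less than 2 seconds before the next one starts. The 2-second threshold is the rule of thumb from this article, so lower `min_gap_s` for short gestures. Both function names are hypothetical.

```python
import re

TIMECODE = re.compile(r"\[(\d{2}):(\d{2})\]")


def beat_gaps(prompt):
    """Seconds between consecutive [MM:SS] markers in a prompt."""
    starts = [int(m) * 60 + int(s) for m, s in TIMECODE.findall(prompt)]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


def tight_beats(prompt, min_gap_s=2):
    """Indexes of beats followed by the next beat in under min_gap_s seconds."""
    return [i for i, gap in enumerate(beat_gaps(prompt)) if gap < min_gap_s]


rushed = "[00:01] walks across the room [00:02] sits down [00:03] picks up the cup"
paced = "[00:01] He opens the door. [00:04] He crosses the room. [00:07] He sits down."

print(beat_gaps(rushed), tight_beats(rushed))  # [1, 1] [0, 1]
print(beat_gaps(paced), tight_beats(paced))    # [3, 3] []
```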
3. Negative prompts
Gen-4.5 doesn't support negative prompts — it's a documented limitation. «No clouds» can produce clouds, «without text» can add text. Describe what you want positively: instead of «no fog» write «clear visibility», instead of «no jitter» write «smooth steadicam motion».
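A simple lint pass can catch negative phrasing before it reaches the model. This is a heuristic sketch: the word list is an assumption and will miss some phrasings, and the rewrite table contains only the two substitutions mentioned above.

```python
import re

# Assumed list of negation words; extend it to suit your own prompts.
NEGATION = re.compile(r"\b(no|not|without|never|avoid|don't)\b", re.IGNORECASE)

# Positive rewrites taken from the examples in this article.
POSITIVE_REWRITES = {
    "no fog": "clear visibility",
    "no jitter": "smooth steadicam motion",
}


def find_negations(prompt):
    """Return the negation words found in the prompt, lowercased."""
    return [word.lower() for word in NEGATION.findall(prompt)]


def rewrite_known(prompt):
    """Swap known negative phrases for their positive equivalents."""
    for bad, good in POSITIVE_REWRITES.items():
        prompt = re.sub(re.escape(bad), good, prompt, flags=re.IGNORECASE)
    return prompt


draft = "A mountain road at dawn, no fog, no jitter."
print(find_negations(draft))   # ['no', 'no']
fixed = rewrite_known(draft)
print(fixed)
print(find_negations(fixed))   # []
```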
4. Aspect ratio mismatch in T2V
T2V in Gen-4.5 supports only two aspect ratios: 1280:720 (landscape, 16:9) and 720:1280 (portrait, 9:16). Requests like «square 1:1» or «21:9 ultrawide» won't render in T2V. I2V is more flexible: it offers many landscape, portrait, and square options because the aspect ratio comes from the input image.
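The two-value restriction is easy to enforce in code. The sketch below assumes ratios are passed as `width:height` strings exactly as written in this article. `check_t2v_ratio` is a hypothetical helper, not a Runway API call.

```python
# The two T2V ratios named in this article.
T2V_RATIOS = {"landscape": "1280:720", "portrait": "720:1280"}


def check_t2v_ratio(ratio):
    """Return the ratio unchanged if T2V accepts it, otherwise raise."""
    if ratio not in T2V_RATIOS.values():
        allowed = ", ".join(T2V_RATIOS.values())
        raise ValueError(f"T2V ratio must be one of {allowed}; got {ratio!r}")
    return ratio


print(check_t2v_ratio(T2V_RATIOS["portrait"]))  # 720:1280
```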
5. JSON and command style
Structures like `{"camera": "dolly", "action": "walk"}` or commands like «add rain», «remove the hat», «pretend you are a director» don't work in Gen-4.5. Write in natural language, full sentences. Good: «Light rain begins to fall as the camera pulls back.» Bad: «add: rain. camera: pull back.»
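A rough check can flag prompts that look like JSON or `key: value` commands. This is a heuristic sketch with an assumed rule (valid JSON object or array, or two or more `word:` labels), so an ordinary sentence containing colons may be flagged by mistake.

```python
import json
import re

# Matches labels such as "camera: " but not timecodes such as "[00:01]".
KEY_COLON = re.compile(r"\b[A-Za-z_]+:\s")


def looks_structured(prompt):
    """True if the prompt parses as a JSON object/array or reads like key: value pairs."""
    try:
        parsed = json.loads(prompt)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, (dict, list)):
        return True
    return len(KEY_COLON.findall(prompt)) >= 2


print(looks_structured('{"camera": "dolly", "action": "walk"}'))                # True
print(looks_structured("add: rain. camera: pull back."))                        # True
print(looks_structured("Light rain begins to fall as the camera pulls back."))  # False
```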
Before / after examples
Example 1
Before
beautiful cinematic ocean sunset video with waves
After
Wide cinematic shot of waves rolling onto a black sand beach at sunset. Slow dolly-in toward the foam line. Warm orange and deep purple sky reflected on the wet sand. Soft, deliberate pacing, natural light, 24fps.
The T2V prompt builds the whole scene: shot type, environment, camera move, color palette, fps. It uses active verbs in the present tense and no metaphors.
Example 2
Before
bring this photo to life and make something dramatic
After
Slow push-in toward the subject. Wind picks up gradually, lifting her hair and the edges of her coat. Camera stays steady, shallow depth maintained on the eyes.
In I2V mode the prompt describes only movement and atmosphere, not the reference content. Physical detail (wind lifting hair, coat edges) yields a convincing 5-second mini-story.
Example 3
Before
video where a man walks into a cafe and sits at a table
After
[00:01] Wide shot, a man pushes open the cafe door, late afternoon light streaming in. [00:04] He walks across the wooden floor toward a corner table. [00:07] He pulls out the chair and sits down, exhales slowly. Camera follows at chest height, smooth steadicam.
Timestamp syntax splits a 10-second narrative into three beats with realistic pacing — 3 seconds per action. This is the Gen-4.5 sweet spot.