Runway Gen-4: how to write prompts the model actually understands
Runway Gen-4 is an image-to-video model from Runway with native 720p (upscale to 4K) and a fixed duration of 5 or 10 seconds. Generation cannot run without an input image — Gen-4 is I2V-only. The prompt describes ONLY movement and camera; the visual is locked by the reference. Negative prompts and JSON are ignored.
What Runway Gen-4 does well
Gen-4 is a dedicated I2V model: it always needs an input image, and you don't have to describe the scene — that's already fixed in the frame. Strengths are cinematic camera moves and animating still photos with subtle physical detail (hair in a breeze, fabric folds, small gestures).
Gen-4 Turbo is the lighter tier at 5 credits/sec instead of 12. Use it for prototyping and quick iteration, then finalize on full Gen-4. Turbo tolerates slightly less detailed prompts.
- Image-to-Video only — no reference, no generation
- 720p native, upscale to 4K at the final step
- Duration 5 or 10 seconds (fixed choices)
- 12 credits/sec (Gen-4) or 5 credits/sec (Gen-4 Turbo)
- No support for negative prompts or JSON formatting
Prompt structure
Because the visual is already set by the image, the prompt describes movement only. The base formula is [Camera movement]: [subject] [action]. [Additional motion details].
Optimal length is 10–30 words. A short prompt (10–15 words) often beats a long one — Runway officially says: «Clarity matters more than structure». No greetings, explanations, JSON, or commands like «add rain».
Active verbs in present tense: «walks», «pulls back», «rotates slowly». One clear camera move beats several simultaneous ones — Gen-4 struggles with zoom + pan + orbit combinations in a single scene.
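The base formula and the 10–30 word guideline can be sketched as a small helper. This is only an illustration of the structure described above: the function names (`build_motion_prompt`, `word_count`, `in_recommended_range`) are hypothetical and not part of any Runway tooling, and counting words by whitespace is an assumption.

```python
def build_motion_prompt(camera_move: str, subject: str, action: str,
                        details: tuple[str, ...] = ()) -> str:
    """Assemble '[Camera movement]: [subject] [action]. [Additional motion details].'"""
    head = f"{camera_move.strip()}: {subject.strip()} {action.strip().rstrip('.')}."
    # Each extra motion detail becomes its own short sentence.
    tail = [d.strip().rstrip(".") + "." for d in details if d.strip()]
    # Capitalize only the first character so the rest of the wording is untouched.
    return " ".join([head[0].upper() + head[1:], *tail])


def word_count(prompt: str) -> int:
    """Count whitespace-separated words (an assumed, simple definition)."""
    return len(prompt.split())


def in_recommended_range(prompt: str, low: int = 10, high: int = 30) -> bool:
    """Check the prompt against the 10-30 word guideline from this guide."""
    return low <= word_count(prompt) <= high


prompt = build_motion_prompt(
    "slow dolly in",
    "the woman",
    "tilts her head and smiles softly",
    ("Subtle hair movement in the breeze",),
)
print(prompt)
print(word_count(prompt), in_recommended_range(prompt))
```

The example produces a 17-word prompt with one camera move, one subject action, and one motion detail, which sits inside the recommended range.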
Camera vocabulary
Gen-4 understands the standard cinematic lexicon well because these terms come straight from its training data. Basic moves: dolly in/out, truck left/right, pan left/right, tilt up/down. Advanced: crane shot, arc shot, whip pan, crash zoom, push-in, pull-out. Camera style: handheld, steadicam, gimbal, smooth tracking, static.
Set one main move plus an optional speed modifier — «slowly», «suddenly», «gradually». This controls pacing without overloading the model.
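The "one main move plus an optional speed modifier" rule can be checked mechanically before spending credits. The sketch below is an assumption-laden illustration: it only knows the basic and advanced moves and the three speed modifiers listed in this section, matches exact phrases (so «pulls back» or «zooming» would not be detected), and all names in it are hypothetical.

```python
import re

# Moves and modifiers copied from the vocabulary in this section (hyphens normalized).
CAMERA_MOVES = (
    "dolly in", "dolly out", "truck left", "truck right",
    "pan left", "pan right", "tilt up", "tilt down",
    "crane shot", "arc shot", "whip pan", "crash zoom",
    "push in", "pull out",
)
SPEED_MODIFIERS = ("slowly", "suddenly", "gradually")


def _normalize(text: str) -> str:
    """Lowercase and treat hyphens like spaces so 'dolly-in' matches 'dolly in'."""
    return re.sub(r"[-\s]+", " ", text.lower())


def find_camera_moves(prompt: str) -> list[str]:
    """Return the known camera moves that appear as whole phrases in the prompt."""
    text = _normalize(prompt)
    return [m for m in CAMERA_MOVES if re.search(rf"\b{re.escape(m)}\b", text)]


def find_speed_modifiers(prompt: str) -> list[str]:
    """Return the known speed modifiers that appear in the prompt."""
    text = _normalize(prompt)
    return [w for w in SPEED_MODIFIERS if re.search(rf"\b{w}\b", text)]


def has_single_move(prompt: str) -> bool:
    """True when exactly one known camera move is present."""
    return len(find_camera_moves(prompt)) == 1


print(find_camera_moves("Dolly in, then pan left and tilt up"))
print(has_single_move("Slow dolly-in toward the subject."))
```

The first call lists three moves, which signals a prompt worth simplifying; the second prompt passes the single-move check.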
Turbo vs Gen-4: when to use which
Turbo costs 5 credits/sec and renders faster — ideal for trying camera moves, exploring variations, A/B-testing ideas. Full Gen-4 is the final render once the movement and timing are confirmed.
Practical pipeline: 3–5 iterations on Turbo (50 credits per 10-second clip, so 150–250 credits in total), then one final render on Gen-4 (120 credits). Each Turbo pass costs about 2.4× less than a full Gen-4 pass, and the whole pipeline comes to a little over half the cost of iterating directly on the full model. For production campaigns with dozens of clips the budget difference adds up fast.
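To budget a pipeline, the arithmetic can be worked through with the per-second rates quoted in this guide (12 credits for Gen-4, 5 for Gen-4 Turbo). This is a minimal sketch: the model labels `"gen4"` and `"gen4_turbo"` are hypothetical dictionary keys, not API identifiers, and actual pricing may differ from the figures used here.

```python
# Rates and durations as stated in this guide; treat them as assumptions.
CREDITS_PER_SECOND = {"gen4": 12, "gen4_turbo": 5}
ALLOWED_DURATIONS = (5, 10)


def clip_cost(model: str, seconds: int) -> int:
    """Credits for one clip of the given model and duration."""
    if seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    return CREDITS_PER_SECOND[model] * seconds


def pipeline_cost(turbo_iterations: int, seconds: int) -> int:
    """N exploratory Turbo renders followed by one final Gen-4 render."""
    return turbo_iterations * clip_cost("gen4_turbo", seconds) + clip_cost("gen4", seconds)


def full_only_cost(turbo_iterations: int, seconds: int) -> int:
    """The same number of renders, all on full Gen-4."""
    return (turbo_iterations + 1) * clip_cost("gen4", seconds)


for n in (3, 5):
    mixed, full = pipeline_cost(n, 10), full_only_cost(n, 10)
    print(f"{n} Turbo passes + final: {mixed} credits vs {full} on Gen-4 only")
```

For 10-second clips this gives 270 vs 480 credits with three Turbo passes and 370 vs 720 with five.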
Common mistakes
1. Running without an input image
Gen-4 is an I2V-only model, so generation is not possible without a reference. This is not a bug, and there is no workaround: the architecture has no T2V mode. If you need text-to-video on Runway, use Gen-4.5. Always confirm there's an image attached in Generation Settings before launching.
2. Describing the scene instead of the movement
A prompt like «a woman in a red dress in a park, sunset, beautiful» is useless — that information is already in the reference. The prompt should start with a movement verb or a camera move type. The scene is locked by the image; your prompt is the operator's instruction for what to shoot next.
3. Negative prompts
«No clouds», «no blur», «without text» in Gen-4 can produce exactly what you tried to exclude — the model sees «clouds», «blur», «text» as tokens and sometimes generates them. Describe what you want positively: instead of «no fast motion» write «slow, deliberate movement».
4. Multiple camera moves at once
«Pan left while zooming in and rotating» comes out as undirected camera drift in Gen-4. Pick one main move (dolly in OR pan OR orbit) plus an optional speed modifier. Five to ten seconds is not enough screen time for complex blocking — the model can't fit it cleanly.
5. JSON formatting and command-style prompts
Structures like `{"camera": "dolly", "subject": "woman"}` or commands like «add rain», «remove the hat» are ignored by Runway — it's not a command-driven model. Write in natural language, full sentences: «Light rain begins to fall as the camera slowly pulls back.»
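Several of the mistakes above are easy to catch with a quick pre-flight check on the prompt text. The sketch below is a rough heuristic under stated assumptions: the word lists for negations and command verbs are illustrative guesses rather than anything Runway documents, and `lint_prompt` is a hypothetical helper name.

```python
import json
import re

# Illustrative word lists; extend them for your own prompts.
NEGATION = re.compile(r"\b(no|not|without|never|don't)\b", re.IGNORECASE)
COMMAND_START = re.compile(r"^\s*(add|remove|make|delete|change)\b", re.IGNORECASE)


def looks_like_json(prompt: str) -> bool:
    """True when the whole prompt parses as a JSON object or array."""
    text = prompt.strip()
    if not text.startswith(("{", "[")):
        return False
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def lint_prompt(prompt: str) -> list[str]:
    """Return the issue codes found: 'json', 'negation', 'command'."""
    issues = []
    if looks_like_json(prompt):
        issues.append("json")
    if NEGATION.search(prompt):
        issues.append("negation")
    if COMMAND_START.search(prompt):
        issues.append("command")
    return issues


print(lint_prompt('{"camera": "dolly", "subject": "woman"}'))
print(lint_prompt("no fast motion"))
print(lint_prompt("Light rain begins to fall as the camera slowly pulls back."))
```

An empty list means none of these three patterns were found. It does not guarantee a good prompt, only that the text avoids JSON, negations, and command-style openings.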
Before / after examples
Example 1
Before
make a nice video with this photo where the woman in red is in the park smiling, add some motion
After
Slow dolly-in toward the subject. The woman gently tilts her head and smiles softly. Subtle hair movement in the breeze. Smooth tracking, cinematic pacing.
The old version describes the reference (dress, park); the new one describes only movement and camera. Active verbs in present tense, one camera move, soft physical detail.
Example 2
Before
make a dynamic product video from different angles
After
Slow orbital arc shot around the product, 180-degree sweep. Subtle product highlights catch the light as the camera moves. Smooth, stabilized steadicam motion.
A concrete camera move (orbital arc, 180°) instead of vague «different angles». Stabilization is specified — this yields a clean commercial render instead of jittery output.
Example 3
Before
bring this portrait to life, add emotion, no background blur
After
Slight head turn to the left. The subject blinks once, then breaks into a soft smile. Static camera, background stays in sharp focus.
Removed the negative «no background blur», since negatives don't work in Gen-4. Replaced with the positive «background stays in sharp focus». Micro-gestures (blink, smile) are a strength of I2V.