Can I use Act-Two with text prompts only?

No. Act-Two is a performance transfer model, not text-to-video. It cannot generate anything without both a driving video and a character reference. If you need to produce video from a text description, use Runway Gen-4.5, which supports full T2V. Act-Two is for when you already have a performance or plan to record one.

What's the difference between driving video and character reference?

Driving video is the recording of the actor's performance, with motion and expression; it is the source of what gets transferred. Character reference is an image or video of the character the performance is transferred onto. The driving video sets how the character moves; the reference sets how it looks. Both are required at the same time: if either one is missing, Act-Two does not work.

What Facial Expressiveness value should I use?

Default is 3, Runway's recommended value for most scenes. Values 1–2 yield restrained expression, which suits a documentary tone. 4 is worth trying for dramatic scenes, but artifacts are likely. 5 almost always produces a "melting" face and is not recommended. If artifacts appear, lower Expressiveness instead of trying to compensate with the prompt.

Can I get gesture control?

Yes, but only if the character reference is provided as an image (not a video). Image mode unlocks extra hand-gesture transfer control. For best capture, start the driving video with palms toward the camera — this helps the model lock onto the hands and transfer gestures accurately throughout the clip.

Is Act-Two good for dubbing into another language?

Yes, it's one of the strongest use cases. Driving video is a new recording with speech in the target language (your own voice works); character reference is an image or frame from the original video. Act-Two transfers lip-sync to match the new language while preserving the character's appearance. Lip-sync quality depends on clean audio in the driving video.

What are Act-Two's duration limits?

Output duration matches the driving video, because the model transfers the performance frame by frame. The longer the driving video, the more credits you spend (5 credits/sec). For short lines and micro-scenes that's economical; for multi-minute monologues it's better to split the performance into separate generations.
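
The cost arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming the 5 credits/sec rate quoted in this guide. The function names, the choice to round partial seconds up, and the 30-second segment length are hypothetical and are not Runway settings.

```python
import math

CREDITS_PER_SECOND = 5  # rate quoted in this guide; verify against your own plan


def estimate_credits(duration_seconds: float) -> int:
    """Estimate credits for one generation (assumption: partial seconds round up)."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return math.ceil(duration_seconds) * CREDITS_PER_SECOND


def split_into_segments(total_seconds: float, max_segment_seconds: float = 30.0) -> list[float]:
    """Split a long performance into segment lengths no longer than max_segment_seconds."""
    if max_segment_seconds <= 0:
        raise ValueError("segment length must be positive")
    segments = []
    remaining = total_seconds
    while remaining > 0:
        segment = min(remaining, max_segment_seconds)
        segments.append(segment)
        remaining -= segment
    return segments


# A 15-second line versus a 75-second monologue split into segments.
short_line = estimate_credits(15)
monologue_segments = split_into_segments(75)
monologue_total = sum(estimate_credits(s) for s in monologue_segments)
```

Under this flat rate, splitting does not change the total. What it changes is the cost of a retry: if one segment comes out badly, only that segment has to be regenerated.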

Does Opten support Runway Act-Two?

Yes, the Opten extension recognizes Act-Two inside runwayml.com and accounts for its input-driven nature: if you write a detailed text prompt, Opten warns that the model is controlled by video input, not text. It also checks that both inputs are present (driving video + character reference) and that Facial Expressiveness is set sensibly.


Runway Act-Two: how to prepare inputs the model actually understands

Runway · Updated: May 19, 2026

Runway Act-Two is a performance transfer model, not text-to-video. You feed it a driving video with an actor's performance and a character reference (image or video), and the model transfers body motion, facial expression, and lip-sync onto the character. Text prompts play a minimal role here — quality is set by the inputs.

What Act-Two does well

Act-Two works like AI motion capture without mocap suits: record an actor's performance on a regular webcam, pick a character reference, and the model transfers body motion, facial expression, and audio lip-sync onto that character. Output is 720p video at 5 credits/sec.

This is a fundamentally different class of model — neither T2V nor I2V. The text prompt barely influences the result. The Facial Expressiveness parameter (1–5 scale) controls how strongly facial motion transfers — values above 3 risk artifacts. If the character reference is an image (not video), you also get gesture control.

Performance transfer — NOT text-to-video and NOT prompt-driven
Driving video + character reference are mandatory
Transfers: body motion, facial expression, lip-sync (audio)
Facial Expressiveness 1–5 (above 3 risks artifacts)
720p, 5 credits/sec

What to feed as input

Driving video: your performance footage. It can be a webcam recording or a prepared clip. Key requirements are even lighting on the face without harsh shadows and clear audio for lip-sync. Ideally, start the clip with palms toward the camera; this helps the model capture the hands and transfer gestures more accurately later on.

Character reference: who to transfer the performance onto. It can be a still image or a short video. An image unlocks gesture control (extra hand control); a video gives better facial consistency on longer scenes. In both cases the lighting and pose should be clear and the face unobstructed.

The role of the text prompt

Act-Two is input-driven. The text prompt plays a minimal, almost decorative role. Everything you'd normally describe in a prompt (movements, expression, lip-sync) here comes from the driving video; everything about appearance (clothing, face, background) comes from the character reference.

If you write a detailed prompt like "a man in a suit, walking, smiling, saying hello", it will either be ignored or conflict with the inputs. If you want specific movements, act them out in the driving video. If you want a specific look, pick the right character reference. Leave the prompt empty or only briefly describe the scene context.

Tuning Facial Expressiveness

The 1–5 scale controls how strongly facial expression transfers. Values 1–2 give calm, restrained expression with minimal artifact risk. Value 3 is the recommended default and transfers most expressions naturally. Values 4–5 give maximum expression, but artifact risk rises non-linearly: the face can melt, eyes can twitch, and expressions can look overdone.

Rule: start at 3, raise only if the result looks visibly flat. For dramatic scenes 4 can work, but problems usually start above that. If artifacts appear, lower Expressiveness — don't try to compensate with the prompt.
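
The rule above can be written as a small decision function. This is a sketch under stated assumptions: `next_expressiveness` and its two boolean flags are hypothetical names for a judgment you make by eye, and the one-step adjustment size is an illustrative choice rather than anything Runway documents.

```python
DEFAULT_EXPRESSIVENESS = 3  # recommended starting point per this guide
MIN_EXPRESSIVENESS = 1
MAX_EXPRESSIVENESS = 5


def next_expressiveness(current: int, has_artifacts: bool, looks_flat: bool) -> int:
    """Suggest the value for the next attempt, one step at a time.

    Artifacts take priority: if the face melts or the eyes twitch, step down
    even if the expression also looked flat.
    """
    if not MIN_EXPRESSIVENESS <= current <= MAX_EXPRESSIVENESS:
        raise ValueError("Facial Expressiveness must be between 1 and 5")
    if has_artifacts:
        return max(MIN_EXPRESSIVENESS, current - 1)
    if looks_flat:
        return min(MAX_EXPRESSIVENESS, current + 1)
    return current


# First pass at the default looked flat, so try one step up.
second_try = next_expressiveness(DEFAULT_EXPRESSIVENESS, has_artifacts=False, looks_flat=True)
# The second pass showed artifacts, so step back down.
third_try = next_expressiveness(second_try, has_artifacts=True, looks_flat=False)
```

Moving one step per attempt keeps the comparison between runs readable: only one thing changed, so any difference in the output can be attributed to the setting.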

Common mistakes

1. Detailed text prompt as primary control
Act-Two is input-driven, not prompt-driven. Describing movements and expression in the prompt is either ignored or conflicts with the driving video. If you want specific motion, act it out in front of the camera. Leave the prompt empty or include only a brief scene context.
2. Missing driving video or character reference
Act-Two cannot run without both inputs. The driving video sets the performance; the character reference picks who gets animated. If you launch with one of them missing, generation either won't start or produces garbage. Verify both slots in Generation Settings before running.
3. Facial Expressiveness above 3 by default
Values 4–5 can deliver striking expression, but artifact risk grows non-linearly: face melts, eyes twitch, expression looks overdone. Always start at 3, raise only if the output is clearly flat. Lowering Expressiveness is a better fix for artifacts than regenerating.
4. Dark or noisy driving video
Harsh facial shadows break face tracking; noisy audio breaks lip-sync. The performance should be shot in even soft lighting (window, softbox) with clean audio. No prompt optimization can fix this — reshooting the driving video is always faster and more effective.
5. Using Act-Two like a generic T2V or I2V model
Act-Two is a performance transfer system, not a scene generator. Prompts like "a man walks across the room" don't work here because motion isn't generated; it's copied from the driving video. If you need a scene generator, use Gen-4.5 or Gen-4, not Act-Two.
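
Several of these mistakes can be caught with a pre-flight check before spending credits. The sketch below is illustrative only: `ActTwoSetup` and `preflight` are hypothetical names, not part of any Runway API, and the 8-word prompt threshold is an arbitrary stand-in for "brief scene context". Mistake 4 (dark or noisy footage) is left out because it requires inspecting the media itself.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PROMPT_WORDS = 8  # arbitrary threshold for a "brief" prompt (assumption)


@dataclass
class ActTwoSetup:
    """Hypothetical description of one run; not a Runway API object."""
    driving_video: Optional[str]
    character_reference: Optional[str]
    prompt: str = ""
    facial_expressiveness: int = 3


def preflight(setup: ActTwoSetup) -> list[str]:
    """Return a list of warnings; an empty list means no known mistake was found."""
    warnings = []
    if not setup.driving_video:
        warnings.append("missing driving video")
    if not setup.character_reference:
        warnings.append("missing character reference")
    if not 1 <= setup.facial_expressiveness <= 5:
        warnings.append("facial expressiveness must be between 1 and 5")
    elif setup.facial_expressiveness > 3:
        warnings.append("facial expressiveness above 3 raises artifact risk")
    if len(setup.prompt.split()) > MAX_PROMPT_WORDS:
        warnings.append("prompt is long; motion and expression come from the driving video")
    return warnings


# A clean setup and a problematic one (file names are placeholders).
clean = preflight(ActTwoSetup("take1.mp4", "hero.png", prompt="product explainer scene"))
risky = preflight(ActTwoSetup(None, "hero.png", facial_expressiveness=5))
```

Returning warnings instead of raising errors is deliberate: a value of 4 can be a valid choice for a dramatic scene, so the check flags it for a second look without blocking the run.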

Before / after examples

Example 1

Before

Detailed text prompt: "A young woman in a red sweater speaks to the camera, smiling warmly, gesturing with her hands as she explains a new product."

After

Driving video: 15-second webcam recording in even lighting; the actress delivers the line clearly and shows her palms to the camera at the start of the clip.
Character reference: portrait image of the character in a red sweater.
Prompt: (empty or brief: "product explainer scene").
Facial Expressiveness: 3.

Text prompts in Act-Two are useless for controlling motion and expression — those transfer from the driving video. Replace the prompt with a quality performance recording.

Example 2

Before

Character reference: dramatic painted portrait, Facial Expressiveness: 5

After

Character reference: clear photo or live video reference of the character, even lighting, face unobstructed.
Facial Expressiveness: 3.

Painted or stylized references transfer expression poorly. Expressiveness 5 on any reference almost guarantees artifacts. Drop to 3, pick a clear reference — the result stabilizes.

Example 3

Before

Driving video: dark recording with harsh shadows, noisy audio

After

Driving video: recording in even light (natural window light or soft box), clean audio without noise, palms visible at the start of the frame.
Character reference + Expressiveness 3.

Driving video quality directly determines transfer quality. Harsh shadows break face tracking, noisy audio breaks lip-sync. Reshooting the performance is the best "prompt optimization" in Act-Two.
