MiniMax Hailuo 02: how to write prompts the model actually understands
MiniMax Hailuo 02 is the predecessor of Hailuo 2.3, still relevant for its unique FL2V (First-and-Last-Frame-to-Video) mode and strong physics on extreme motion. Prompts are written as director's notes; bracket camera syntax `[Push in]` is supported. English is preferred; optimal length 40-60 words.
What Hailuo 02 does
Hailuo 02 is MiniMax's older video model, but it is not outdated. It has two strengths that the newer 2.3 does not match.
First, FL2V (First-and-Last-Frame-to-Video) mode: the model takes TWO frames (start and end) and generates a smooth transition between them. It is indispensable for morphing, seasonal transformations (summer → winter), and state changes of an object.
Second, extreme physics: gymnastics, parkour, acrobatics, and other complex physical motion. On scenes like that, 02 delivers more realistic dynamics than 2.3. It also supports 512P for rapid prototyping. For everything else (standard T2V and I2V), pick 2.3.
- FL2V — unique first-and-last-frame mode
- Extreme physics: gymnastics, parkour, acrobatics
- Resolutions: 512P, 768P (default), 1080P
- Duration: 6s or 10s (at 512P/768P); 6s at 1080P
- Bracket camera syntax `[Push in]`, `[Tracking shot]`, up to 3 combined commands
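The resolution and duration limits in the list above can be checked locally before a job is submitted. The sketch below is only an illustration: the function and constant names are hypothetical, and the limits are copied from this guide rather than read from any official API schema.

```python
# Limits as listed in this guide (assumption: not taken from an official schema).
ALLOWED_DURATIONS = {
    "512P": (6, 10),
    "768P": (6, 10),
    "1080P": (6,),
}
DEFAULT_RESOLUTION = "768P"


def check_settings(duration_s, resolution=DEFAULT_RESOLUTION):
    """Return a list of problems; an empty list means the combination is allowed."""
    if resolution not in ALLOWED_DURATIONS:
        known = ", ".join(sorted(ALLOWED_DURATIONS))
        return [f"unknown resolution {resolution!r}; expected one of: {known}"]
    allowed = ALLOWED_DURATIONS[resolution]
    if duration_s not in allowed:
        return [f"{duration_s}s is not available at {resolution}; allowed: {allowed}"]
    return []


if __name__ == "__main__":
    print(check_settings(10, "1080P"))  # one problem: 10s is not offered at 1080P
    print(check_settings(6, "1080P"))   # no problems
```

Returning a list of problems instead of raising keeps the check easy to combine with other pre-flight checks.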
Prompt structure
Style matches Hailuo 2.3 — director's notes in natural language, not tags. Optimal length 40-60 words, max 2000 characters.
Formula: [Camera + motion] + [Subject + description] + [Action in present tense] + [Style and atmosphere] + [Emotional markers].
Example: «[Push in] A young woman in a flowing red dress spins gracefully on a moonlit terrace, her hair catching the breeze. Cinematic, dreamlike atmosphere, soft warm rim light, serene emotional tone.» The main verb is in the present tense («spins»), and the opening «[Push in]» uses the model's signature bracket syntax, which works here as a camera command.
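The formula can be sketched as a small helper that joins the five parts and reports length against the targets mentioned above (40-60 words, 2000 characters). The function names and the two-sentence layout are assumptions made for illustration; the model itself only ever sees the final string.

```python
def build_prompt(camera, subject, action, style, emotion):
    """Join the formula's parts into a two-sentence director's note.

    `camera` is a bracket command such as "[Push in]"; the rest are plain phrases.
    """
    sentence_one = f"{camera} {subject} {action}".strip().rstrip(".") + "."
    sentence_two = f"{style}, {emotion}".rstrip(".") + "."
    return f"{sentence_one} {sentence_two}"


def length_report(prompt, min_words=40, max_words=60, max_chars=2000):
    """Count whitespace-separated words and characters against the guide's targets."""
    words = len(prompt.split())
    return {
        "words": words,
        "chars": len(prompt),
        "within_word_target": min_words <= words <= max_words,
        "within_char_limit": len(prompt) <= max_chars,
    }


if __name__ == "__main__":
    prompt = build_prompt(
        camera="[Push in]",
        subject="A young woman in a flowing red dress",
        action="spins gracefully on a moonlit terrace, her hair catching the breeze",
        style="Cinematic, dreamlike atmosphere, soft warm rim light",
        emotion="serene emotional tone",
    )
    print(prompt)
    print(length_report(prompt))
```

The word count is a soft target, so the report flags it separately from the hard character limit.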
FL2V — the unique mode
The headline feature of Hailuo 02. It takes two frames: first = the starting state of the scene, last = the ending state. The model generates a smooth transition. This is a different prompting style — not a scene description, but a description of the TRANSITION process.
Good FL2V prompt: «The flower gradually blooms, petals slowly unfurling outward, camera holding steady on a close-up.» A bad FL2V prompt describes the contents of the first or last frame, which are already set by the images. Specify the transition character (smooth, abrupt, gradual) and the camera behavior during the transition. If FL2V is enabled in settings but the second frame is missing, that is a critical error: the model cannot generate.
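A rough way to review an FL2V prompt against this advice is a keyword check for the transition character and for camera behavior. This is a heuristic sketch with a hypothetical function name; the word list is just the three adjectives named above, matched as substrings so that «gradually» counts as «gradual».

```python
import re

# Transition characters named in this guide, matched as substrings.
TRANSITION_WORDS = ("smooth", "abrupt", "gradual")
BRACKET = re.compile(r"\[[^\[\]]+\]")


def fl2v_prompt_hints(prompt):
    """Return reminders for an FL2V prompt; an empty list means nothing obvious is missing."""
    text = prompt.lower()
    hints = []
    if not any(word in text for word in TRANSITION_WORDS):
        hints.append("name the transition character (smooth, abrupt, gradual)")
    # Camera behavior may be given in prose ("camera holding steady") or as a bracket command.
    if "camera" not in text and not BRACKET.search(prompt):
        hints.append("say what the camera does during the transition")
    return hints


if __name__ == "__main__":
    good = ("The flower gradually blooms, petals slowly unfurling outward, "
            "camera holding steady on a close-up.")
    print(fl2v_prompt_hints(good))            # no hints
    print(fl2v_prompt_hints("flower opens"))  # two hints
```

A keyword check cannot tell whether the prompt restates frame contents, so that part still needs a human read.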
Bracket camera syntax
Hailuo 02 supports the same syntax as 2.3 — precise cinematic control through square brackets. Core commands: `[Truck left]`, `[Truck right]` (horizontal trucking); `[Pan left]`, `[Pan right]` (panning); `[Push in]`, `[Pull out]` (in/out); `[Pedestal up]`, `[Pedestal down]` (camera height); `[Tilt up]`, `[Tilt down]` (tilt); `[Zoom in]`, `[Zoom out]` (zoom); `[Shake]`; `[Tracking shot]`; `[Static shot]`.
Combination: `[Pan left,Pedestal up]` — up to 3 simultaneous commands. Sequential: «...[Push in], then...[Pull out].» This is a model feature, not a formatting error — bracket syntax activates direct camera control.
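The command list and the three-command limit lend themselves to a quick local check. The sketch below is an assumption-laden helper, not part of any MiniMax tooling: the command set is copied from this section, the function name is hypothetical, and matching is case-insensitive purely for convenience.

```python
import re

# Commands as listed in this guide, lowercased for comparison.
CAMERA_COMMANDS = {
    "truck left", "truck right", "pan left", "pan right",
    "push in", "pull out", "pedestal up", "pedestal down",
    "tilt up", "tilt down", "zoom in", "zoom out",
    "shake", "tracking shot", "static shot",
}
MAX_COMBINED = 3
BRACKET = re.compile(r"\[([^\[\]]*)\]")


def check_camera_brackets(prompt):
    """Return a list of problems found in the prompt's [bracket] groups."""
    problems = []
    for match in BRACKET.finditer(prompt):
        commands = [part.strip() for part in match.group(1).split(",")]
        if len(commands) > MAX_COMBINED:
            problems.append(f"{match.group(0)}: more than {MAX_COMBINED} combined commands")
        for command in commands:
            if command.lower() not in CAMERA_COMMANDS:
                problems.append(f"{match.group(0)}: unknown command {command!r}")
    return problems


if __name__ == "__main__":
    print(check_camera_brackets("[Pan left,Pedestal up] A boat drifts across the bay."))
    print(check_camera_brackets("[Dolly in] A boat drifts across the bay."))
```

Sequential use such as «[Push in] ..., then [Pull out]» passes, because each bracket group is checked on its own.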
Common mistakes
1. Tag-based prompts instead of sentences
«cyberpunk, rain, neon, 4k» — Hailuo 02 was trained on narrative descriptions. Tag soup yields generic results with unpredictable dynamics. Write director's notes: «[Push in] Neon-lit Tokyo street, heavy rain falling on wet asphalt, lone figure walking through reflections.»
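Tag soup is easy to spot mechanically: it is a run of very short comma-separated chunks. The heuristic below is a minimal sketch with a hypothetical name and an arbitrary threshold of three words per chunk; it is meant as a reminder to rewrite, not as a reliable classifier.

```python
def looks_like_tag_soup(prompt, max_words_per_chunk=3):
    """Return True when every comma-separated chunk is only a few words long."""
    chunks = [chunk.strip() for chunk in prompt.split(",") if chunk.strip()]
    if len(chunks) < 3:
        # One or two short phrases are too little text to call a tag list.
        return False
    return all(len(chunk.split()) <= max_words_per_chunk for chunk in chunks)


if __name__ == "__main__":
    print(looks_like_tag_soup("cyberpunk, rain, neon, 4k"))
    print(looks_like_tag_soup(
        "[Push in] Neon-lit Tokyo street, heavy rain falling on wet asphalt, "
        "lone figure walking through reflections."
    ))
```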
2. Quality boosters like «8k masterpiece»
«ultra-detailed, 8k, masterpiece, best quality» cause excessive saturation and contrast in the final video. Quality is determined by scene, motion, and camera specificity, not by magic tokens. Quality spam breaks motion physics especially badly on Hailuo 02.
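A prompt can be scanned for these tokens before it is sent. The sketch below uses only the four boosters quoted above; the list and the function name are illustrative assumptions, and a real project would likely extend the list.

```python
import re

# The four boosters quoted in this guide; extend as needed.
QUALITY_BOOSTERS = ("ultra-detailed", "8k", "masterpiece", "best quality")


def find_quality_boosters(prompt):
    """Return the boosters that occur in the prompt as whole tokens, in list order."""
    text = prompt.lower()
    found = []
    for booster in QUALITY_BOOSTERS:
        # Lookarounds keep "8k" from matching inside longer tokens such as "8km".
        if re.search(rf"(?<!\w){re.escape(booster)}(?!\w)", text):
            found.append(booster)
    return found


if __name__ == "__main__":
    print(find_quality_boosters("ultra-detailed, 8k, masterpiece, best quality"))
    print(find_quality_boosters("A gymnast performs a backflip on a sunlit floor."))
```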
3. Describing first/last frame contents in FL2V
If FL2V is on, the first and last frames are defined by images — don't describe them. The prompt must describe the TRANSITION PROCESS between them: motion character, camera behavior, tempo. Restating frame contents wastes tokens and confuses the model.
4. FL2V without a second reference
FL2V requires TWO images — first and last frame. If FL2V is selected in settings but only one or no image is loaded, that's a critical error and the model can't generate the transition. Before using FL2V, make sure both references are uploaded.
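The two-reference rule is simple to enforce before submission. This minimal sketch only checks that both values are set; the parameter names are hypothetical and say nothing about how a particular interface or API names its frame inputs.

```python
def missing_fl2v_frames(first_frame, last_frame):
    """Return the names of FL2V frames that are unset (None or empty)."""
    frames = (("first_frame", first_frame), ("last_frame", last_frame))
    return [name for name, value in frames if not value]


if __name__ == "__main__":
    missing = missing_fl2v_frames("summer.png", None)
    if missing:
        print("FL2V cannot run, missing:", ", ".join(missing))
```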
5. Using 02 when 2.3 is the right choice
Hailuo 02 is the older model. If the task is standard (T2V or I2V without FL2V, without extreme physics), 2.3 is better: newer, more precise, with a cheaper Fast version. 02 only makes sense for FL2V, sports physics, or quick 512P tests. For most tasks — 2.3 is the right call.
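The decision rule in this section fits in a few lines. The sketch below is illustrative only: the function name is hypothetical, and the returned strings are display names used in this guide, not API model identifiers.

```python
def pick_hailuo_model(needs_fl2v=False, extreme_physics=False, quick_512p_test=False):
    """Apply the guide's rule of thumb: 02 for its three niches, 2.3 otherwise."""
    if needs_fl2v or extreme_physics or quick_512p_test:
        return "Hailuo 02"
    return "Hailuo 2.3"


if __name__ == "__main__":
    print(pick_hailuo_model(needs_fl2v=True))  # Hailuo 02
    print(pick_hailuo_model())                 # Hailuo 2.3
```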
Before / after examples
Example 1
Before
beautiful sunset turns into night
After
Settings (not part of the prompt): FL2V mode; frame 1 image: golden sunset over the ocean; frame 2 image: deep blue night with stars.
Prompt: The sky gradually transitions from warm golden tones to deep indigo, sun slowly sinking below the horizon, first stars beginning to twinkle. Camera holds steady on the wide horizon. Smooth, gradual atmospheric shift, peaceful contemplative mood.
An FL2V prompt describes the PROCESS of transition, not the frames (they're set by images). Transition character (gradual, smooth), camera behavior (holds steady), and emotional tone are explicit.
Example 2
Before
gymnast does a flip
After
[Tracking shot] A young female gymnast in a white leotard performs a backflip on a sunlit gymnastics floor, body fully extended mid-air, sharp focus on her arched form. Realistic physics, smooth body mechanics, dynamic energy. Sports broadcast aesthetic, tense and energetic emotional tone.
Extreme physics is Hailuo 02's strength. The `[Tracking shot]` bracket keeps the camera on the motion. Present-tense verb, explicit physical markers (body fully extended, arched form).
Example 3
Before
cat jumps onto the table
After
[Static shot] A ginger cat crouches on the kitchen floor, tail flicking, then leaps gracefully onto the wooden countertop, landing softly. Natural daylight from the window, calm domestic atmosphere, slight cinematic tension during the leap.
Static camera for a predictable shot, concrete verbs (crouches, flicking, leaps, landing), landing physics described (softly). Not tag soup like «cat, jump, kitchen, 4K.»