MiniMax Hailuo 2.3: how to write prompts the model actually understands
MiniMax
MiniMax Hailuo 2.3 is the flagship of the MiniMax video models. It handles text-to-video (T2V) and image-to-video (I2V) at up to 1080P and 25fps, and it supports bracket camera syntax such as `[Push in]`. Prompts should read as director's notes in natural language, not as tag lists. English is recommended, although Chinese is the model's native training language. The optimal length is 40-60 words.
What Hailuo 2.3 does
Hailuo 2.3 is newer and more precise than Hailuo 02. Its strengths: dance choreography and full-body action with realistic body mechanics (flips, jumps, fight scenes); facial micro-expressions with more precise emotion; a wide range of art styles in one model (anime, ink wash, game CG, realism, watercolor, claymation); and precise cinematic control via bracket camera syntax.
A Fast version is also available. It is roughly 2× faster and about 50% cheaper than the standard model, but it is I2V only (no T2V). Output is 768P (default) or 1080P at 25fps; clips run 6s or 10s at 768P and 6s only at 1080P. Prompts can be up to 2000 characters, and a built-in `prompt_optimizer` is available.
- T2V + I2V; Fast version is I2V only
- Resolutions: 768P (default) and 1080P, both at 25fps
- 15 bracket camera commands, up to 3 combined
- Strengths: dance, action, micro-expressions, style diversity
- `prompt_optimizer` (default `true`): an LLM rewrites your prompt before generation
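A minimal sketch of a pre-flight check for the limits listed above, written before any request is sent. The `VideoRequest` dataclass, its field names other than `prompt_optimizer`, and the 6s default duration are hypothetical choices for illustration. They are not MiniMax's API schema, and the limits should be re-checked against MiniMax's current documentation.

```python
from dataclasses import dataclass

# Limits as described in this guide; re-check them against MiniMax's current docs.
MAX_PROMPT_CHARS = 2000
ALLOWED_DURATIONS = {"768P": {6, 10}, "1080P": {6}}  # seconds allowed per resolution


@dataclass
class VideoRequest:
    """Hypothetical request container; field names are illustrative, not the API schema."""
    prompt: str
    resolution: str = "768P"  # the guide lists 768P as the default
    duration: int = 6  # hypothetical default for this sketch
    prompt_optimizer: bool = True  # the guide lists true as the default


def validate(req: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if len(req.prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt is {len(req.prompt)} chars; max is {MAX_PROMPT_CHARS}")
    durations = ALLOWED_DURATIONS.get(req.resolution)
    if durations is None:
        problems.append(f"unsupported resolution {req.resolution!r}")
    elif req.duration not in durations:
        problems.append(f"{req.duration}s is not available at {req.resolution}")
    return problems
```

For example, `validate(VideoRequest("...", resolution="1080P", duration=10))` reports that 10s clips are not available at 1080P, which catches the most common mismatch before a request is spent.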
Prompt structure
Ideal length 40-60 words, max 2000 characters. Style — director's notes in natural language, NOT tags.
Formula: [Camera + motion] + [Subject + description] + [Action in present tense] + [Style and atmosphere] + [Emotional markers].
Example: «[Tracking shot] A young dancer in a flowing crimson dress spins gracefully across a moonlit rooftop, hair catching the breeze, arms extended. Cinematic, dreamlike atmosphere, soft warm rim light, serene yet powerful emotional tone.» Note the concrete action words (the present-tense verb «spins», with «catching» and «extended» adding physical detail), the bracket camera command up front, and the emotional anchor at the end.
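The formula can be sketched as a small helper that assembles the five parts in order, plus a word counter for checking the 40-60 word target. Both function names are hypothetical, and leaving bracket commands out of the word count is an assumption, since the guide does not say whether they count.

```python
import re


def build_prompt(camera: str, subject: str, action: str, style: str, emotion: str) -> str:
    """Assemble [Camera] + subject + present-tense action, then style + emotional tone."""
    first = f"[{camera}] {subject} {action}".rstrip(".") + "."
    second = f"{style}, {emotion}".rstrip(".") + "."
    return f"{first} {second}"


def word_count(prompt: str) -> int:
    """Count whitespace-separated words, ignoring bracket camera commands (an assumption)."""
    return len(re.sub(r"\[[^\]]*\]", " ", prompt).split())


prompt = build_prompt(
    camera="Tracking shot",
    subject="A young dancer in a flowing crimson dress",
    action="spins gracefully across a moonlit rooftop, hair catching the breeze, arms extended",
    style="Cinematic, dreamlike atmosphere, soft warm rim light",
    emotion="serene yet powerful emotional tone",
)
```

Fed the parts of the example above, `build_prompt` reproduces it exactly, and `word_count` returns 32. That is a little under the 40-60 target, so the example has room for a few more concrete details.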
Bracket Camera Syntax — 15 commands
This is the headline feature of the MiniMax family: precise cinematic control through square-bracket commands. 15 commands are available:
- `[Truck left]`, `[Truck right]`: horizontal trucking
- `[Pan left]`, `[Pan right]`: horizontal panning
- `[Push in]`, `[Pull out]`: moving in and out
- `[Pedestal up]`, `[Pedestal down]`: camera height
- `[Tilt up]`, `[Tilt down]`: vertical tilt
- `[Zoom in]`, `[Zoom out]`: lens zoom
- `[Shake]`, `[Tracking shot]`, `[Static shot]`
Combination: `[Pan left,Pedestal up]` applies up to 3 moves simultaneously. Sequencing uses connecting words: «...[Push in], then...[Pull out].» The bracket-and-comma format is a model feature, not a formatting error. Without bracket syntax, camera motion is left to the model and is much less predictable.
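A sketch of a linter for the bracket syntax, assuming the 15-command set below (vertical camera rotation is written `[Tilt up]`/`[Tilt down]`) and case-sensitive matching. It checks each bracket group in isolation; whether the moves inside one group conflict is left to the model.

```python
import re

# Assumed command set (15 entries); check against MiniMax's current documentation.
CAMERA_COMMANDS = {
    "Truck left", "Truck right", "Pan left", "Pan right",
    "Push in", "Pull out", "Pedestal up", "Pedestal down",
    "Tilt up", "Tilt down", "Zoom in", "Zoom out",
    "Shake", "Tracking shot", "Static shot",
}
MAX_COMBINED = 3


def check_camera_syntax(prompt: str) -> list[str]:
    """Return problems found in the prompt's bracket groups (empty list = OK)."""
    problems = []
    for group in re.findall(r"\[([^\]]*)\]", prompt):
        commands = [c.strip() for c in group.split(",")]
        unknown = [c for c in commands if c not in CAMERA_COMMANDS]
        if unknown:
            problems.append(f"unknown command(s) in [{group}]: {', '.join(unknown)}")
        if len(commands) > MAX_COMBINED:
            problems.append(f"[{group}] combines {len(commands)} moves; max is {MAX_COMBINED}")
    return problems
```

Sequenced groups such as «[Push in], then [Pull out]» pass, because each group is checked on its own.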
Prompt Optimizer and its role
Hailuo 2.3 has a `prompt_optimizer` parameter (default `true`): MiniMax's LLM rewrites and expands your prompt before generation. This explains why short, vague prompts often still produce acceptable results; the optimizer fills in the gaps.
When to leave it `true`: rough ideas, quick tests, general tasks. The LLM may add camera commands, atmospheric details, and emotional markers.
When to set it `false`: production prompts, exact brief following, A/B tests. Your prompt then reaches the model as written, with no LLM rewrite. If you wrote a detailed prompt with bracket syntax and an emotional tone, turn the optimizer off so it doesn't «rewrite» your structure.
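One way to encode this rule of thumb is a small hypothetical helper: the optimizer stays on for rough ideas and goes off for production briefs or for prompts that already carry bracket camera commands. This is a heuristic based on the advice above, not documented MiniMax behavior.

```python
import re

BRACKET_COMMAND = re.compile(r"\[[^\]]+\]")


def choose_prompt_optimizer(prompt: str, production: bool = False) -> bool:
    """Hypothetical heuristic for the prompt_optimizer flag.

    Returns False for production/A-B work or when the prompt already has
    bracket camera syntax (so the structure is not rewritten); True otherwise.
    """
    if production:
        return False
    return BRACKET_COMMAND.search(prompt) is None
```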
Common mistakes
1. Tag-based prompts
«cyberpunk, rain, neon, 4k, masterpiece»: Hailuo 2.3 was trained on narrative descriptions, not tags, so tag soup gives generic output with generic motion. Write director's notes with present-tense verbs and bracket camera commands. That alone noticeably improves results from the same set of words.
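A rough heuristic for catching tag soup before it is sent, assuming that a prompt made mostly of one- or two-word comma-separated fragments is a tag list. The function name and both thresholds are hypothetical starting points, not tuned values.

```python
def looks_like_tag_soup(prompt: str, max_words_per_chunk: int = 2, threshold: float = 0.8) -> bool:
    """Flag prompts whose comma/period-separated fragments are mostly very short."""
    chunks = [c.strip() for c in prompt.replace(".", ",").split(",") if c.strip()]
    if len(chunks) < 3:
        return False  # too few fragments to judge
    short = sum(1 for c in chunks if len(c.split()) <= max_words_per_chunk)
    return short / len(chunks) >= threshold
```

A narrative sentence with a couple of short participial phrases stays well under the threshold, while a pure tag list lands at or near 1.0.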
2. Quality boosters cause oversaturation
Tokens like «ultra-detailed, 8k, masterpiece, best quality» push the final video toward excessive saturation and contrast. Hailuo 2.3 is sensitive to them: they shift the color grade and can break motion physics. Quality comes from specific description, not magic words.
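A minimal sketch that strips booster tokens from a prompt. The blocklist is hypothetical, built from the tokens named in this section plus «4k» from the tag-soup example. It only removes words; the specific description that should replace them still has to be written.

```python
import re

# Hypothetical blocklist; extend it with whatever boosters show up in your prompts.
QUALITY_BOOSTERS = ["ultra-detailed", "masterpiece", "best quality", "8k", "4k"]

_BOOSTER_PATTERN = re.compile(
    # Optional preceding comma/whitespace, then a whole-word booster token.
    r"\s*,?\s*\b(?:" + "|".join(re.escape(t) for t in QUALITY_BOOSTERS) + r")\b",
    flags=re.IGNORECASE,
)


def strip_boosters(prompt: str) -> str:
    """Remove booster tokens with their separating comma, then tidy spacing and edges."""
    cleaned = _BOOSTER_PATTERN.sub("", prompt)
    return re.sub(r"\s{2,}", " ", cleaned).strip(" ,")
```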
3. Describing the image in I2V
In I2V mode the input image defines the scene contents, so the prompt should describe only MOTION and CHANGES. If the loaded photo already shows a girl in a red dress, everything in «Beautiful girl in red dress walks» before «walks» is wasted tokens. Write shorter prompts that focus on motion and camera.
4. Hailuo 2.3 Fast for T2V
The Fast version of 2.3 supports I2V ONLY, not T2V. To generate from text without a reference image, use standard 2.3 or Hailuo 02. This is a common confusion: Fast looks like a «lite» version, but it belongs to a different class, and text-only prompts don't work in it.
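A sketch of model selection that never routes a text-only job to Fast. The model identifier strings are assumptions and may not match what your account or provider exposes; `reference_image` is a hypothetical parameter name.

```python
from typing import Optional

# Assumed model identifiers; confirm the exact names in your provider's docs.
STANDARD_MODEL = "MiniMax-Hailuo-2.3"
FAST_MODEL = "MiniMax-Hailuo-2.3-Fast"


def choose_model(reference_image: Optional[str], prefer_fast: bool = False) -> str:
    """Pick Fast only when an input image exists, since Fast is I2V-only."""
    if prefer_fast and reference_image is not None:
        return FAST_MODEL
    return STANDARD_MODEL
```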
5. More than 3 bracket commands at once
MiniMax supports combining moves (`[Pan left,Pedestal up]`), but at most 3 at once. `[Pan left,Pedestal up,Push in,Tilt up]` overloads the camera instructions; the model can't resolve the conflict and the output becomes chaotic. For more moves, sequence them with «then»: «[Pan left], then [Push in].»
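To keep every bracket group within the limit, a long list of moves can be split into groups joined with «then». This hypothetical helper only builds the camera skeleton; subject and action text still go between or after the groups, as in the examples.

```python
def sequence_moves(moves: list[str], max_per_group: int = 3) -> str:
    """Split camera moves into bracket groups of at most max_per_group, joined by ', then '."""
    if not moves:
        return ""
    groups = [moves[i:i + max_per_group] for i in range(0, len(moves), max_per_group)]
    return ", then ".join("[" + ",".join(g) + "]" for g in groups)
```

With `max_per_group=1`, `sequence_moves(["Pan left", "Push in"], max_per_group=1)` gives the fully sequential form «[Pan left], then [Push in]», which is the safer choice when moves might conflict.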
Before / after examples
Example 1
Before
girl dancing in a red dress
After
[Tracking shot] A young woman in a flowing crimson silk dress performs a contemporary pirouette on a moonlit rooftop, arms extended, hair catching the breeze. Cinematic atmosphere with soft warm rim light from a single streetlamp, dreamlike serene tone, dynamic yet graceful tempo.
Bracket camera command `[Tracking shot]`, concrete dance term (contemporary pirouette), physical marker (arms extended), emotional tone. Length sits in the 40-60 word target range.
Example 2
Before
cool fight shot
After
[Pan right,Push in] A male martial artist in a black gi delivers a roundhouse kick mid-air in a dimly lit dojo, body fully rotated, focused intense expression. Cinematic action aesthetic, deep shadow contrast, tense and explosive emotional tone, realistic body mechanics.
Combined camera command (pan + push in simultaneously), concrete combat action (roundhouse kick), physical marker (body fully rotated), micro-expression (focused intense expression).
Example 3
Before
anime girl in the rain
After
[Static shot] An anime-style young woman with long black hair stands under a transparent umbrella on a neon-lit Tokyo street, looking up at the rain with a quiet melancholic smile. Soft watercolor textures, cool blue and magenta neon reflections on wet pavement, dreamy nostalgic tone.
Art style is explicit (anime-style + watercolor textures), there is a micro-expression (quiet melancholic smile), and the static camera keeps the focus on the portrait. Without the explicit «anime-style» cue, 2.3 may drift toward realism.