Luma Uni-1: how to write prompts the model actually understands
Luma Uni-1 is a Luma Labs image model with an unusual architecture: a decoder-only autoregressive transformer (not a diffusion model) that generates pixels sequentially. It offers native 2K resolution, reasoning on by default (the model «thinks» about composition before rendering), up to 9 references with explicit roles (CHARACTER, STYLE, LIGHTING, etc.), and strong multilingual support. It is available through Luma Labs and a REST API.
What sets Uni-1 apart
Uni-1 is not a diffusion model. Its decoder-only autoregressive transformer generates pixels sequentially, which gives it several advantages: strong retention of spatial relationships, better behavior on multi-reference composites, storyboards with character consistency, and built-in reasoning that «thinks» about visual intent before rendering.
In practice, Uni-1 is noticeably stronger than diffusion models on complex composite prompts like «a dog on a red couch, back to viewer, in the left third of the frame, with a rainy city visible through the window». The trade-off is speed: roughly 40–50 seconds for a 2K image with reasoning on, which makes Uni-1 a poor fit for high-volume A/B tests.
- Decoder-only autoregressive, not diffusion
- Native 2K without upscale
- Up to 9 references with explicit roles
- Reasoning on by default (`intent_weight` on the API)
- Storyboard mode with character consistency
Prompt structure and templates
Universal Fast Start template (covers 80% of tasks): `A [subject], in [style], with [lighting], [camera/composition], [environment/background], mood: [emotion], details: [key specifics]`.
Example: «A ceramic artist shaping a lopsided bowl, documentary photography style, soft window lighting, close-up shot, cluttered home studio background, mood: focused and quiet, details: clay-covered hands, imperfect texture, tools scattered on wooden table».
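When prompts are generated from structured data, the template can be filled programmatically. The following is a minimal Python sketch: `fast_start_prompt` and its parameter names are illustrative assumptions, not part of any Luma SDK, and the function only builds the prompt text.

```python
def fast_start_prompt(subject, style, lighting, camera, environment, mood, details):
    """Fill the Fast Start template, one value per slot.

    Hypothetical helper: it returns a prompt string and does not call Luma.
    `details` is a list of short phrases; every other slot is a string.
    """
    text_slots = {
        "subject": subject,
        "style": style,
        "lighting": lighting,
        "camera": camera,
        "environment": environment,
        "mood": mood,
    }
    # Refuse to emit a prompt with a blank slot: an empty section gives the
    # model nothing to anchor on.
    empty = [name for name, value in text_slots.items() if not value.strip()]
    if not details:
        empty.append("details")
    if empty:
        raise ValueError("empty template slots: " + ", ".join(empty))

    return (
        f"A {subject}, in {style}, with {lighting}, {camera}, {environment}, "
        f"mood: {mood}, details: {', '.join(details)}"
    )


prompt = fast_start_prompt(
    subject="ceramic artist shaping a lopsided bowl",
    style="documentary photography style",
    lighting="soft window lighting",
    camera="close-up shot",
    environment="cluttered home studio background",
    mood="focused and quiet",
    details=["clay-covered hands", "imperfect texture"],
)
print(prompt)
```

Raising on an empty slot is a design choice for this sketch; a looser version could simply skip missing sections.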
Uni-1 expects coherent sentences or explicit sections, not tag soup. The model responds more strongly to structured prompts with explicit sections (Subject, Style, Lighting, Camera, Mood, Details) than to raw prose. Simple comma-separated tags work worse here than they do in diffusion models.
Uni-1's 8 templates
The Uni-1 Field Guide defines 8 templates for different tasks:
- Fast Start: for most tasks, exploration, and first ideas.
- Cinematic Control: a structured cinematic brief with separate Subject/Style/Scene/Camera/Details blocks.
- Direct Edit: surgical editing in Modify mode with an explicit Keep block.
- Multi-Reference Fusion: combining 2–9 references with roles (IMAGE 1: use as CHARACTER, IMAGE 2: use for STYLE, IMAGE 3: use as LIGHTING).
- Layout Control: direct zone specification (LEFT / CENTER / RIGHT / BACKGROUND) with objects in each.
- Storyboard Generator: a storyboard with character consistency across multiple frames.
- Loose / Creative Mode: vibe fragments for early ideation («fog, dust, morning, silence»).
- Structured JSON: a formal structure for developers.
Template choice depends on the task: mode beats tags.
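As an illustration of the Layout Control idea, the sketch below renders one line per zone in a fixed order. The zone names come from the template description above; the function name `layout_prompt`, the `Scene:` label, and the line-per-zone formatting are assumptions, since the Field Guide's exact wording is not reproduced here.

```python
ZONES = ("LEFT", "CENTER", "RIGHT", "BACKGROUND")


def layout_prompt(scene, zones):
    """Build a zone-by-zone layout prompt.

    Hypothetical helper. `zones` maps a zone name (case-insensitive) to a
    description of what should appear there. Zones that are not supplied
    are omitted from the output.
    """
    normalized = {name.upper(): text for name, text in zones.items()}
    unknown = sorted(set(normalized) - set(ZONES))
    if unknown:
        raise ValueError("unknown zones: " + ", ".join(unknown))

    lines = [f"Scene: {scene}"]
    # Emit zones in a fixed order so prompts stay comparable across runs.
    for zone in ZONES:
        if zone in normalized:
            lines.append(f"{zone}: {normalized[zone]}")
    return "\n".join(lines)


print(layout_prompt(
    "rainy apartment interior",
    {
        "left": "a dog on a red couch, back to viewer",
        "background": "a rainy city visible through the window",
    },
))
```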
Reference roles and the Keep block
Uni-1's headline feature is assigning roles to references. Available roles: CHARACTER, STYLE, LIGHTING, COMPOSITION, OUTFIT, BACKGROUND, POSE. One role per reference — this reduces conflict between references.
Example: «Combine the following: IMAGE 1: use as CHARACTER reference. Preserve their exact facial features, bone structure, skin tone, and age. IMAGE 2: use for STYLE — painterly digital illustration. IMAGE 3: use as LIGHTING reference — soft golden hour. Output: editorial portrait of the character in a city park, mood: contemplative.»
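The role rules lend themselves to a small builder with validation. This is a sketch under stated assumptions: `fusion_prompt`, `ALLOWED_ROLES`, and `MAX_REFERENCES` are hypothetical names, the uniform "use as ... reference" phrasing is a simplification of the example above, and nothing here calls a Luma API.

```python
ALLOWED_ROLES = {
    "CHARACTER", "STYLE", "LIGHTING", "COMPOSITION", "OUTFIT", "BACKGROUND", "POSE",
}
MAX_REFERENCES = 9


def fusion_prompt(references, output):
    """Build a Multi-Reference Fusion prompt.

    Hypothetical helper. `references` is a list of (role, note) pairs in the
    same order as the uploaded images; `output` describes the final image.
    """
    if not 2 <= len(references) <= MAX_REFERENCES:
        raise ValueError(f"expected 2 to {MAX_REFERENCES} references")

    seen = set()
    parts = ["Combine the following:"]
    for index, (role, note) in enumerate(references, start=1):
        role = role.upper()
        if role not in ALLOWED_ROLES:
            raise ValueError(f"IMAGE {index}: unknown role {role!r}")
        # Two images with the same role compete with each other, so reject them.
        if role in seen:
            raise ValueError(f"IMAGE {index}: role {role} is already assigned")
        seen.add(role)
        parts.append(f"IMAGE {index}: use as {role} reference. {note}")
    parts.append(f"Output: {output}")
    return " ".join(parts)


print(fusion_prompt(
    [
        ("character", "Preserve their exact facial features."),
        ("style", "Painterly digital illustration."),
        ("lighting", "Soft golden hour."),
    ],
    "editorial portrait of the character in a city park, mood: contemplative.",
))
```

With the seven roles listed above, the uniqueness check effectively caps a prompt at seven references; downgrade it to a warning if you need all nine.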
The Keep block is critical in Modify mode. Without an explicit «Keep: face, identity, pose, lighting» the model drifts and changes things you didn't ask for. The Direct Edit template: `Edit instructions: Change: [what changes], Keep: [what stays], Style shift: [optional], Lighting: [optional], Details: [specific edits]`. This is the main weapon against drift.
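One way to make the Keep block impossible to forget is to build Modify prompts through a helper that refuses to run without it. The sketch below assumes a hypothetical function name, `direct_edit_prompt`, and separates blocks with periods so the comma-separated Keep list stays unambiguous; it only assembles text.

```python
def direct_edit_prompt(change, keep, style_shift=None, lighting=None, details=None):
    """Build a Direct Edit prompt with a mandatory Keep block.

    Hypothetical helper. `change` is a string, `keep` is a list of things to
    preserve, and the remaining sections are optional strings.
    """
    if not change.strip():
        raise ValueError("Change must describe what to edit")
    keep_items = [item.strip() for item in keep if item.strip()]
    if not keep_items:
        raise ValueError("Keep must list at least one thing to preserve")

    sections = [
        f"Change: {change.strip()}",
        "Keep: " + ", ".join(keep_items),
    ]
    # Optional sections are appended only when provided.
    if style_shift:
        sections.append(f"Style shift: {style_shift}")
    if lighting:
        sections.append(f"Lighting: {lighting}")
    if details:
        sections.append(f"Details: {details}")
    return "Edit instructions: " + ". ".join(sections) + "."


print(direct_edit_prompt(
    change="replace the brick wall with a city skyline",
    keep=["face", "pose"],
    details="slight bokeh",
))
```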
Common mistakes
1. Tag soup without structure
Uni-1 is not diffusion — it expects coherent sentences or explicit sections. «cat, fluffy, garden, sunny, big eyes, flowers» works significantly worse than a coherent description or a Fast Start template with explicit sections (subject, style, lighting, camera, mood, details). Plain comma tags don't tap into Uni-1's strengths.
2. Role conflicts between references
Two IMAGEs with the same role (e.g. both assigned as CHARACTER) cause drift — the model doesn't know which is primary. Assign unique roles to each reference: one CHARACTER, another STYLE, a third LIGHTING. This reduces conflicts and produces clean fusion.
3. Missing Keep block in Modify
In Modify mode without an explicit Keep, the model may change things you didn't ask for — face, pose, lighting, surroundings. Every Modify prompt should have a «Keep: [concrete list of things to preserve]» block. For iterative edits, repeat the Keep on every turn.
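For iterative edits, the Keep list can be defined once and stamped onto every turn. A minimal sketch, assuming a hypothetical helper `iterative_edit_prompts` that only produces prompt strings (how each turn is submitted is up to your client code):

```python
def iterative_edit_prompts(changes, keep):
    """Return one Modify prompt per change, each repeating the same Keep block.

    Hypothetical helper: `changes` is an ordered list of edit descriptions,
    `keep` is the list of things that must survive every turn.
    """
    keep_items = [item.strip() for item in keep if item.strip()]
    if not keep_items:
        raise ValueError("Keep must list at least one thing to preserve")
    keep_block = "Keep: " + ", ".join(keep_items) + "."

    prompts = []
    for change in changes:
        # Strip a trailing period so the sentence is not doubled up.
        change_text = change.strip().rstrip(".")
        prompts.append(f"Edit instructions: Change: {change_text}. {keep_block}")
    return prompts


for turn in iterative_edit_prompts(
    ["replace the wall with a skyline", "warm the background tones"],
    ["face", "pose", "framing"],
):
    print(turn)
```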
4. Other models' parameter syntax
Midjourney's `--ar` and `::weight`, and Stable Diffusion's `(keyword:1.2)` and `BREAK`, don't work in Uni-1 and end up as literal noise in the prompt. Instead, set dimensions explicitly («2K», «portrait», «landscape»), express emphasis through word order and explicit reference roles, and set style with ordinary adjectives or an explicit Style section.
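Prompts migrated from other tools can be screened for this leftover syntax before they are sent. The sketch below is a heuristic, not an exhaustive parser: the patterns and the name `find_foreign_syntax` are assumptions, and ordinary text containing a double hyphen or a parenthesized `word:number` will be flagged too.

```python
import re

# Heuristic patterns for syntax carried over from other image tools.
FOREIGN_SYNTAX = {
    "parameter flag": re.compile(r"--[a-z]+\b"),
    "double-colon weight": re.compile(r"::\s*-?\d+(?:\.\d+)?"),
    "parenthesized weight": re.compile(r"\([^()]*:\d+(?:\.\d+)?\)"),
    "BREAK keyword": re.compile(r"\bBREAK\b"),
}


def find_foreign_syntax(prompt):
    """Return (label, matched_text) pairs in order of appearance.

    Hypothetical helper: an empty list means nothing suspicious was found.
    """
    hits = []
    for label, pattern in FOREIGN_SYNTAX.items():
        for match in pattern.finditer(prompt):
            hits.append((match.start(), label, match.group(0)))
    hits.sort()  # order by position in the prompt
    return [(label, text) for _, label, text in hits]


for label, text in find_foreign_syntax(
    "a cat::2 in a garden --ar 16:9 (fluffy:1.2) BREAK sunny"
):
    print(f"{label}: {text}")
```

Reporting matches rather than silently deleting them keeps the rewrite in the author's hands, since each flagged token usually needs to be replaced by plain wording, not just removed.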
5. Too many styles at once
«photorealistic + anime + watercolor» without explicit intent breaks the result. Uni-1 tries to combine incompatible directions and produces a strange hybrid. If you want a style mix, either assign the styles to different references and state what each contributes (IMAGE 1 STYLE: photorealism, IMAGE 2 STYLE: watercolor texture), or use one dominant style with accents.
Before / after examples
Example 1
Before
ceramic artist, beautiful, detailed studio
After
A ceramic artist shaping a lopsided bowl, documentary photography style, soft window lighting, close-up shot, cluttered home studio background, mood: focused and quiet, details: clay-covered hands, imperfect texture, tools scattered on wooden table.
The Fast Start template covers roughly 80% of tasks. Structure: [subject] + [style] + [lighting] + [camera/composition] + [environment/background] + mood + details. Each section gives the model an anchor for its reasoning pass.
Example 2
Before
portrait combining 3 references
After
Combine the following: IMAGE 1: use as CHARACTER reference. Preserve facial features, bone structure, skin tone. IMAGE 2: use for STYLE — painterly digital illustration with visible brush strokes. IMAGE 3: use as LIGHTING reference — soft golden hour side light. Output: Subject: editorial portrait of the character in a city park during autumn. Style: dominant from IMAGE 2. Composition: rule of thirds, medium close-up. Details: warm autumn palette, soft shadow on the right side.
Multi-Reference Fusion with explicit roles is the main Uni-1 technique whenever you have two or more references. Role conflict (two IMAGEs as CHARACTER) causes drift. Unique roles plus an Output block with specifics produce a clean fusion.
Example 3
Before
remove the wall behind the subject (Modify)
After
Edit instructions: Change: replace the brick wall behind the subject with a soft out-of-focus city skyline at golden hour. Keep: subject's face, identity, pose, clothing, exact lighting on the face, framing. Style shift: documentary photography. Details: warm orange tones in background, slight bokeh, no harsh edges around subject.
The Direct Edit template with an explicit Keep is critical in Modify. Without Keep the model changes things you didn't ask for (face, pose, lighting). The Keep list locks the contract: what to touch, what to leave alone.