How to write prompts for GPT Image 2: 5 steps from random output to precise result
Vlad Voronezhtsev · 8 min read

GPT Image 2 is OpenAI's "thinking" image model. It processes the prompt sequentially, so whatever comes first gets the most visual weight. Midjourney is happy with tag soup and Nano Banana defaults to a bright, bubblegum exposure; GPT Image 2 instead expects a structured brief with a declared purpose and defaults to a calm, neutral palette. Write Midjourney-style prompts for it and half your credits go to noise. These 5 steps turn random output into predictable results for everything from ad billboards to text-dense infographic slides.
1. Structure beats tag soup
GPT Image 2 reads the prompt top to bottom and gives the first paragraph the most weight. Bury the main subject at the end and the model won't surface it: your shot ends up being about something else. The working order: [Background/Scene] → [Subject] → [Key details] → [Style/Medium] → [Lighting/Composition] → [Text in quotes] → [Constraints: what to keep, what to avoid]. The format of the blocks is flexible: natural language, a JSON-like structure, and an instruction list all work. What matters is that intent and constraints live in the first 30-40 words. Stable-Diffusion-style tag soup (`girl, redhair, summer, masterpiece, 8k, octane render`) doesn't work for GPT Image 2: the model tries to use every tag but gets no hierarchy from the prompt, so the output is random.
Before
summer, girl, red hair, beach, golden hour, cinematic, 35mm, photorealistic, masterpiece
After
Candid photograph: a young woman with red hair walking along an empty beach at golden hour. Subject centered, looking away from camera. Photorealistic, 35mm film, shallow depth of field, warm natural light, subtle film grain.
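If you assemble prompts in code, the working order above can be enforced mechanically. The following is a minimal sketch, not an official tool: the block names (`scene`, `subject`, and so on), `build_prompt`, and `intent_in_lead` are hypothetical helpers invented for this illustration, and the 40-word window simply mirrors the 30-40 word guideline.

```python
# Order taken from the step above; the block names themselves are hypothetical.
BLOCK_ORDER = [
    "scene",
    "subject",
    "details",
    "style",
    "lighting",
    "text",
    "constraints",
]


def build_prompt(blocks: dict[str, str]) -> str:
    """Join the non-empty blocks in the fixed order, regardless of input order."""
    unknown = set(blocks) - set(BLOCK_ORDER)
    if unknown:
        raise ValueError(f"unknown blocks: {sorted(unknown)}")
    parts = [
        blocks[name].strip()
        for name in BLOCK_ORDER
        if blocks.get(name, "").strip()
    ]
    return " ".join(parts)


def intent_in_lead(prompt: str, keywords: list[str], lead_words: int = 40) -> bool:
    """Check that every keyword appears within the first `lead_words` words."""
    lead = " ".join(prompt.split()[:lead_words]).lower()
    return all(keyword.lower() in lead for keyword in keywords)


# The blocks are deliberately passed out of order; the builder reorders them.
prompt = build_prompt({
    "style": "Photorealistic, 35mm film, shallow depth of field.",
    "subject": "A young woman with red hair walking, looking away from camera.",
    "scene": "Candid photograph on an empty beach at golden hour.",
    "lighting": "Warm natural light, subtle film grain, subject centered.",
})
print(prompt)
```

The check `intent_in_lead(prompt, ["beach", "red hair"])` is a cheap way to catch a subject that slipped to the end of a long prompt before you spend credits on it.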
2. Write a brief, not a description
The single most useful trick: declare the purpose. Not "a nice product image" but "premium campaign image for streetwear brand Thread." Not "a UI screen" but "iPhone mockup for the onboarding flow of a fintech app." The purpose triggers the right set of conventions in the model: ads imply tight composition and tagline space; pitch-deck slides imply a grid and readable numbers; product shots imply a neutral backdrop and precise material lighting. With no declared purpose, the model guesses, and it guesses differently each time. This is the single most common reason the same prompt gives three different outputs in a row. Bonus: state the audience or use context ("for an investor deck", "for teen-audience social media") and the model adapts the visual tone to match.
Before
beautiful advertising image of a new smartphone
After
Premium product campaign image for "Aurora" smartphone (mid-range, target audience: 25-35 urban professionals). Hero shot on a neutral grey gradient background, soft three-point studio lighting, phone tilted 15° to show edge profile, subtle shadow. Tagline area on left third (reserve empty space). Render the phone once. Integrated lifestyle cue: faint, blurred coffee cup in the background.
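One way to make the purpose impossible to forget is to treat it as a required field. This is a sketch under assumptions: the `Brief` class and its field names are hypothetical, and the output is just a prompt string that you would still review by hand.

```python
from dataclasses import dataclass


@dataclass
class Brief:
    """Hypothetical container for a brief; none of these names come from an API."""

    purpose: str          # e.g. "Premium product campaign image"
    subject: str          # what the image is of
    audience: str = ""    # optional audience or use context
    direction: str = ""   # composition, lighting, reserved space

    def to_prompt(self) -> str:
        if not self.purpose.strip():
            raise ValueError("declare a purpose; without it the model has to guess")
        # Purpose and subject go first so they land in the opening words.
        lead = f"{self.purpose.strip()} for {self.subject.strip()}"
        if self.audience.strip():
            lead += f" (target audience: {self.audience.strip()})"
        parts = [lead + "."]
        if self.direction.strip():
            parts.append(self.direction.strip())
        return " ".join(parts)


aurora = Brief(
    purpose="Premium product campaign image",
    subject='"Aurora" smartphone',
    audience="25-35 urban professionals",
    direction=(
        "Hero shot on a neutral grey gradient background, soft three-point "
        "studio lighting. Reserve empty space on the left third for a tagline."
    ),
)
aurora_prompt = aurora.to_prompt()
print(aurora_prompt)
```

Raising on an empty purpose is a design choice for this sketch: it turns the most common cause of inconsistent output into an error you see before generating anything.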
3. Exact text always in quotes
GPT Image 2 is state of the art at rendering text inside images, which is its main advantage over Midjourney and Stable Diffusion. But if you don't wrap exact text in quotes, the model treats the words as scene description and routinely warps letters, adds extra characters, or changes the case. The rule: anything that must appear literally goes inside `"..."` or in ALL CAPS. Specify the typeface (`bold sans-serif, Inter`), the size (`large headline`), the color, and the placement (`centered top third`). For rare words, brand names, or non-English spellings, spell them out in brackets. For dense or small text (chart legends, fine print), always set `quality="high"`; on `medium` and `low`, micro-text comes back with artifacts. Multilingual text is supported: Cyrillic, Chinese, Japanese, Korean, Hindi, Bengali, and Arabic all render cleanly.
Before
billboard with text Fresh and Clean about a cleaning product, modern design
After
Outdoor billboard for a cleaning product brand. Billboard text (EXACT, verbatim, no extra characters, no logo drift): "Fresh and Clean". Typography: bold sans-serif, Inter, white on deep teal background, centered, large size. Below the tagline (smaller, 30% of headline size): "Available nationwide". Quality: high.
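The quoting rule is easy to automate. The sketch below is illustrative only: `text_block` and `quality_for` are hypothetical helpers, the wording of the generated sentence is one possible phrasing rather than a required formula, and how the chosen quality value is passed to the image service is left out on purpose.

```python
def text_block(
    text: str, *, typeface: str, size: str, color: str, placement: str
) -> str:
    """Wrap literal text in double quotes and attach its typography spec."""
    if '"' in text:
        # A stray quote would end the literal early and confuse the prompt.
        raise ValueError("exact text must not contain double quotes")
    return (
        f'Text (EXACT, verbatim, no extra characters): "{text}". '
        f"Typography: {typeface}, {size}, {color}, {placement}."
    )


def quality_for(dense_or_small_text: bool) -> str:
    """Pick `high` only when the image carries dense or small text."""
    return "high" if dense_or_small_text else "medium"


headline = text_block(
    "Fresh and Clean",
    typeface="bold sans-serif, Inter",
    size="large headline",
    color="white on deep teal",
    placement="centered top third",
)
print(headline)
print(quality_for(dense_or_small_text=True))
```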
4. Change / Preserve / Constraints template for editing
When you need to change one thing and keep everything else, add an explicit preserve block. Without one the model drifts: it shifts the face along with the outfit, the lighting along with the background, the camera angle along with the weather. The template for surgical edits: `Change: [what changes]` / `Preserve: [face, pose, lighting, angle, background, geometry, text, layout]` / `Constraints: [no extra objects, no redesign, no logo drift, no watermark]`. This matters most for virtual try-on (swapping clothing on a person), interior swaps (one piece of furniture for another), and weather or season changes. Repeat the preserve list on every iteration; otherwise by the 3rd or 4th pass the model forgets the original identity constraints and gradually "redraws" the character.
Before
make her hair red
After
Change: hair color from brown to natural red (auburn). Preserve: face, facial features, skin tone, eye color, expression, pose, lighting direction, background, clothing, all other identity markers. Constraints: no extra objects, no redesign of any element except hair, no watermark, no logo drift.
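Because the preserve list has to be repeated on every pass, it is a natural candidate for a small helper. A minimal sketch, assuming you build edit prompts as strings: `edit_prompt` and the two default lists are hypothetical names, with the list contents copied from the template above.

```python
from typing import Optional

# Defaults copied from the template in this step.
DEFAULT_PRESERVE = [
    "face", "pose", "lighting", "angle",
    "background", "geometry", "text", "layout",
]
DEFAULT_CONSTRAINTS = [
    "no extra objects", "no redesign", "no logo drift", "no watermark",
]


def edit_prompt(
    change: str,
    preserve: Optional[list[str]] = None,
    constraints: Optional[list[str]] = None,
) -> str:
    """Render one Change / Preserve / Constraints edit instruction."""
    preserve = DEFAULT_PRESERVE if preserve is None else preserve
    constraints = DEFAULT_CONSTRAINTS if constraints is None else constraints
    return (
        f"Change: {change.strip().rstrip('.')}. "
        f"Preserve: {', '.join(preserve)}. "
        f"Constraints: {', '.join(constraints)}."
    )


# Each pass changes one thing and restates the full preserve list.
changes = [
    "hair color from brown to natural red (auburn)",
    "background from beach to city street at dusk",
]
passes = [
    edit_prompt(changes[0]),
    # The second pass edits the background, so it is dropped from the list.
    edit_prompt(changes[1], preserve=[p for p in DEFAULT_PRESERVE if p != "background"]),
]
for line in passes:
    print(line)
```

Note that the preserve list should exclude whatever the current pass is changing, which is why the second call filters out `background`.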
5. Iterate, don't overload
It's tempting to cram every requirement into one prompt: style, lighting, text, constraints, aspect ratio, identity preservation. Don't. The model can't hold 15 orthogonal requirements at once, and one of them collapses (usually text or identity). The right workflow: clean base prompt → evaluate the output → targeted single-axis edit. Examples of single-axis edits: `make lighting warmer`, `remove the extra tree on the left`, `replace the typography with Inter bold`, `restore the original background`. This is far faster than rewriting from scratch. Use `quality="high"` only when you actually need it (dense text, close-up portraits, identity-sensitive editing); `medium` works for 80% of jobs and is 2-3× faster. Last note: GPT Image 2 does NOT understand Midjourney or Stable Diffusion prompt syntax (`--ar 16:9`, `::`, `(keyword:1.2)`). Specify the aspect ratio as an explicit pixel size and express weights in natural language ("emphasize the cat", "de-emphasize the background").
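If you are migrating old prompts, a quick lint pass can flag the unsupported syntax and turn a ratio into a pixel size. This is a rough sketch under assumptions: `find_legacy_syntax` and `size_from_ratio` are hypothetical helpers, the regular expressions cover only the three forms mentioned above, and the 1536-pixel long side is an arbitrary default. Which sizes the image service actually accepts is not checked here, so confirm that against its documentation.

```python
import re

# Patterns for the three legacy forms mentioned above; not exhaustive.
LEGACY_PATTERNS = {
    "parameter flag": re.compile(r"(?<!\S)--[a-z]+\b"),
    "multi-prompt separator": re.compile(r"::"),
    "keyword weight": re.compile(r"\([^()]+:\d+(?:\.\d+)?\)"),
}


def find_legacy_syntax(prompt: str) -> list[str]:
    """Return the names of legacy syntax forms found in the prompt."""
    return [
        name
        for name, pattern in LEGACY_PATTERNS.items()
        if pattern.search(prompt)
    ]


def size_from_ratio(ratio: str, long_side: int = 1536) -> str:
    """Convert a ratio such as '16:9' to 'WIDTHxHEIGHT' with a fixed long side."""
    width_part, height_part = (int(part) for part in ratio.split(":"))
    if width_part >= height_part:
        width = long_side
        height = round(long_side * height_part / width_part)
    else:
        width = round(long_side * width_part / height_part)
        height = long_side
    return f"{width}x{height}"


old_prompt = "cat::2 dog (fluffy:1.2) --ar 16:9"
print(find_legacy_syntax(old_prompt))
print(size_from_ratio("16:9"))
```

A prompt that comes back with an empty list is not guaranteed to be good, only free of these three patterns; the weighting still has to be rewritten in natural language by hand.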