GPT Image 2: how to write prompts the model actually understands
OpenAI
GPT Image 2 is OpenAI's image model with state-of-the-art in-image text rendering and a thinking mode. It treats prompts as design briefs, processes tokens sequentially (first words carry the most weight), and supports up to 16 reference images plus 8 linked outputs per request. English works best, but multilingual support is solid.
What GPT Image 2 does well
The breakthrough feature is accurate, readable in-image text: ad taglines, infographics, UI mockups, QR codes, multilingual typography (Cyrillic, CJK, Arabic). Its photorealism has neutral exposure, without the characteristic AI gloss, which gives it an edge in moody, overcast, and desaturated genres.
The model behaves like a thinking system: on complex prompts it automatically switches to thinking mode, can reason, use web search, and self-check its output. For simple tasks, Instant mode kicks in — fast generation without deliberation.
- Exact text in quotes, multilingual typography
- Photorealism without AI gloss (neutral exposure)
- Up to 16 reference images + up to 8 linked frames per request
- Surgical edits via Change / Preserve / Constraints
- Knowledge cutoff December 2025 + web search in thinking mode
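The per-request limits in the list above can be checked before a request is sent. The following is a minimal sketch: the function and constant names are hypothetical, and only the two numbers (16 and 8) come from the text.

```python
# Limits quoted in the text; the names are illustrative, not part of any API.
MAX_REFERENCE_IMAGES = 16
MAX_LINKED_OUTPUTS = 8


def check_request_limits(reference_images, n_outputs=1):
    """Raise ValueError if a planned request exceeds the documented limits."""
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"{len(reference_images)} reference images, limit is {MAX_REFERENCE_IMAGES}"
        )
    if not 1 <= n_outputs <= MAX_LINKED_OUTPUTS:
        raise ValueError(
            f"{n_outputs} linked outputs requested, allowed range is 1..{MAX_LINKED_OUTPUTS}"
        )
```

Calling `check_request_limits(paths, n_outputs=8)` before building the request catches an oversized batch locally instead of after a failed call.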
Prompt structure
Optimal order: [Background/Scene] + [Subject] + [Key details] + [Style/Medium] + [Lighting/Composition] + [Text in quotes] + [Constraints].
The main rule: the primary subject belongs in the opening sentence. The model processes tokens sequentially, and words in the opening lines get maximum visual weight. Bury the subject at the end of a paragraph and it loses priority.
Write the prompt as a design brief, not as a tag list. State the purpose (ad, UI mockup, infographic, product shot); this activates the right model mode. Format is flexible: natural language, JSON-like structure, and instruction-style directives all work.
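One way to keep the recommended order consistent is to assemble prompts from named fields. This is a sketch only: `ImageBrief` and `build_prompt` are hypothetical helpers, and the only thing taken from the text is the field order and the habit of stating the purpose and quoting exact text.

```python
from dataclasses import dataclass, field


@dataclass
class ImageBrief:
    """Fields follow the order recommended in the text."""
    purpose: str                  # e.g. "Product shot for an ad"
    scene: str
    subject: str
    details: str = ""
    style: str = ""
    lighting: str = ""
    exact_text: str = ""          # rendered verbatim, wrapped in quotes
    constraints: list = field(default_factory=list)


def build_prompt(brief: ImageBrief) -> str:
    """Join the non-empty fields into one brief, most important first."""
    ordered = [brief.purpose, brief.scene, brief.subject,
               brief.details, brief.style, brief.lighting]
    sentences = [s.strip().rstrip(".") for s in ordered if s.strip()]
    if brief.exact_text:
        sentences.append(f'Text (EXACT, verbatim): "{brief.exact_text}"')
    if brief.constraints:
        sentences.append("Constraints: " + ", ".join(brief.constraints))
    return ". ".join(sentences) + "."
```

Because the subject is a required field placed near the top, it cannot end up buried at the end of the prompt.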
Edit template: Change / Preserve / Constraints
For surgical edits, GPT Image 2 supports a fixed three-part template — change one thing while keeping everything else intact:
Change: [exactly what changes]
Preserve: [face, identity, pose, lighting, framing, background, geometry, text, layout]
Constraints: [no extra objects, no redesign, no logo drift, no watermark]
For iterative editing, repeat the preserve list on every turn — otherwise the model drifts and starts changing things you didn't ask for. This is especially critical for virtual try-on, interior object swaps, and multi-reference composites.
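The template and the "repeat the preserve list on every turn" rule can be sketched as a small helper. The function names and the sample lists are illustrative assumptions; only the three-part layout comes from the text.

```python
def edit_prompt(change, preserve, constraints):
    """Format one edit turn in the Change / Preserve / Constraints layout."""
    return (
        f"Change: {change}\n"
        f"Preserve: {', '.join(preserve)}\n"
        f"Constraints: {', '.join(constraints)}"
    )


def iterative_edits(changes, preserve, constraints):
    """Build one prompt per turn, repeating the same preserve list each time."""
    return [edit_prompt(change, preserve, constraints) for change in changes]


# Sample lists, trimmed from the template above.
PRESERVE = ["face", "identity", "pose", "lighting", "framing", "background"]
CONSTRAINTS = ["no extra objects", "no redesign", "no watermark"]
```

Keeping `PRESERVE` in one place means a later turn cannot silently drop an item that an earlier turn protected.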
In-image text
GPT Image 2 is the best-in-class model for rendering text inside frames. Rules:
Put exact text in quotes or ALL CAPS, for example: "Billboard text (EXACT, verbatim): 'Fresh and clean'". For tricky words (brands, rare spellings), spell them out letter by letter. Specify font, size, color, and placement.
For dense text, infographics, and small type, set `quality="high"`; `medium` and `low` will break micro-text. Text rendering works with Latin, Cyrillic, CJK, Hindi, Bengali, and Arabic. Long text without quotes can be mangled or padded with stray characters, a known weakness.
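The text rules above can be collected into a helper that formats a text instruction. This is a hedged sketch: `text_spec` and `quality_for` are hypothetical names, the fallback to `"medium"` is an assumption, and how the `quality` value is passed to the model is not covered here.

```python
def text_spec(label, text, font="", color="", placement="", spell_out=False):
    """Format an exact-text instruction: quoted, marked verbatim, with styling."""
    spec = f'{label} (EXACT, verbatim): "{text}"'
    styling = ", ".join(part for part in (font, color, placement) if part)
    if styling:
        spec += f", set in {styling}"
    if spell_out:
        # Spell each word letter by letter for brands and rare spellings.
        spelled = " ".join("-".join(word.upper()) for word in text.split())
        spec += f". Spelled letter by letter: {spelled}"
    return spec


def quality_for(has_dense_or_small_text):
    """Pick the quality value; 'medium' as the fallback is an assumption."""
    return "high" if has_dense_or_small_text else "medium"
```

For example, `text_spec("Logo", "Opten", spell_out=True)` appends the spelling `O-P-T-E-N` after the quoted word.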
Common mistakes
1. Primary subject buried at the end of the prompt
The model processes tokens sequentially — first words carry maximum weight, last ones barely influence composition. If your topic is in the third sentence, the camera angle and scene will dominate. Move the main subject to the first sentence.
2. Long text without quotes
Ask for "the words Fresh and clean on a label" and the model will often mangle the letters or add extras. Always put exact text in quotes or ALL CAPS, marked "EXACT" or "verbatim", for example: label text (EXACT): "Fresh and clean". Critical for branding.
3. Edit prompt without a preserve block
"Change the background" without an explicit "preserve: face, identity, pose" changes facial features, pose, or lighting in 7 out of 10 cases. Every edit prompt should end with a structured preserve list. For iterative editing, repeat it on every turn.
4. Studio-gloss vocabulary for photorealism
Words like "polished", "staged", "beautiful lighting", "professional shoot" trigger the characteristic AI gloss. For candid photorealism you need the opposite: "35mm film", "natural light", "visible pores", "weathered texture", "subtle film grain". GPT Image 2 is especially strong at moody genres, so don't kill that with studio language.
5. Copying Midjourney or Stable Diffusion syntax
Parameters like `--ar 16:9`, `::weight`, `(keyword:1.2)` don't work in GPT Image 2 and end up as literal noise in the prompt. Set dimensions explicitly ("1024×1536", "portrait"), weight words by order (important first), and express styles with ordinary adjectives.
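Several of the mistakes above can be caught mechanically before a prompt is sent. The linter below is a rough sketch: the function name, the regular expressions, and the word list are illustrative assumptions, and the patterns are heuristics that can produce false positives (for example, a time such as "(12:30)" in parentheses).

```python
import re

# Heuristic patterns for syntax borrowed from other image tools.
FOREIGN_SYNTAX = {
    "Midjourney flag": re.compile(r"(?<!\S)--[a-z]+\b"),
    "weight separator": re.compile(r"::"),
    "SD attention weight": re.compile(r"\([^():]+:\d+(?:\.\d+)?\)"),
}

# Studio-gloss vocabulary listed in mistake 4.
GLOSS_WORDS = ("polished", "staged", "beautiful lighting", "professional shoot")


def lint_prompt(prompt, is_edit=False):
    """Return a list of warnings for the common mistakes described above."""
    warnings = []
    lowered = prompt.lower()
    for name, pattern in FOREIGN_SYNTAX.items():
        if pattern.search(prompt):
            warnings.append(f"foreign syntax: {name}")
    for word in GLOSS_WORDS:
        if word in lowered:
            warnings.append(f"studio-gloss word: {word}")
    if is_edit and "preserve:" not in lowered:
        warnings.append("edit prompt has no Preserve block")
    return warnings
```

An empty list means none of these checks fired; it does not guarantee a good prompt, since subject placement and unquoted text are not checked here.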
Before / after examples
Example 1
Before
beautiful clothing brand ad banner with young people
After
Premium campaign image for youth streetwear brand Thread. Group of friends hanging out on a Brooklyn rooftop at golden hour, street fashion photography cues, clean composition, strong color direction, natural poses. Tagline (exact, in white sans-serif at bottom center): "Yours to Create". photorealistic, 35mm film, shallow DOF, natural color balance. quality="high".
Key change: a design brief instead of a description. Purpose, concrete scene details, exact text in quotes, photography vocabulary, quality parameter.
Example 2
Before
replace the chairs with wooden ones
After
In this room photo, Change: replace ONLY the white chairs with chairs made of natural oak wood with visible grain. Preserve: camera angle, room lighting, floor shadows, table position, wall colors, and all surrounding objects. Constraints: no extra furniture, no redesign of the room, no watermark.
An edit prompt without a preserve block is almost always interpreted as a redesign — the model changes chairs plus lighting, angle, and surrounding objects. An explicit preserve list locks the contract.
Example 3
Before
infographic about a sales funnel
After
Pitch-deck slide titled "Sales Funnel Q4 2026". Show a 5-stage funnel: "Leads (12,400)", "Qualified (3,200)", "Demo (980)", "Proposal (310)", "Closed Won (87)". Use Inter bold sans-serif for stage labels, brand color #9CFB51 for highlights on Closed Won, white background, clean grid alignment. Bottom-right corner: brand logo placeholder labeled "OPTEN". quality="high".
With concrete numbers, an explicit font, a color palette, and layout instructions in the prompt, the model assembles a nearly production-ready slide. Without a font and `quality="high"`, small labels blur.
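When the numbers come from a spreadsheet or a CRM export, a prompt like the one in Example 3 can be generated rather than typed by hand. The sketch below assumes a hypothetical `funnel_slide_prompt` helper; the wording of the template follows the "After" prompt above, minus the quality setting.

```python
def funnel_slide_prompt(title, stages, font, highlight_color, logo_text):
    """Build a funnel-slide prompt with every label quoted for exact rendering."""
    # Each stage becomes a quoted label such as "Leads (12,400)".
    labels = ", ".join(f'"{name} ({count:,})"' for name, count in stages)
    last_stage = stages[-1][0]
    return (
        f'Pitch-deck slide titled "{title}". '
        f"Show a {len(stages)}-stage funnel: {labels}. "
        f"Use {font} for stage labels, brand color {highlight_color} "
        f"for highlights on {last_stage}, white background, clean grid alignment. "
        f'Bottom-right corner: brand logo placeholder labeled "{logo_text}".'
    )


stages = [("Leads", 12400), ("Qualified", 3200), ("Demo", 980),
          ("Proposal", 310), ("Closed Won", 87)]
prompt = funnel_slide_prompt(
    "Sales Funnel Q4 2026", stages, "Inter bold sans-serif", "#9CFB51", "OPTEN"
)
```

Generating the labels from data keeps the quoted numbers in the prompt identical to the source figures, which matters because the model renders quoted text verbatim.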