
GPT Image 2: how to write prompts the model actually understands

OpenAI · Updated:

GPT Image 2 is OpenAI's image model, with state-of-the-art in-image text rendering and a thinking mode. It treats prompts as design briefs and processes tokens sequentially, so the first words carry the most weight. Each request supports up to 16 reference images plus 8 linked outputs. English works best, but multilingual support is solid.

What GPT Image 2 does well

The breakthrough feature is accurate, readable in-image text: ad taglines, infographics, UI mockups, QR codes, and multilingual typography (Cyrillic, CJK, Arabic). Its photorealism uses neutral exposure without the characteristic AI gloss, which gives it an edge in moody, overcast, and desaturated genres.

The model behaves like a reasoning system: on complex prompts it automatically switches to thinking mode, where it can reason through the brief, use web search, and check its own output. For simple tasks Instant mode kicks in: fast generation without deliberation.

  • Exact text in quotes, multilingual typography
  • Photorealism without AI gloss (neutral exposure)
  • Up to 16 reference images + up to 8 linked frames per request
  • Surgical edits via Change / Preserve / Constraints
  • Knowledge cutoff December 2025 + web search in thinking mode

Prompt structure

Optimal order: [Background/Scene] + [Subject] + [Key details] + [Style/Medium] + [Lighting/Composition] + [Text in quotes] + [Constraints].

The main rule: the primary subject always goes first. The model processes tokens sequentially, and words in the opening lines get the most visual weight. If you bury the subject at the end of a paragraph, it loses priority.
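To make the ordering concrete, here is a minimal sketch of a prompt-assembly helper. The function name `build_prompt` and the part keys (`scene`, `subject`, and so on) are hypothetical and not part of any OpenAI API. It applies the listed order, except that by default it moves the subject ahead of the scene, per the main rule; pass `subject_first=False` to keep the list order literally.

```python
# Hypothetical helper: field names are illustrative and are not part of
# any OpenAI API. Order follows the "Prompt structure" list.
PROMPT_ORDER = ["scene", "subject", "details", "style", "lighting", "text", "constraints"]


def build_prompt(parts: dict[str, str], subject_first: bool = True) -> str:
    """Join the non-empty parts into one brief, one sentence per part."""
    unknown = set(parts) - set(PROMPT_ORDER)
    if unknown:
        raise ValueError(f"unknown prompt parts: {sorted(unknown)}")
    order = list(PROMPT_ORDER)
    if subject_first:
        # The "main rule": move the primary subject to the very front.
        order.remove("subject")
        order.insert(0, "subject")
    sentences = []
    for key in order:
        value = parts.get(key, "").strip()
        if not value:
            continue
        if key == "text":
            # Exact text is quoted so the model renders it verbatim.
            value = f'Text (EXACT): "{value}"'
        else:
            value = value[0].upper() + value[1:]
        sentences.append(value.rstrip(".") + ".")
    return " ".join(sentences)


print(build_prompt({
    "scene": "Brooklyn rooftop at golden hour",
    "subject": "group of friends in streetwear",
    "style": "photorealistic, 35mm film",
    "text": "Yours to Create",
}))
```

Empty parts are skipped rather than emitted as blank slots, so the same helper works for short and long briefs.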

Write the prompt as a designer's brief, not a tag list. State the purpose (ad, UI mockup, infographic, product shot); this steers the model toward the right kind of output. The format is flexible: natural language, a JSON-like structure, and instruction-style directives all work.
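For the JSON-like format, a small sketch like the one below keeps the purpose first and the rest of the brief in a fixed order. The `json_brief` name and its keys are assumptions for illustration; the guide only says a JSON-like structure works, not that the model expects a particular schema.

```python
import json


def json_brief(purpose: str, **fields: str) -> str:
    """Serialize a design brief as JSON with the purpose stated first.

    Keys are illustrative; the model is not documented to expect a schema.
    """
    brief = {"purpose": purpose, **fields}  # dicts keep insertion order
    # ensure_ascii=False keeps Cyrillic/CJK text readable instead of \u escapes
    return json.dumps(brief, indent=2, ensure_ascii=False)


prompt = json_brief(
    "product shot",
    subject="matte ceramic coffee mug on a linen tablecloth",
    lighting="soft overcast window light",
    text='label (EXACT): "Fresh and clean"',
)
print(prompt)
```

`ensure_ascii=False` matters for multilingual briefs: without it, non-Latin in-image text would be serialized as escape sequences.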

Edit template: Change / Preserve / Constraints

For surgical edits, GPT Image 2 supports a fixed three-part template — change one thing while keeping everything else intact:

Change: [exactly what changes]
Preserve: [face, identity, pose, lighting, framing, background, geometry, text, layout]
Constraints: [no extra objects, no redesign, no logo drift, no watermark]
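The template can be rendered with a short helper. This is a sketch under the assumption that you build prompts programmatically; `edit_prompt` and the default lists are hypothetical names, with the defaults copied from the template. If the change itself targets one of the defaults (for example, the background), pass a custom `preserve` list.

```python
from typing import Optional

# Defaults copied from the template above; override them per edit.
DEFAULT_PRESERVE = ["face", "identity", "pose", "lighting", "framing",
                    "background", "geometry", "text", "layout"]
DEFAULT_CONSTRAINTS = ["no extra objects", "no redesign", "no logo drift", "no watermark"]


def edit_prompt(change: str, preserve: Optional[list[str]] = None,
                constraints: Optional[list[str]] = None) -> str:
    """Render the three-part Change / Preserve / Constraints template."""
    change = change.strip()
    if not change:
        raise ValueError("an edit needs exactly one thing to change")
    preserve = DEFAULT_PRESERVE if preserve is None else preserve
    constraints = DEFAULT_CONSTRAINTS if constraints is None else constraints
    if not preserve:
        # Without a preserve list the edit tends to be read as a redesign.
        raise ValueError("preserve list must not be empty")
    lines = [f"Change: {change}", f"Preserve: {', '.join(preserve)}"]
    if constraints:
        lines.append(f"Constraints: {', '.join(constraints)}")
    return "\n".join(lines)
```

Refusing an empty preserve list is a deliberate design choice: it turns the most common editing mistake into an error instead of a silent redesign.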

For iterative editing, repeat the preserve list on every turn — otherwise the model drifts and starts changing things you didn't ask for. This is especially critical for virtual try-on, interior object swaps, and multi-reference composites.
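One way to make "repeat the preserve list on every turn" automatic is a small session object that restates the full block each time. `EditSession` is a hypothetical helper, not an SDK class; the `lock` method is an optional extension (not from this guide) that adds an accepted change, such as a new jacket in a try-on sequence, to the things later turns must keep.

```python
from dataclasses import dataclass, field


@dataclass
class EditSession:
    """Hypothetical helper that restates the same preserve list every turn."""
    preserve: list[str]
    constraints: list[str] = field(
        default_factory=lambda: ["no extra objects", "no redesign"])
    history: list[str] = field(default_factory=list)

    def next_turn(self, change: str) -> str:
        # The full preserve block is re-sent on each turn, never just once.
        prompt = "\n".join([
            f"Change: {change.strip()}",
            f"Preserve: {', '.join(self.preserve)}",
            f"Constraints: {', '.join(self.constraints)}",
        ])
        self.history.append(prompt)
        return prompt

    def lock(self, item: str) -> None:
        """After an accepted change, add the new element to what must stay fixed."""
        if item not in self.preserve:
            self.preserve.append(item)


session = EditSession(preserve=["face", "identity", "pose", "background"])
first = session.next_turn("swap the jacket for a denim one")
session.lock("denim jacket")
second = session.next_turn("change the shoes to white sneakers")
print(second)
```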

In-image text

GPT Image 2 is the best-in-class model for rendering text inside frames. Rules:

Always put exact text in quotes or ALL CAPS, for example: "Billboard text (EXACT, verbatim): 'Fresh and clean'". Spell tricky words (brand names, rare spellings) out letter by letter. Specify font, size, color, and placement.
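A tiny formatter can apply these rules consistently. `exact_text` is a hypothetical helper; it marks the text as EXACT and verbatim and, for tricky words, appends a letter-by-letter spelling. Each word is spelled separately so word boundaries survive.

```python
def exact_text(label: str, text: str, spell_out: bool = False) -> str:
    """Build an EXACT/verbatim text clause, optionally spelled letter by letter."""
    if '"' in text:
        # Nested double quotes would make the quoted span ambiguous.
        raise ValueError("text must not contain double quotes")
    clause = f'{label} (EXACT, verbatim): "{text}"'
    if spell_out:
        # Spell each word separately so boundaries survive: "F-r-e-s-h a-n-d".
        spelled = " ".join("-".join(word) for word in text.split())
        clause += f", spelled {spelled}"
    return clause


print(exact_text("Logo text", "OPTEN", spell_out=True))
```

Font, size, color, and placement still need to be written into the surrounding prompt; the helper only handles the quoted string.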

For dense text, infographics, and small type, set `quality="high"`; `medium` and `low` break micro-text. Text rendering works with Latin, Cyrillic, CJK, Hindi, Bengali, and Arabic scripts. Long text without quotes is a known weakness: it can come out mangled or padded with stray characters.
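Combining this rule with the quality guidance in the FAQ, the choice can be sketched as a lookup on job tags. The tag names are hypothetical. The request dict mirrors the `model`, `prompt`, `size`, and `quality` parameters of OpenAI's Images API as used with gpt-image-1; the model id `"gpt-image-2"` is an assumption, and the code only builds parameters without making a network call. Note the API writes sizes with a plain `x` ("1024x1536").

```python
# Tag names are hypothetical; the categories come from this guide.
HIGH_TAGS = {"dense_text", "small_labels", "closeup_portrait", "identity_edit", "fine_detail"}
LOW_TAGS = {"preview", "batch", "ab_test"}


def pick_quality(tags: set[str]) -> str:
    """Return "high", "medium", or "low" for a set of job tags."""
    if tags & HIGH_TAGS:
        # Checked first: micro-text breaks at medium/low even in a preview.
        return "high"
    if tags & LOW_TAGS:
        return "low"
    return "medium"  # the everyday default


def image_request(prompt: str, tags: set[str], size: str = "1024x1536") -> dict:
    """Assemble request parameters; "gpt-image-2" is an assumed model id."""
    return {"model": "gpt-image-2", "prompt": prompt, "size": size,
            "quality": pick_quality(tags)}
```

Letting `high` win over `low` is a judgment call: an A/B test of an infographic is still useless if its labels are unreadable.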

Common mistakes

  1. Primary subject buried at the end of the prompt

    The model processes tokens sequentially — first words carry maximum weight, last ones barely influence composition. If your topic is in the third sentence, the camera angle and scene will dominate. Move the main subject to the first sentence.

  2. Long text without quotes

    Ask for «the words Fresh and clean on a label» and the model will often mangle the letters or add extra ones. Always put exact text in quotes or ALL CAPS and mark it «EXACT» or «verbatim»: «label text (EXACT): "Fresh and clean"». This is critical for branding.

  3. Edit prompt without a preserve block

    «Change the background» without an explicit «preserve: face, identity, pose» changes face features, pose, or lighting in 7 out of 10 cases. Every edit prompt should end with a structured preserve list. For iterative editing, repeat it on every turn.

  4. Studio-gloss vocabulary for photorealism

    Words like «polished», «staged», «beautiful lighting», «professional shoot» trigger the characteristic AI gloss. For candid photorealism you need the opposite: «35mm film», «natural light», «visible pores», «weathered texture», «subtle film grain». GPT Image 2 is especially strong at moody genres — don't kill that with studio language.

  5. Copying Midjourney or Stable Diffusion syntax

    Parameters like `--ar 16:9`, `::weight`, and `(keyword:1.2)` don't work in GPT Image 2 and end up as literal noise in the prompt. Instead, set dimensions explicitly ("1024×1536", "portrait"), express weight through word order (most important first), and describe styles with ordinary adjectives.
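Several of these mistakes can be caught mechanically before a prompt is sent. The sketch below is a rough heuristic linter for mistakes 3, 4, and 5; `lint_prompt` is a hypothetical name, the gloss words are the examples from this guide, and the regular expressions will miss some cases and occasionally flag harmless text (a time like "(12:30)" matches the weight pattern).

```python
import re

# Heuristic patterns for mistakes 3, 4 and 5 above; the gloss words are the
# guide's examples, the regexes are rough and will miss some cases.
FOREIGN_SYNTAX = [
    (re.compile(r"--[a-z]+", re.IGNORECASE), "Midjourney-style flag (e.g. --ar)"),
    (re.compile(r"::"), "Midjourney '::' weight"),
    (re.compile(r"\([^()]+:\d+(?:\.\d+)?\)"), "Stable Diffusion (keyword:1.2) weight"),
]
GLOSS_WORDS = ["polished", "staged", "beautiful lighting", "professional shoot"]


def lint_prompt(prompt: str, is_edit: bool = False) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    for pattern, label in FOREIGN_SYNTAX:
        if pattern.search(prompt):
            warnings.append(f"{label}: unsupported, ends up as prompt noise")
    lowered = prompt.lower()
    for word in GLOSS_WORDS:
        if word in lowered:
            warnings.append(f'studio-gloss term "{word}" works against candid photorealism')
    if is_edit and "preserve:" not in lowered:
        warnings.append("edit prompt has no Preserve block")
    return warnings


for warning in lint_prompt("a cat --ar 16:9, (fluffy:1.2), polished look"):
    print(warning)
```

Checking subject placement (mistake 1) and unquoted text (mistake 2) needs an understanding of the prompt's meaning, so the sketch leaves those to the writer.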

Before / after examples

Example 1

Before

beautiful clothing brand ad banner with young people

After

Premium campaign image for youth streetwear brand Thread. Group of friends hanging out on a Brooklyn rooftop at golden hour, street fashion photography cues, clean composition, strong color direction, natural poses. Tagline (exact, in white sans-serif at bottom center): "Yours to Create". photorealistic, 35mm film, shallow DOF, natural color balance. quality="high".

Key change: a designer brief instead of a description. Purpose, concrete scene details, exact text in quotes, photography vocabulary, quality parameter.

Example 2

Before

replace the chairs with wooden ones

After

In this room photo, Change: replace ONLY the white chairs with chairs made of natural oak wood with visible grain. Preserve: camera angle, room lighting, floor shadows, table position, wall colors, and all surrounding objects. Constraints: no extra furniture, no redesign of the room, no watermark.

An edit prompt without a preserve block is almost always interpreted as a redesign — the model changes chairs plus lighting, angle, and surrounding objects. An explicit preserve list locks the contract.

Example 3

Before

infographic about a sales funnel

After

Pitch-deck slide titled "Sales Funnel Q4 2026". Show a 5-stage funnel: "Leads (12,400)", "Qualified (3,200)", "Demo (980)", "Proposal (310)", "Closed Won (87)". Use Inter bold sans-serif for stage labels, brand color #9CFB51 for highlights on Closed Won, white background, clean grid alignment. Bottom-right corner: brand logo placeholder labeled "OPTEN". quality="high".

Numbers in the prompt + explicit font + color palette + layout = the model assembles a nearly production-ready slide. Without font and `quality="high"`, small labels blur.
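When the numbers come from real data, generating the stage labels avoids transcription errors. This hypothetical `funnel_prompt` helper reproduces the first part of the example prompt above (title, stages, font, highlight color, layout); the logo placeholder and quality setting are left out. The check that counts never increase is an added sanity guard, not something the model requires.

```python
def funnel_prompt(title: str, stages: list[tuple[str, int]],
                  highlight: str, color: str) -> str:
    """Render a funnel-slide brief with exact, comma-grouped stage labels."""
    names = [name for name, _ in stages]
    if highlight not in names:
        raise ValueError(f"highlight stage {highlight!r} not in {names}")
    counts = [count for _, count in stages]
    if any(later > earlier for earlier, later in zip(counts, counts[1:])):
        # A funnel should only narrow; a wider later stage is likely a typo.
        raise ValueError("stage counts must not increase")
    labels = ", ".join(f'"{name} ({count:,})"' for name, count in stages)
    return (
        f'Pitch-deck slide titled "{title}". '
        f"Show a {len(stages)}-stage funnel: {labels}. "
        f"Use Inter bold sans-serif for stage labels, brand color {color} "
        f"for highlights on {highlight}, white background, clean grid alignment."
    )


print(funnel_prompt(
    "Sales Funnel Q4 2026",
    [("Leads", 12400), ("Qualified", 3200), ("Demo", 980),
     ("Proposal", 310), ("Closed Won", 87)],
    highlight="Closed Won",
    color="#9CFB51",
))
```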

Frequently asked

How is GPT Image 2 different from GPT Image 1?
GPT Image 2 brings four major upgrades: SOTA in-image text rendering (including CJK, Cyrillic, Arabic), thinking mode with web search and self-checking, photorealism without AI gloss and with neutral exposure, and support for up to 16 references with character consistency. For most tasks this is a straight upgrade, except when you need maximum speed on simple prompts.
When should I use quality="high" vs "medium"?
Use `high` for dense text, small infographic labels, close-up portraits, identity-sensitive editing, and any scene where fine detail matters (skin texture, fonts, fine patterns). `medium` is the default for most tasks, since `high` is noticeably slower. `low` is for previews, mass generation, and A/B tests.
Can I write prompts in languages other than English?
Yes, GPT Image 2 is multilingual — you can write prompts in Russian, Chinese, Korean, and other languages. But English yields slightly more stable results in complex scenes. For production prompts we recommend English; for experiments and personal tasks Russian works fine. In-image text can be requested in any language.
How do I get photorealism without the AI look?
Use photography vocabulary: «35mm film», «50mm lens», «shallow DOF», «natural color balance», «subtle film grain». Describe real textures — «visible pores», «weathered skin», «fabric wear». Avoid words like «polished», «staged», «beautiful lighting» — they activate studio gloss. An explicit «photorealistic» at the start helps.
What is thinking mode and do I need to turn it on manually?
Thinking mode is the mode in which the model «thinks» before generating: it parses the prompt, can use web search, and checks its candidate outputs. It activates automatically on complex prompts, so no manual switch is needed. On simple tasks Instant mode kicks in (fast, no reasoning). Generation in thinking mode takes longer, but first-pass quality is higher.
Why does GPT Image 2 refuse to generate?
The model has one of the strictest moderation filters. It triggers not only on explicit NSFW content but also on combinations of innocent words: «real person + young + bathroom + suggestive» almost guarantees a refusal. Real celebrities and faces of recognizable IP characters are blocked by OpenAI policy. If you get a refusal, rephrase: drop the triggering combination, shift the context to editorial or fashion, and don't try to trick the filter with euphemisms (it is semantic, not keyword-based).
Does Opten support GPT Image 2?
Yes, the Opten extension auto-detects GPT Image 2 inside ChatGPT and scores prompts against the structure outlined above: it checks for the main subject up front, exact text in quotes, an edit template when editing, photography vocabulary for photorealism. One click gives you a rewrite in the correct structure.


Ready to write GPT Image 2 prompts in one click?

  • Auto-detects the model inside its native interface
  • Scores every line of your prompt
  • One-click rewrite into the correct structure

Pro — $2.99/month or ₽199/month · cancel anytime

Stop Guessing. Generate On The First Try.

Install Opten in 30 seconds and score your next prompt.

Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672