
GPT Image: how to write prompts the model actually understands

OpenAI · Updated:

GPT Image is OpenAI's image model family (versions 1, 1.5, and 2). It works best with natural-language prompts written like a story with concrete visual detail. It supports 1024×1024, 1536×1024, and 1024×1536 resolutions, a transparent background, and three quality tiers. Its standout feature is readable in-image text.

What GPT Image does well

The family's main strength is accurate in-image text: signs, menus, labels, UI mockups, posters. The model handles font, size, color, placement, and multilingual typography.

GPT Image works with natural language, not tags. It supports a transparent background (via a dedicated parameter), three quality tiers (high, medium, low), and a wide stylistic range from photorealism to watercolor and concept art. OpenAI's content policy is among the strictest: NSFW content, real celebrities, and violence are blocked.

  • Resolutions: 1024×1024, 1536×1024, 1024×1536
  • Output formats: PNG, JPEG, WebP
  • Transparency: via a dedicated parameter
  • Quality tiers: high / medium / low
  • Top-tier in-image text rendering
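The specs above map directly onto request settings. The sketch below is a minimal check under stated assumptions: the keys follow the OpenAI Images API parameter names for `gpt-image-1` (`size`, `quality`, `background`, `output_format`), `check_request` is a hypothetical helper, and it only encodes the values listed on this page (the real API also accepts `"auto"` for several of these). The one extra rule is a plain file-format fact: JPEG cannot store an alpha channel.

```python
# Hypothetical helper: checks request settings against the options listed above.
# Keys follow the OpenAI Images API parameter names for gpt-image-1; other
# platforms may name them differently. Keys not in ALLOWED are ignored.
ALLOWED = {
    "size": {"1024x1024", "1536x1024", "1024x1536"},
    "quality": {"high", "medium", "low"},
    "background": {"transparent", "opaque"},
    "output_format": {"png", "jpeg", "webp"},
}


def check_request(params: dict) -> list:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    for key, value in params.items():
        allowed = ALLOWED.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")
    # JPEG cannot store an alpha channel, so transparency needs PNG or WebP.
    if params.get("background") == "transparent" and params.get("output_format") == "jpeg":
        problems.append("transparent background requires png or webp output")
    return problems


print(check_request({"size": "1536x1024", "quality": "high",
                     "background": "transparent", "output_format": "jpeg"}))
# ['transparent background requires png or webp output']
```

Catching the JPEG-plus-transparency combination before sending the request is cheaper than discovering a flattened background afterwards.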

Prompt structure

General formula: [Visual medium] + [Subject] + [Environment/Scene] + [Lighting/Mood] + [Composition] + [Details] + [Constraints].

The core principle: describe the scene like a story, but with visual specificity. «A foggy mountain valley at dawn, golden light filtering through pine trees, reflected in a mirror-still lake» gives the model far more to work with than «a beautiful landscape».

Start with the visual medium: «photograph», «watercolor painting», «3D render», «technical illustration», «vintage poster». This sets the generation mode for the model.
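The formula can be treated as an ordered template. A minimal sketch, not part of any OpenAI SDK: `PromptSpec` is a hypothetical helper that fills the slots in formula order, puts the medium first, and skips empty slots so the result still reads as natural sentences rather than a tag list.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """Hypothetical container: one field per slot of the general formula, in order."""
    medium: str              # "photograph", "watercolor painting", "3D render", ...
    subject: str
    environment: str = ""
    lighting: str = ""
    composition: str = ""
    details: str = ""
    constraints: str = ""

    def render(self) -> str:
        if not self.medium.strip():
            raise ValueError("start with a visual medium")
        # Medium comes first, so the generation mode is set before anything else.
        article = "An" if self.medium[:1].lower() in "aeiou" else "A"
        head = f"{article} {self.medium} of {self.subject}"
        if self.environment:
            head += f", {self.environment}"
        # Remaining slots become short sentences; empty slots are skipped.
        tail = [self.lighting, self.composition, self.details, self.constraints]
        sentences = [head.rstrip(".")] + [s.strip().rstrip(".") for s in tail if s.strip()]
        return ". ".join(s[:1].upper() + s[1:] for s in sentences) + "."


cat = PromptSpec(
    medium="photograph",
    subject="a ginger tabby cat",
    environment="sitting on an old wooden windowsill",
    lighting="warm afternoon light filtering through lace curtains",
    composition="close-up, 50mm lens, shallow depth of field",
    details="muted warm palette",
)
print(cat.render())
```

Keeping each slot as its own field makes it easy to see which part of the formula a prompt is missing, while the rendered string stays a coherent description.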

Camera and lighting for photorealism

Camera terms work significantly better than generic quality phrases like «8K, ultra HD».

Lenses: 35mm, 50mm, 85mm, macro. Depth: shallow depth of field, bokeh, sharp focus. Angle: low angle, bird's eye view, eye level, Dutch angle. Shot type: candid, portrait, product shot, aerial.

For lighting, avoid generic «good lighting». Use specifics: «dramatic side lighting creating strong shadows», «soft box lighting eliminating harsh shadows», «golden hour», «fluorescent overhead», «neon glow», «candlelight». The more precisely you describe the light, the more precisely the mood and atmosphere come through in the image.
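One way to enforce these rules is to assemble the camera-and-lighting clause from explicit choices. A minimal sketch: the lens list comes from this section, while `GENERIC_LIGHTING`, `is_generic_lighting`, and `photo_clause` are hypothetical names, and the rejected phrases are an illustrative starting list, not an official one.

```python
# Lens vocabulary taken from the section above; GENERIC_LIGHTING is an
# illustrative, hypothetical list of phrases to reject.
LENSES = {"35mm", "50mm", "85mm", "macro"}
GENERIC_LIGHTING = {"good lighting", "nice lighting", "beautiful lighting", "great lighting"}


def is_generic_lighting(lighting: str) -> bool:
    """True if the phrase is on the generic list (names no source, direction, or effect)."""
    return lighting.strip().lower() in GENERIC_LIGHTING


def photo_clause(shot: str, angle: str, lens: str, depth: str, lighting: str) -> str:
    """Compose the camera-and-lighting part of a photorealistic prompt."""
    if lens not in LENSES:
        raise ValueError(f"unknown lens {lens!r}; expected one of {sorted(LENSES)}")
    if is_generic_lighting(lighting):
        raise ValueError(f"{lighting!r} is too generic; name the light and its effect")
    lens_text = "a macro lens" if lens == "macro" else f"{lens} lens"
    clause = f"{shot}, {angle}, shot on {lens_text}, {depth}, {lighting}."
    return clause[:1].upper() + clause[1:]


print(photo_clause("candid portrait", "eye level", "85mm", "shallow depth of field",
                   "dramatic side lighting creating strong shadows"))
```

Rejecting vague lighting at build time pushes the writer toward the specific light sources listed above instead of silently accepting «good lighting».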

In-image text

GPT Image is one of the best models for in-image text. Rules:

Always put exact text in quotes: `"CAFE LUNA"`, `"OPEN 24/7"`. Specify the font style: «elegant handwriting», «bold sans-serif», «neon sign lettering». Specify placement: «centered at the top», «on the wooden sign above the door». For complex or rare words, spell them out letter by letter: `C-A-F-E L-U-N-A`.

For dense text (menus, infographics), set `quality="high"`; at low or medium quality, small type can break up. Specify typeface, size, and color, since the model uses these when rendering the text.
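The text rules are mechanical enough to encode. A minimal sketch with hypothetical helper names: `quote` wraps exact strings, `spell_out` produces the letter-by-letter form, and `text_prompt` also suggests a quality tier. The threshold of three strings for "dense text" is an assumption chosen for illustration, not a documented cutoff.

```python
def quote(text: str) -> str:
    """Wrap exact text in straight double quotes."""
    return f'"{text}"'


def spell_out(text: str) -> str:
    """Spell each word letter by letter, keeping spaces between words."""
    return " ".join("-".join(word) for word in text.split())


def text_prompt(items: list, style: str, placement: str) -> tuple:
    """Return (prompt fragment, suggested quality) for a set of exact strings."""
    listed = ", ".join(quote(item) for item in items)
    fragment = f"the text {listed} in {style}, {placement}"
    # Assumed heuristic: three or more separate strings counts as dense text.
    quality = "high" if len(items) >= 3 else "medium"
    return fragment, quality


print(spell_out("CAFE LUNA"))
print(text_prompt(["CAFE LUNA"], "bold sans-serif", "centered at the top"))
```

The suggested tier is returned separately from the fragment because it belongs in the request settings, not in the prompt wording.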

Common mistakes

  1. Abstract adjectives only

    «Beautiful, amazing, stunning, gorgeous» give the model no visual information: no color, texture, material, or shape. Replace them with specifics: «weathered brick wall, warm afternoon light, shallow depth of field». Aim for at least two or three concrete details per scene.

  2. Stable Diffusion syntax

    Weights like `(word:1.5)`, comma-separated tags, `1girl, masterpiece, best quality`, embeddings, and LoRA references don't apply here: GPT Image works with natural language, not tags. These constructions are ignored or degrade the output. Write sentences.

  3. Quality boosters «8K, ultra HD, masterpiece»

    Generic quality praise barely affects GPT Image. Concrete camera terms («85mm, shallow DOF, golden hour»), style references («editorial photography», «watercolor illustration»), and lighting descriptions work far better than any stack of quality words.

  4. Missing visual medium

    If you don't say whether it's a photograph, an illustration, or a 3D render, the model decides for you and the output becomes unpredictable. Start the prompt with a medium: «photograph», «watercolor painting», «3D render», «technical illustration», «vintage poster», «sticker design». This sets the generation mode.

  5. Conflicting styles in one prompt

    «Photorealistic cartoon», «minimalist detailed», «realistic stylized»: these pairs contradict each other, and without an explanation the model can't reconcile mutually exclusive instructions. If you need a stylistic blend, describe it explicitly: «realistic rendering with subtle anime-inspired proportions».
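The five mistakes above can be caught with simple keyword checks before a prompt is sent. A minimal sketch: `lint_prompt` and its word lists are hypothetical and purely heuristic (substring and regex matching, no real language understanding), so expect both misses and false alarms.

```python
import re

# Heuristic word lists drawn from the five mistakes above; extend as needed.
ABSTRACT = {"beautiful", "amazing", "stunning", "gorgeous"}
BOOSTERS = ("8k", "ultra hd", "masterpiece", "best quality", "hyper-realistic")
MEDIUM_HINTS = ("photo", "painting", "illustration", "render", "poster", "sticker")
CONFLICTS = [("photorealistic", "cartoon"), ("minimalist", "detailed"), ("realistic", "stylized")]
SD_WEIGHT = re.compile(r"\([^()]+:\d+(?:\.\d+)?\)")  # e.g. (word:1.5)
SD_TAG = re.compile(r"\b\d+(?:girl|boy)s?\b")        # e.g. 1girl, 2boys


def lint_prompt(prompt: str) -> list:
    """Return warnings for the common mistakes; keyword heuristics only."""
    text = prompt.lower()
    words = set(re.findall(r"[a-z0-9']+", text))
    warnings = []

    found = sorted(words & ABSTRACT)
    if found:
        warnings.append("abstract adjectives carry no visual information: " + ", ".join(found))
    if SD_WEIGHT.search(text) or SD_TAG.search(text):
        warnings.append("Stable Diffusion syntax (weights or 1girl-style tags); write sentences")
    boosters = [b for b in BOOSTERS if b in text]
    if boosters:
        warnings.append("quality boosters barely help; use camera and lighting terms: " + ", ".join(boosters))
    if not any(hint in text for hint in MEDIUM_HINTS):
        warnings.append("no visual medium; start with e.g. 'photograph' or 'watercolor painting'")
    for a, b in CONFLICTS:
        if a in words and b in words:
            warnings.append(f"conflicting styles '{a}' and '{b}'; describe the blend explicitly")
    return warnings


before = "masterpiece, best quality, 8K, ultra HD, hyper-realistic, 1girl, beautiful, dress, garden"
for warning in lint_prompt(before):
    print("-", warning)
```

Run on the «Before» prompt from Example 3 below, this flags four of the five mistakes: the abstract adjective, the `1girl` tag, the booster stack, and the missing medium.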

Before / after examples

Example 1

Before

beautiful ginger cat

After

A close-up portrait of a ginger tabby cat sitting on an old wooden windowsill, warm afternoon light filtering through lace curtains. Autumn garden visible through the window in soft bokeh. Shot on 50mm lens, shallow depth of field, photorealistic, muted warm palette.

Key change: visual specificity instead of a generic adjective. Concrete environment, camera terms, lighting, medium.

Example 2

Before

café with a menu

After

A chalkboard café menu mounted on an exposed brick wall, listing "Espresso $3", "Flat White $4.50", and "Lavender Latte $5" in elegant white chalk handwriting. Warm pendant lighting from above, shallow depth of field, blurred coffee shop interior in the background. Editorial café photography.

Key change: exact text in quotes, a specific lettering style, placement, and lighting. Set `quality="high"` as a request parameter rather than in the prompt text; for clean small text it is essential.
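If drafts tend to mix request settings into the wording, they can be split apart before sending. A minimal sketch: `split_params` is a hypothetical helper, and the parameter names it recognizes are an assumption based on the OpenAI Images API. It only matches `name="value"` pairs for those names, so quoted in-image text such as `"Espresso $3"` stays in the prompt.

```python
import re

# Request settings that belong in API parameters rather than in the prompt text.
PARAM_NAMES = ("quality", "size", "background", "output_format")
INLINE_PARAM = re.compile(r',?\s*\b(' + "|".join(PARAM_NAMES) + r')="([^"]*)"')


def split_params(draft: str) -> tuple:
    """Move name="value" settings out of a draft prompt; quoted in-image text stays put."""
    params = dict(INLINE_PARAM.findall(draft))
    prompt = INLINE_PARAM.sub("", draft).strip()
    return prompt, params


draft = ('A chalkboard café menu listing "Espresso $3", "Flat White $4.50", and '
         '"Lavender Latte $5" in elegant white chalk handwriting. '
         'Editorial café photography, quality="high".')
prompt, params = split_params(draft)
print(params)  # {'quality': 'high'}
```

Assuming the official `openai` Python package, the two parts would then be sent separately, for example `client.images.generate(model="gpt-image-1", prompt=prompt, **params)`, with the model name adjusted to the version you use.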

Example 3

Before

masterpiece, best quality, 8K, ultra HD, hyper-realistic, 1girl, beautiful, dress, garden

After

A young woman in her twenties wearing a flowing pale yellow linen dress, walking through a sunlit cottage garden in early summer. Soft natural light, golden hour warmth, shallow depth of field. Shot on 85mm lens at f/1.8, candid documentary style, subtle film grain, muted earthy palette.

Stable Diffusion style (comma-separated tags, quality boosters, `1girl`) is ignored or handled poorly by GPT Image. A coherent description with camera terms hits the target.

Frequently asked

How do GPT Image versions (1, 1.5, 2) differ?
GPT Image 1 is the base model, with good text rendering and photorealism. GPT Image 1.5 brings improved photorealism, face preservation in editing, more reliable text, multi-image input, and an `input_fidelity` parameter. GPT Image 2 adds state-of-the-art text rendering (including CJK, Cyrillic, and Arabic scripts), a thinking mode with web search, photorealism without the typical AI gloss, and up to 16 reference images. For production work, version 2 is a clear upgrade.
How do you get photorealism without the AI look?
Use photo terminology: «35mm film», «50mm lens», «shallow DOF», «natural color balance», «subtle film grain». Describe real textures: «visible pores», «weathered skin», «fabric wear». Avoid words like «polished», «staged», or «beautiful lighting», which tend to trigger a glossy studio look. An explicit «photorealistic» at the start also helps.
What language should prompts be written in?
English gives the most stable results, likely because the models' training data is predominantly English. GPT Image is still multilingual and understands natural-language prompts in Russian, Chinese, Korean, and other languages. English is recommended for production prompts; for experiments and personal tasks, Russian works fine. In-image text can be requested in any language.
How do you make a transparent background?
Use the dedicated transparency parameter in the API or UI. In the OpenAI Images API this is `background="transparent"`, which requires PNG or WebP output because JPEG has no alpha channel; other platforms expose an equivalent flag. The prompt can also say «transparent background», but only the parameter guarantees a clean alpha mask. This is ideal for stickers, icons, and background-free assets.
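To confirm that a downloaded file really carries an alpha channel, you can read its PNG header. A minimal standard-library sketch; `png_has_alpha` is a hypothetical helper. Per the PNG specification, IHDR color types 4 (grayscale with alpha) and 6 (RGBA) include alpha, and a `tRNS` chunk adds transparency to other color types. It does not verify CRCs or decode pixels, so it tells you transparency is possible, not that any pixel is actually transparent.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """True if the PNG can carry transparency (alpha color type or a tRNS chunk)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"IHDR" and body[9] in (4, 6):
            return True  # color type 4 = grayscale+alpha, 6 = RGBA
        if chunk_type == b"tRNS":
            return True  # palette or single-color transparency
        if chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS must appear before the first IDAT chunk
        pos += 12 + length
    return False
```

Assuming the official `openai` Python package, you could pass it the bytes from `base64.b64decode(result.data[0].b64_json)`, or read a saved file with `open(path, "rb").read()`.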
When should quality="high" be used?
Use it for dense text, small infographic labels, close-up portraits, identity-sensitive editing, and any scene where fine detail matters (skin texture, fonts, fine patterns). `medium` is a sensible default for most tasks, since `high` is noticeably slower. `low` is for previews, mass generation, and A/B tests.
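This guidance amounts to a lookup from task type to quality tier. A minimal sketch: the task labels and `pick_quality` are hypothetical, and falling back to `"medium"` follows this answer's recommendation.

```python
# Hypothetical task labels mapped to the tiers recommended above.
HIGH = {"dense text", "infographic", "close-up portrait", "identity editing", "fine detail"}
LOW = {"preview", "batch", "a/b test"}


def pick_quality(task: str) -> str:
    """Return the quality tier for a task label."""
    task = task.strip().lower()
    if task in HIGH:
        return "high"
    if task in LOW:
        return "low"
    return "medium"  # general-purpose default for everything else


for task in ("Dense text", "preview", "landscape"):
    print(task, "->", pick_quality(task))
```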
Why does GPT Image refuse to generate?
OpenAI's moderation is among the strictest. It triggers not only on explicit NSFW content but also on combinations of innocent words in a suspicious context. Real celebrities and recognizable copyrighted characters are blocked by policy. If you get a refusal, reformulate: drop the triggering combination or shift the context to editorial or fashion. Don't try euphemisms to trick the filter; it is semantic, not keyword-based.
Does Opten support GPT Image?
Yes, the Opten extension auto-detects all GPT Image versions (1, 1.5, 2) inside ChatGPT and supported platforms. It scores prompts against the structure outlined above: presence of a visual medium, specificity, camera terms for photorealism, quotes for text, absence of SD syntax. One click delivers a rewrite in the correct structure.


Ready to write GPT Image (General) prompts in one click?

  • Auto-detects the model inside its native interface
  • Scores every line of your prompt
  • One-click rewrite into the correct structure

Pro — $2.99/month or ₽199/month · cancel anytime

Stop Guessing. Generate On The First Try.

Install Opten in 30 seconds and score your next prompt.

Opten is a Chrome extension that scores AI prompts for the specific model. Supports 60+ image and video models — Midjourney, GPT Image 2, Kling, Sora, Nano Banana, Flux — and rewrites them in one click inside the Syntx, Higgsfield, and Freepik interfaces. From $2.99/month.

© 2026 Opten · IE Nikolai Shupletsov · Tax ID 306389672