Frequently asked

How is GPT Image 1 different from GPT Image 1.5 and 2?

GPT Image 1 is the base model with good text rendering and photorealism. GPT Image 1.5 added improved photorealism, face preservation in editing, multi-image input, and an input_fidelity parameter. GPT Image 2 brings state-of-the-art text rendering (including CJK, Cyrillic, and Arabic scripts), a thinking mode with web search, and up to 16 reference images. For most new tasks, 1.5 and 2 are the better choice.

What is the optimal prompt length?

Keep prompts under about 500 words. The technical limit is roughly 4000 tokens, but quality starts to drop beyond ~500 words. A very short prompt (under 5 words) yields unpredictable results because the model has to invent too much; an overly long one overloads the model and details get ignored. In practice, a dense 100-200 word prompt works best.
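The length guidance above can be turned into a quick pre-flight check. This is a minimal sketch; the thresholds are the rules of thumb from this FAQ, not limits enforced by the model or API, and whitespace splitting is only an approximation of word count.

```python
# Rule-of-thumb thresholds from the guidance above (not API limits).
TOO_SHORT = 5            # under 5 words: the model invents too much
SWEET_SPOT = (100, 200)  # dense prompts in this range tend to work best
SOFT_MAX = 500           # beyond ~500 words details start getting ignored


def word_count(prompt: str) -> int:
    """Approximate word count by splitting on whitespace."""
    return len(prompt.split())


def length_advice(prompt: str) -> str:
    """Classify a prompt by length using the thresholds above."""
    n = word_count(prompt)
    if n < TOO_SHORT:
        return "too short"
    if n > SOFT_MAX:
        return "too long"
    if SWEET_SPOT[0] <= n <= SWEET_SPOT[1]:
        return "sweet spot"
    return "acceptable"
```

For example, `length_advice("beautiful portrait")` returns `"too short"`, which matches the first before/after example in this guide.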

How do you get photorealism without the AI look?

Use photo terminology: «35mm film», «50mm lens», «shallow DOF», «natural color balance», «subtle film grain». Describe real textures: «visible pores», «weathered skin», «fabric wear». Avoid words like «polished», «staged», «beautiful lighting», which push the output toward a glossy studio look. An explicit «photorealistic» at the start helps.

Does the model support transparent background?

Yes, transparency is a built-in feature via a dedicated API/UI parameter. Ideal for stickers, icons, characters, assets. The prompt can additionally state «transparent background», but the parameter is what guarantees a clean alpha mask. Typical sticker formula: «cute cartoon knight sticker, thick lines, white outline, transparent background».

Can existing images be edited?

Yes, via the image-to-image endpoint. Pass the source plus a prompt with a change instruction. Specify WHAT to change and WHAT to preserve: «Change only the background to a beach, keep the person, pose, and lighting unchanged». Without an explicit preserve block the model may change more than needed. Especially important for iterative edits.
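The change-plus-preserve pattern is easy to template so the preserve block is never forgotten. A minimal sketch; `edit_instruction` is a hypothetical helper, not part of any SDK, and it only builds the prompt string.

```python
def edit_instruction(change: str, preserve: list[str]) -> str:
    """Build an edit prompt that states what to change and what to keep."""
    if not preserve:
        # Without a preserve block the model may change more than needed.
        raise ValueError("list what must stay unchanged")
    if len(preserve) == 1:
        kept = preserve[0]
    else:
        kept = ", ".join(preserve[:-1]) + ", and " + preserve[-1]
    return f"Change only {change}, keep {kept} unchanged."


print(edit_instruction("the background to a beach", ["the person", "pose", "lighting"]))
```

With the official `openai` Python package, the resulting string would typically go into the `prompt` argument of `client.images.edit(...)` alongside the source image; check the current API reference for the exact parameters.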

Why does the model refuse to generate?

OpenAI runs one of the strictest moderation systems. It triggers not only on explicit NSFW content but also on innocent words that combine into a suspicious context. Real celebrities and recognizable characters from protected IP are blocked by policy. If a request is refused, reformulate it: drop the triggering combination, shift the context to editorial/fashion, or use fictional characters.

Does Opten support GPT Image 1?

Yes, the Opten extension auto-detects GPT Image 1 inside ChatGPT and on API platforms. It scores prompts against the structure described in this guide: presence of a visual medium, specificity, camera terms, quotes for text, and absence of SD syntax and quality boosters. One click delivers a rewrite in the correct structure.


GPT Image 1: how to write prompts the model actually understands

Name: GPT Image 1
Brand: OpenAI

OpenAI · Updated: May 19, 2026

GPT Image 1 is an OpenAI image model with natural-language prompting and strong in-image text rendering. It runs in ChatGPT and via the API, and supports resolutions up to 1536×1024, transparent backgrounds, three quality tiers, and image-to-image editing. Prompts of up to ~500 words work best.

What GPT Image 1 does well

The main strengths are accurate, readable in-image text (signs, menus, labels, UI mockups), high prompt adherence, photorealism through camera terms, and built-in transparent background support (ideal for stickers and assets).

In ChatGPT the model uses multi-turn context — images can be refined iteratively in a single conversation. In the API every request is autonomous. Image-to-image editing is supported via a dedicated endpoint.

Resolutions: 1024×1024, 1536×1024, 1024×1536
Formats: PNG, JPEG, WebP; transparency via a dedicated parameter
Quality: high / medium / low
Image-to-image editing: via API
Prompt length: up to ~4000 tokens, optimal up to 500 words
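The spec list above can be enforced before a request is sent. This is a minimal sketch that only builds and validates a parameter dict; the parameter names (`size`, `quality`, `background`, `output_format`) follow the OpenAI Images API as we understand it and should be checked against the current API reference. `image_request` itself is a hypothetical helper.

```python
# Allowed values mirror the spec list above.
SIZES = {"1024x1024", "1536x1024", "1024x1536"}
QUALITIES = {"high", "medium", "low"}
FORMATS = {"png", "jpeg", "webp"}


def image_request(prompt: str, size: str = "1024x1024", quality: str = "medium",
                  output_format: str = "png", transparent: bool = False) -> dict:
    """Validate options and return keyword arguments for an image request."""
    if size not in SIZES:
        raise ValueError(f"unsupported size: {size}")
    if quality not in QUALITIES:
        raise ValueError(f"unsupported quality: {quality}")
    if output_format not in FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    if transparent and output_format == "jpeg":
        # JPEG has no alpha channel, so transparency needs PNG or WebP.
        raise ValueError("transparent background requires png or webp")
    params = {"model": "gpt-image-1", "prompt": prompt, "size": size,
              "quality": quality, "output_format": output_format}
    if transparent:
        # The parameter, not the prompt wording, is what yields a clean alpha mask.
        params["background"] = "transparent"
    return params


sticker = image_request("cute cartoon knight sticker, thick lines, white outline, "
                        "transparent background", transparent=True)
```

With the official `openai` Python package these keyword arguments would be passed as `client.images.generate(**sticker)`. Validating locally mainly saves a round trip on invalid combinations such as a transparent JPEG.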

Prompt structure

Layered formula: [Visual medium/Style] + [Subject] + [Environment/Scene] + [Lighting/Mood] + [Composition/Angle] + [Details and textures] + [Constraints/Exclusions].

The model understands natural language — no tags or special syntax. Describe like a story, but with concrete visual details.
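The layered formula maps naturally onto a small data structure. The sketch below uses a hypothetical `LayeredPrompt` class (not part of any SDK) with one field per layer, and renders the non-empty layers as plain sentences rather than tags.

```python
from dataclasses import dataclass


def _sentence(text: str) -> str:
    """Capitalize the first letter and end with exactly one period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "."


@dataclass
class LayeredPrompt:
    """One field per layer of the formula; empty layers are skipped."""
    medium: str
    subject: str
    environment: str = ""
    lighting: str = ""
    composition: str = ""
    details: str = ""
    constraints: str = ""

    def render(self) -> str:
        # Medium and subject open the prompt, the other layers follow in order.
        parts = [f"{self.medium} of {self.subject}", self.environment,
                 self.lighting, self.composition, self.details, self.constraints]
        return " ".join(_sentence(p) for p in parts if p.strip())


prompt = LayeredPrompt(
    medium="editorial portrait",
    subject="a woman in her thirties with short auburn hair",
    lighting="soft natural light from a north-facing window",
    composition="shallow depth of field",
).render()
```

Rendering to prose instead of comma-separated tags keeps the output in the natural-language form the model expects.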

Specificity is the main rule. «A foggy mountain valley at dawn, golden light filtering through pine trees, reflected in a mirror-still lake» works far better than «a beautiful landscape». Include at least 2-3 descriptive details per scene: color, texture, material, shape.

Camera and photorealism

Camera terms work significantly better than generic «8K, ultra-detailed».

Shot size: close-up, medium shot, wide angle, aerial view.
Lenses: 50mm, 35mm, macro, fisheye.
Focus: shallow depth of field, bokeh, sharp focus throughout.
Angle: low angle, bird's eye view, eye level, Dutch angle.

For lighting avoid generic «good lighting». Use specifics: «dramatic side lighting creating strong shadows», «soft box lighting eliminating harsh shadows», «golden hour», «fluorescent overhead», «neon glow», «candlelight». The more precise the light, the more precise the mood.
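Camera and lighting terms can be composed into the same phrase pattern the before/after examples use («Shot on 85mm lens at f/1.8»). A minimal sketch; `camera_phrase` is a hypothetical helper and the vocabulary passed to it is up to you.

```python
from typing import Optional


def camera_phrase(lens_mm: int, aperture: Optional[float] = None, shot: str = "",
                  angle: str = "", focus: str = "", lighting: str = "") -> str:
    """Compose concrete camera and lighting terms into one prompt fragment."""
    lens = f"shot on {lens_mm}mm lens"
    if aperture is not None:
        # :g drops a trailing .0, so 2.0 renders as f/2
        lens += f" at f/{aperture:g}"
    parts = [shot, angle, lens, focus, lighting]
    return ", ".join(p for p in parts if p)


print(camera_phrase(85, 1.8, focus="shallow depth of field",
                    lighting="soft natural light from a north-facing window"))
```

Keeping each term in its own argument makes it obvious when a layer such as lighting has been left generic or empty.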

In-image text and iterative work

GPT Image 1 is top-tier for in-image text. Always put exact text in quotes or CAPS: `"OPEN 24/7"`, `"CAFE LUNA"`. Specify font style («elegant handwriting», «bold sans-serif», «neon sign lettering»), size, color, and placement. For complex words (brands, rare spellings), spell them out letter by letter: `C-A-F-E L-U-N-A`.
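The quoting and letter-by-letter spelling rules are mechanical, so they can be generated. A minimal sketch; both `spell_out` and `text_spec` are hypothetical helpers, and the sentence template is one possible phrasing, not a required syntax.

```python
def spell_out(text: str) -> str:
    """Spell each word letter by letter, e.g. CAFE LUNA -> C-A-F-E L-U-N-A."""
    return " ".join("-".join(word) for word in text.split())


def text_spec(text: str, style: str, color: str, surface: str,
              spell: bool = False) -> str:
    """Describe in-image text: exact wording in quotes plus font style and color."""
    spec = f'The {surface} reads "{text}" in {color} {style} lettering'
    if spell:
        # Useful for brand names and rare spellings the model may garble.
        spec += f", spelled {spell_out(text)}"
    return spec + "."


print(text_spec("CAFE LUNA", "neon sign", "pink", "sign", spell=True))
```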

In ChatGPT use an iterative approach. Start with a base prompt, then refine in small steps: «Same scene, but make the lighting warmer», «Add a person sitting on the bench on the left», «Remove the tree in the background». A series of precise edits beats one overloaded prompt.
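In the API every request is autonomous, so the ChatGPT-style loop has to be reproduced by feeding each result back into the next edit. A minimal sketch under that assumption: `refine` and the `edit` callback are hypothetical, and in practice the callback would wrap the image-to-image edit endpoint.

```python
from typing import Callable, List

# Hypothetical callback: takes (image_bytes, instruction) and returns new image bytes.
EditFn = Callable[[bytes, str], bytes]


def refine(image: bytes, steps: List[str], edit: EditFn) -> bytes:
    """Apply small edits one at a time, passing each result into the next call."""
    for instruction in steps:
        image = edit(image, instruction)
    return image


# A stand-in editor that just records the instructions, for illustration only.
def fake_edit(img: bytes, instruction: str) -> bytes:
    return img + b"|" + instruction.encode()


result = refine(b"base", ["Same scene, but make the lighting warmer",
                          "Remove the tree in the background"], fake_edit)
```

Each call carries one precise change, which mirrors the advice to prefer a series of small edits over one overloaded prompt.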

Common mistakes

1. Stable Diffusion syntax
Weights like `(word:1.5)`, `(masterpiece:1.3)`, comma-separated tags `1girl, masterpiece, best quality`, embeddings, LoRA references — GPT Image 1 works with natural language, not tags. These constructions land in the prompt as literal noise or degrade output.
2. Quality boosters «8K, ultra HD, masterpiece»
Generic quality praise barely affects GPT Image 1. Concrete camera terms («85mm at f/1.8», «shallow DOF», «golden hour»), style references, and lighting descriptions work many times better than any quality stack.
3. Missing environment
«A red sports car» versus «a red sports car on an empty desert highway with mountains on the horizon» — dramatically different results. Without context the model decides on its own, and output is unpredictable. Even minimal background description significantly improves the frame.
4. Conflicting styles in one prompt
«Photorealistic cartoon», «minimalist detailed», «realistic stylized»: these pair opposing styles without explaining how they should combine, so the model can't decide what to prioritize. If a stylistic blend is needed, describe it explicitly: «realistic photography with subtle painterly post-processing».
5. Negatives without a positive alternative
«Don't draw background», «no people, no text, no clutter» are less effective than positive descriptions. «Transparent background» beats «no background». «Clean composition» beats «no clutter». Describe what you want, not what you don't.
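Mistakes 1, 2, 4, and 5 can be caught with simple pattern checks before a prompt is sent. This is a rough heuristic sketch, not Opten's actual scoring: the word lists are illustrative and the negative-phrasing check will also flag legitimate uses such as a sign reading «no parking».

```python
import re

SD_WEIGHT = re.compile(r"\([^()]*:\s*\d+(?:\.\d+)?\)")  # e.g. (masterpiece:1.5)
SD_TAGS = ("1girl", "1boy")
BOOSTERS = ("8k", "ultra hd", "ultra-detailed", "masterpiece", "best quality")
CONFLICTS = (("photorealistic", "cartoon"), ("minimalist", "detailed"),
             ("realistic", "stylized"))
NEGATIVE = re.compile(r"\b(?:no|don't|do not)\s+\w+", re.IGNORECASE)


def _has(term: str, text: str) -> bool:
    """Whole-term match, so 'realistic' does not match inside 'photorealistic'."""
    return re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text) is not None


def lint(prompt: str) -> list[str]:
    text = prompt.lower()
    issues = []
    if SD_WEIGHT.search(prompt) or any(_has(t, text) for t in SD_TAGS):
        issues.append("stable diffusion syntax")
    found = sorted(b for b in BOOSTERS if _has(b, text))
    if found:
        issues.append("quality boosters: " + ", ".join(found))
    for a, b in CONFLICTS:
        if _has(a, text) and _has(b, text):
            issues.append(f"conflicting styles: {a} + {b}")
    if NEGATIVE.search(prompt):
        issues.append("negative phrasing")
    return issues
```

Run on the «before» prompt of Example 3 below, this flags the SD syntax and the quality boosters; the «after» version passes cleanly.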

Before / after examples

Example 1

Before

beautiful portrait

After

Editorial portrait of a woman in her thirties with freckles and short auburn hair, wearing a cream-colored cashmere sweater. Soft natural light from a north-facing window, calm contemplative expression, shallow depth of field. Shot on 85mm lens at f/1.8, subtle film grain, muted warm palette, fashion editorial style.

Concrete subject, appearance details, specific lighting, camera terms, style reference. «Beautiful» is an empty word.

Example 2

Before

café sign on an old brick wall

After

A weathered metal café sign mounted on a red brick wall in a 1920s Brooklyn neighborhood. The sign reads "BREW & BEAN" in bold cream-colored sans-serif lettering with a small coffee cup icon. Warm afternoon light catches the metal, soft shadows on the brick. Documentary photography, shallow depth of field, muted warm palette.

Exact text in quotes, specific font and color, era, surface material, lighting type. Without this the model invents all details.

Example 3

Before

(masterpiece:1.5), (best quality:1.3), 1girl, blue dress, beautiful, garden, photorealistic, 8k

After

A young woman in her twenties wearing a flowing pale blue linen dress, walking through a sunlit cottage garden in early summer. Soft natural light, golden hour warmth, shallow depth of field. Shot on 85mm lens at f/1.8, candid documentary style, subtle film grain.

Parenthetical weights `(word:1.5)` and comma-separated tags are Stable Diffusion syntax. GPT Image 1 doesn't support them. A coherent description with camera terms hits the target.

