GPT Image 1.5: how to write prompts the model actually understands
OpenAI · Updated:
GPT Image 1.5 is OpenAI's image model with improved photorealism, identity preservation during editing, and multi-image input. It supports resolutions up to 1536×1024, transparent backgrounds, three quality tiers (low, medium, high), an input_fidelity parameter (high/low), and up to 4 input images per request. Prompts work best at 500 words or fewer.
What's new in GPT Image 1.5
Version 1.5 brought ten concrete upgrades. Six of them change how you prompt: improved photorealism with natural lighting and accurate materials, a flexible quality/speed balance (even the low tier already beats GPT Image 1's visual quality), face and identity preservation during editing, reliable text rendering, support for complex structured visuals (infographics, diagrams), and precise style control via minimal prompting.
The other four are general gains: strong real-world knowledge, improved composition preservation during edits, more accurate lighting, and higher detail on fine elements.
- input_fidelity parameter (high/low) for edit control
- Multi-image input — up to 4 images per request
- Face and identity preservation during editing
- Background: transparent / opaque / auto
- Prompt up to ~4000 tokens, optimal up to 500 words
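The option values listed above can be checked before a request is sent. The following is a minimal sketch, not an official client: `validate_options` and the `ALLOWED` table are hypothetical names, and the size strings are an assumption based on the 1536×1024 maximum mentioned earlier.

```python
# Allowed values come from the list above; the size strings are an assumption.
ALLOWED = {
    "quality": {"low", "medium", "high"},
    "background": {"transparent", "opaque", "auto"},
    "input_fidelity": {"high", "low"},
    "size": {"1024x1024", "1536x1024", "1024x1536"},  # assumed size strings
}
MAX_INPUT_IMAGES = 4  # multi-image input limit stated in the article


def validate_options(options, input_image_count=0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in options.items():
        allowed = ALLOWED.get(name)
        # Options that are not in the table are passed through unchecked.
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    if input_image_count > MAX_INPUT_IMAGES:
        problems.append(
            f"{input_image_count} input images exceeds the limit of {MAX_INPUT_IMAGES}"
        )
    return problems


if __name__ == "__main__":
    print(validate_options({"quality": "ultra"}, input_image_count=5))
```

Catching a misspelled tier locally is cheaper than discovering it through a rejected or silently downgraded request.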
Prompt structure
OpenAI's recommended order: [Background/Scene] → [Subject] → [Key details] → [Constraints/Exclusions]. This differs from GPT Image 1, where the subject came first.
Also include the use case — «Product shot for an e-commerce listing», «Infographic for a student audience», «UI mockup showing a mobile app screen». This sets the mode and polish level.
For complex requests use short bulleted segments or line breaks instead of one long paragraph. A layered structure (subject, environment, lighting, style, technical parameters) yields clean and predictable output.
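The recommended order can be enforced with a small helper so that every prompt comes out layered the same way. This is a sketch under assumptions: `build_prompt` and its section labels ("Scene", "Subject", and so on) are illustrative choices, not wording the model requires.

```python
def _sentence(text):
    """Normalize a fragment so it ends with exactly one period."""
    return text.strip().rstrip(".") + "."


def build_prompt(use_case, scene, subject, details=(), constraints=()):
    """Assemble a prompt as: use case, scene, subject, key details, constraints."""
    lines = [
        _sentence(use_case),
        "Scene: " + _sentence(scene),
        "Subject: " + _sentence(subject),
    ]
    if details:
        lines.append("Key details:")
        lines.extend(f"- {d}" for d in details)
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {c}" for c in constraints)
    # Line breaks keep each layer separate instead of one long paragraph.
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        "Product shot for an e-commerce listing",
        "minimalist white studio surface",
        "matte black wireless headphones",
        details=["soft light from the upper left", "sharp focus"],
        constraints=["no text", "no watermark"],
    ))
```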
Multi-image input and editing
Multi-image input is one of 1.5's key features. Reference each image by index: «Image 1: product photo with the watch on a white surface. Image 2: style reference, dark moody studio lighting. Apply Image 2's style to Image 1». For compositing: «Put the bird from Image 1 on the elephant in Image 2».
For editing, use the edit endpoint with input_fidelity. High fidelity preserves composition and identity (use it for face-preserving edits); low allows creative freedom (style transfer, reimagining). State the scope explicitly: «Change only X» + «keep everything else the same». On each iteration, repeat the preserve list; otherwise the model drifts.
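The indexed references and the «change only X» pattern can be combined into one request builder. The sketch below only prepares the prompt text and the input_fidelity value; `build_edit_request` is a hypothetical helper, and how the resulting dictionary is passed to the edit endpoint (together with the image files and a model id) depends on the client you use.

```python
def build_edit_request(image_roles, change, preserve, input_fidelity="high"):
    """Build the prompt and fidelity setting for a single, narrowly scoped edit.

    image_roles: one short description per input image, in upload order.
    change:      the single thing to change, e.g. "the hair color to deep auburn".
    preserve:    everything that must stay the same.
    """
    if not 1 <= len(image_roles) <= 4:
        raise ValueError("expected 1 to 4 input images")
    if input_fidelity not in ("high", "low"):
        raise ValueError("input_fidelity must be 'high' or 'low'")

    # Reference every image by its 1-based index, as the text recommends.
    lines = [f"Image {i}: {role}." for i, role in enumerate(image_roles, start=1)]
    lines.append(f"Change only {change}.")
    lines.append("Keep everything else the same: " + ", ".join(preserve) + ".")
    return {"prompt": "\n".join(lines), "input_fidelity": input_fidelity}


if __name__ == "__main__":
    request = build_edit_request(
        ["portrait photo"],
        "the hair color to deep auburn",
        ["facial features", "expression", "pose", "clothing", "background"],
    )
    print(request["prompt"])
```

Defaulting to "high" is a deliberate choice here: a face-preserving edit is the case where drift hurts most, so the looser setting has to be requested explicitly.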
Text and structured visuals
Exact text in quotes or CAPS: `"SUMMER SALE 50% OFF"`. Specify typography: font style, size, color, placement. For brands and rare words — letter by letter: `S-T-A-R-B-U-C-K-S`. For infographics with lots of text — `quality="high"`.
GPT Image 1.5 is especially strong on structured visuals: infographics, diagrams, multi-panel compositions, explanatory illustrations. Specify audience («for students», «for executives») and type («timeline», «labeled diagram», «funnel chart») — the model picks detail level and text density accordingly.
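The quoting and letter-by-letter conventions are easy to apply mechanically. The following is a small sketch; `spell_out` and `text_instruction` are hypothetical helper names, and the sentence template is one possible phrasing rather than a required one.

```python
def spell_out(word):
    """Spell a brand or rare word letter by letter, e.g. S-T-A-R-B-U-C-K-S."""
    return "-".join(ch.upper() for ch in word if not ch.isspace())


def text_instruction(text, font, color, placement, spell=False):
    """Describe exact text plus its typography in one sentence."""
    parts = [f'Render the exact text "{text}"']
    if spell:
        # Add the spelled-out form for brands and unusual words.
        parts.append(f"(spelled {spell_out(text)})")
    parts.append(f"in {font}, {color}, {placement}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(text_instruction("SUMMER SALE 50% OFF", "bold sans-serif", "white", "centered at the top"))
    print(text_instruction("Starbucks", "serif", "dark green", "on the cup sleeve", spell=True))
```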
Common mistakes
1. Ignoring API parameters
`quality`, `background`, `input_fidelity`, and `n` (the number of images) affect output as much as the prompt text. Requesting a high-quality infographic with small text at `quality="medium"` tends to produce blurry labels. Requesting a sticker without `background="transparent"` gives a white background.
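One way to avoid forgetting these settings is to tie them to the kind of asset being requested. The presets below are hypothetical and cover only the two failure cases just described; the default values are assumptions for illustration, not the API's own defaults.

```python
# Hypothetical presets derived from the two failure cases described above.
PRESETS = {
    "sticker": {"background": "transparent"},
    "infographic": {"quality": "high"},
}
# Assumed fallbacks for this sketch; not the API's documented defaults.
DEFAULTS = {"quality": "medium", "background": "auto"}


def request_options(kind, **overrides):
    """Merge defaults, the preset for `kind`, and explicit overrides (last wins)."""
    return {**DEFAULTS, **PRESETS.get(kind, {}), **overrides}


if __name__ == "__main__":
    print(request_options("sticker"))
    print(request_options("infographic"))
    print(request_options("infographic", quality="low"))
```

Explicit overrides win over the preset, so a quick low-quality draft of an infographic is still possible without editing the table.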
2. Stable Diffusion syntax
Weights like `(word:1.5)`, comma-separated tags `1girl, masterpiece, best quality`, embeddings, LoRA references — GPT Image 1.5 works with natural language, not tags. These constructions are ignored or degrade output. Write coherent sentences.
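Leftover Stable Diffusion syntax can be flagged automatically before a prompt is sent. This is a rough sketch: `find_sd_syntax` is a hypothetical helper, the `<lora:name:weight>` pattern assumes the notation used by common Stable Diffusion web UIs, and the tag list is deliberately tiny.

```python
import re

WEIGHT_RE = re.compile(r"\([^()]+:\d+(?:\.\d+)?\)")    # e.g. (word:1.5)
LORA_RE = re.compile(r"<lora:[^>]+>", re.IGNORECASE)   # e.g. <lora:name:0.8> (assumed notation)
TAG_WORDS = {"1girl", "1boy", "masterpiece", "best quality"}  # small sample, extend as needed


def find_sd_syntax(prompt):
    """Return (kind, fragment) pairs for Stable Diffusion style constructs."""
    findings = [("weight", m) for m in WEIGHT_RE.findall(prompt)]
    findings += [("lora", m) for m in LORA_RE.findall(prompt)]
    # Comma-separated tags: flag a segment only if it is exactly a known tag.
    segments = [s.strip().lower() for s in prompt.split(",")]
    findings += [("tag", s) for s in segments if s in TAG_WORDS]
    return findings


if __name__ == "__main__":
    print(find_sd_syntax("1girl, masterpiece, (red hair:1.5), <lora:style:0.8>"))
    print(find_sd_syntax("A red-haired woman reading by a window, soft morning light."))
```

An empty result does not prove the prompt is well written; it only means none of these specific patterns were found.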
3. Overloading iterations
«Change hair, background, clothing, add glasses, make it cinematic» — the model tries to do everything at once and loses identity. Change one element at a time, repeating the preserve list at each step. GPT Image 1.5 is especially good at iterative work precisely because of face preservation.
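The one-change-per-step rule can be turned into a plan of prompts generated up front. The sketch below assumes each step is applied to the output of the previous one; `plan_iterations` is a hypothetical helper, and carrying earlier edits into the preserve list is a design choice of this example rather than something the source prescribes.

```python
def plan_iterations(changes, preserve):
    """Return one edit prompt per change, repeating the preserve list every time."""
    prompts = []
    done = []
    for change in changes:
        # Earlier edits join the preserve list so later steps do not undo them.
        keep = list(preserve) + [f"the earlier edit ({d})" for d in done]
        prompts.append(
            f"Change only {change}. Keep everything else the same: " + ", ".join(keep) + "."
        )
        done.append(change)
    return prompts


if __name__ == "__main__":
    steps = plan_iterations(
        ["the hair color to deep auburn", "the background to a rainy street", "by adding thin round glasses"],
        ["facial features", "expression", "pose", "clothing"],
    )
    for number, prompt in enumerate(steps, start=1):
        print(f"Step {number}: {prompt}")
```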
4. Missing use case
«Make an infographic» — the model doesn't know the polish level or density. «Educational infographic for students explaining...» or «Pitch-deck slide for executives showing...» sets the mode. Use case influences style, font size, illustration density as much as the main subject.
5. Quality boosters «8K, ultra HD, masterpiece»
Generic quality praise adds almost nothing. Concrete terms (lens, lighting direction, depth of field) work noticeably better. API parameters such as `quality="high"` give real control over final sharpness, which words in the prompt do not.
Before / after examples
Example 1
Before
beautiful product photo
After
Product shot for an e-commerce listing. A minimalist white surface in a studio setting. On it, premium wireless headphones, matte black with brushed steel accents. Soft gradient lighting from the upper left, soft shadows beneath, slight reflection on the smooth surface. Professional studio photography, sharp focus, neutral cool color balance.
API parameter: quality="high"
Use case stated («e-commerce listing»), background → subject → details order, concrete lighting and surface, explicit `quality="high"`.
Example 2
Before
infographic about the water cycle
After
Educational infographic for students explaining the water cycle. Clean white background with five labeled stages: "Evaporation", "Condensation", "Precipitation", "Collection", "Transpiration". Use bold sans-serif font for stage labels, soft blue color palette for water, warm yellow for sun. Connecting arrows between stages. Top title (centered): "The Water Cycle".
API parameter: quality="high"
Audience («for students»), exact text labels in quotes, concrete typography and palette, `quality="high"` for small text.
Example 3
Before
Change her hair color and the background and add glasses and make it cinematic
After
Image 1: portrait photo. Change only the hair color to deep auburn. Keep the facial features, expression, pose, glasses (or lack of them), and clothing unchanged. Maintain identical lighting and background.
API parameter: input_fidelity="high"
Multiple edits in one prompt confuse the model. One precise edit with an explicit preserve list and `input_fidelity="high"` preserves identity.