OpenAI’s GPT-Image-1 is a GPT-4-class multimodal transformer that turns natural-language prompts (and reference images) into high-fidelity, typography-accurate images and in-place edits, with enterprise-grade safety, via a production API.
GPT-Image-1 Description
OpenAI GPT-Image-1 is a natively multimodal generative transformer built for high-fidelity text-to-image creation and editing. The model extends a GPT-4-class decoder with specialized visual token embeddings and cross-modal attention, letting it follow intricate design instructions, leverage world knowledge, and accurately render on-image text.
Technical Specification
GPT-Image-1 is optimized for high-fidelity image generation and visual content creation.
Architecture – GPT-4-derived decoder with vision adapters and an extra masked-editing head for in-painting.
Native output sizes – 1024 × 1024 px square; portrait and widescreen variants at 1024 × 1536 px or 1536 × 1024 px, with 4K upscaling on demand.
API Pricing:
Text token input: $5.25 per 1M tokens
Low quality, price per generated image
1024x1024: $0.0116
1024x1536: $0.017
1536x1024: $0.017
Medium quality, price per generated image
1024x1024: $0.044
1024x1536: $0.066
1536x1024: $0.066
High quality, price per generated image
1024x1024: $0.175
1024x1536: $0.263
1536x1024: $0.263
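For quick budgeting, the per-image prices above can be folded into a small helper for estimating batch costs. This is a sketch using the figures copied from the table, not an official billing calculator:

```python
# Per-image prices (USD) from the pricing table above, keyed by (quality, size).
PRICE_PER_IMAGE = {
    ("low", "1024x1024"): 0.0116,
    ("low", "1024x1536"): 0.017,
    ("low", "1536x1024"): 0.017,
    ("medium", "1024x1024"): 0.044,
    ("medium", "1024x1536"): 0.066,
    ("medium", "1536x1024"): 0.066,
    ("high", "1024x1024"): 0.175,
    ("high", "1024x1536"): 0.263,
    ("high", "1536x1024"): 0.263,
}

def estimate_image_cost(quality: str, size: str, n: int = 1) -> float:
    """Estimated USD cost for n images at the given quality and size."""
    return round(PRICE_PER_IMAGE[(quality, size)] * n, 4)

print(estimate_image_cost("high", "1536x1024", 4))  # 1.052
```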
Performance Metrics
GIE-Bench (2025): In a 1,000-task grounded image-editing benchmark, GPT-Image-1 achieved the highest functional-correctness scores among all tested models while maintaining strong content preservation.
STRICT text-rendering stress-test: GPT-Image-1 (marketed inside ChatGPT as “GPT-4o images”) is one of only two proprietary models that keep low error rates on multi-line text up to ≈800 characters, far ahead of open-source diffusers.
Enterprise roll-outs: Early partners such as Adobe Firefly, Figma Design, Canva and Wix report “double-digit prompt-to-asset speed-ups” after switching to GPT-Image-1.
Multi-style generation: Photorealism, illustration, anime, vector, 3-D and data-viz all from the same endpoint.
Accurate typography: Posters, UI mocks and multi-line labels render clean, legible text even in small fonts.
World-knowledge synthesis: Leverages the GPT-4o family’s language grounding to place branded items, real people or factual diagrams correctly.
Enterprise-grade safety: Provenance watermarking, tunable moderation and no training on customer data, aligning with legal and brand-safety needs.
Example of a generated image with high quality parameters, created with the prompt: “Generate an anime image of a hedgehog holding a paper that says Try GPT-Image-1 today with AI/ML API.”
gpt-image-1 Example
Optimal Use Cases
Creative & Marketing: Social ads, hero shots, product lifestyle renders.
E-commerce: Background removal, colorway variants, staged scenes for catalogs.
Education & Publishing: Diagrams, flash-cards, worksheet graphics with embedded text.
Game / Film Pre-production: Storyboards, environment studies, quick asset variations.
Enterprise Reporting: Auto-generated infographics and data-visuals directly from analytical text.
Code Samples
Parameters
prompt [str]: The text prompt describing the content, style, or composition of the image to be generated.
n [1-10]: The number of images to generate.
output_compression [int]: The compression level (0-100%) for the generated images.
size [1024x1024, 1024x1536, 1536x1024]: The size of the generated image.
background [transparent, opaque, auto]: Sets the background transparency of the generated image(s). With auto, the model automatically determines the best background for the image. With transparent, the output format must support transparency, so output_format should be set to either png (the default) or webp.
moderation [low, auto]: Controls the content-moderation level for generated images.
output_format [ png, jpeg, webp ]: The format of the generated image.
quality [ low, medium, high ]: The quality of the image that will be generated.
response_format [ url, b64_json ]: The format in which the generated images are returned.
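The parameters above can be assembled into a generation request. The sketch below only validates the documented ranges and builds the JSON payload; the model name string, endpoint URL, and auth headers are assumptions — check the AI/ML API documentation for the exact values before sending:

```python
VALID_SIZES = {"1024x1024", "1024x1536", "1536x1024"}
VALID_QUALITY = {"low", "medium", "high"}
VALID_FORMATS = {"png", "jpeg", "webp"}

def build_generation_request(prompt: str, *, n: int = 1, size: str = "1024x1024",
                             quality: str = "medium", output_format: str = "png",
                             background: str = "auto", moderation: str = "auto") -> dict:
    """Validate parameters against the documented ranges and build a request payload."""
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {sorted(VALID_SIZES)}")
    if quality not in VALID_QUALITY:
        raise ValueError(f"quality must be one of {sorted(VALID_QUALITY)}")
    if output_format not in VALID_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(VALID_FORMATS)}")
    if background == "transparent" and output_format not in {"png", "webp"}:
        raise ValueError("transparent background requires png or webp output")
    return {
        "model": "gpt-image-1",  # model identifier as exposed by the API (assumption)
        "prompt": prompt, "n": n, "size": size, "quality": quality,
        "output_format": output_format, "background": background,
        "moderation": moderation,
    }

payload = build_generation_request("Anime hedgehog holding a sign",
                                   quality="high", background="transparent")
# To send: POST this payload as JSON to the image-generation endpoint with your
# API key (see the AI/ML API documentation for the exact URL and headers).
```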
Comparison with Other Models
Versus DALL·E 3: Sharper typography and higher prompt adherence; DALL·E 3 remains slightly faster for single-shot 512 px drafts.
Versus Stable Diffusion XL 1.0: major gains in instruction following and text rendering; SDXL remains fully open-source for local or offline deployment.
Versus Midjourney v7: deterministic seeds and built-in guardrails give GPT-Image-1 an edge for production pipelines; Midjourney still offers a broader community style palette.
API Integration
Accessible via AI/ML API. Documentation: available here.
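When response_format is b64_json, the image arrives base64-encoded in the response body. A minimal decode-and-save helper is sketched below; the response shape data[0].b64_json follows the common OpenAI-style convention and is an assumption here:

```python
import base64
import pathlib
import tempfile

def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64-encoded image string, write it to disk, return byte count."""
    raw = base64.b64decode(b64_data)
    pathlib.Path(path).write_bytes(raw)
    return len(raw)

# Simulated payload; in practice this would come from response["data"][0]["b64_json"].
fake_png = base64.b64encode(b"\x89PNG\r\n\x1a\n...").decode()
out = pathlib.Path(tempfile.mkdtemp()) / "hedgehog.png"
n_bytes = save_b64_image(fake_png, str(out))
print(n_bytes)  # 11
```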