Create with OpenAI GPT-4o

GPT-4o offers significantly faster performance and is 50% more cost-effective than GPT-4 Turbo in the API

Cost comparison to OpenAI

Model capabilities

GPT-4o is a unified model trained end-to-end on text, vision, and audio, marking OpenAI's initial foray into exploring multimodal AI capabilities and limitations.

Speak with ChatGPT
Speak with ChatGPT
Chat about images
Chat about images
Real-time translation
Real-time translation
OpenAI Icon logo
GPT-4o capabilities

ChatGPT has voice and image capabilities, enhancing interaction through voice conversations and visual inputs. Users can now discuss photos, get travel insights, solve math problems, and plan meals visually in real time.

ChatGPT has voice and image capabilities

How to Get an OpenAI ChatGPT-4o API Key with AI/ML API

Model Scoring

GPT-4o matches the text, reasoning, and coding intelligence of GPT-4 Turbo, while establishing new benchmarks in multilingual, audio, and vision capabilities.

Enhanced Reasoning

GPT-4o achieves a record score of 88.7% on the 0-shot COT MMLU for general knowledge questions, utilizing our new Simple Evals library. Additionally, it sets a top score of 87.2% on the traditional 5-shot no-CoT MMLU.

Get API Key
Enhanced Reasoning
Audio ASR Performance

Audio ASR Performance

GPT-4o significantly enhances speech recognition capabilities beyond Whisper-v3, offering marked improvements across all languages, especially in those with fewer resources.

Get API Key

Audio Translation Performance

GPT-4o establishes a new benchmark in speech translation, surpassing Whisper-v3 and popular models from Google and Meta on MLS benchmarks.

Audio Translation Performance
Vision Understanding Evals

Vision Understanding Evals

GPT-4o reaches state-of-the-art levels in visual perception benchmarks. All vision assessments are conducted 0-shot, including MMMU, MathVista, and ChartQA, which utilize 0-shot CoT methodologies.

Get API Key

Ready to get started? Create an account today