Gemini 1.5 Pro vs ChatGPT-4o

Gemini has been crushing the LLM leaderboards recently.
Meanwhile, ChatGPT-4o is still the most widely accepted AI.
Is the market fair? Let's test it with our API.

Get API Key

Benchmarks and specs

Specs

Both models are at the forefront of the field and offer powerful natural language processing for a wide range of applications. Despite their similarities, GPT-4o and Gemini 1.5 Pro differ in architecture, performance, and use cases. We test the latest ChatGPT-4o snapshot, released on August 6, 2024. This article explores those differences to help you decide which model is better suited to your needs.

Specification	Gemini 1.5 Pro	ChatGPT-4o
Context Window	2M	128K
Output Tokens	8K	16K
Number of parameters in the LLM	unknown	unknown
Knowledge cutoff	November 2023	October 2023
Release Date	May 24, 2024	August 6, 2024
Tokens per second	~65	~103

Although GPT-4o has an edge in terms of output speed and the ability to generate longer responses, Gemini 1.5 Pro stands out with the larger input context window, allowing it to handle more extensive and complex contexts.
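
To put the speed figures in perspective, here is a minimal sketch that estimates how long each model would take to generate a response of a given length. It uses only the tokens-per-second values from the table above; the helper name `generation_seconds` is ours, and real latency also depends on load, prompt size, and network conditions.

```python
# Approximate output speeds from the specs table (tokens per second).
TOKENS_PER_SECOND = {
    "Gemini 1.5 Pro": 65,
    "ChatGPT-4o": 103,
}

def generation_seconds(model: str, n_tokens: int) -> float:
    """Rough time to generate n_tokens at the listed speed (hypothetical helper)."""
    return n_tokens / TOKENS_PER_SECOND[model]

if __name__ == "__main__":
    for model in TOKENS_PER_SECOND:
        print(f"{model}: ~{generation_seconds(model, 1000):.1f} s per 1,000 output tokens")
```

By this estimate, GPT-4o finishes a 1,000-token answer in roughly two thirds of the time Gemini 1.5 Pro needs.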

Benchmarks

These figures combine the official release notes for both models with several open benchmarks.

Benchmark	Gemini 1.5 Pro	ChatGPT-4o
Undergraduate level knowledge MMLU	85.9	88.7
Graduate level reasoning GPQA	46.2	53.6
Code HumanEval	82.6	90.2
Math problem-solving MATH	76.6	70.2
Multilingual Math MGSM	88.7	90.5
Reasoning DROP, F1	78.9	83.4

Although Gemini performs better on MATH, GPT-4o beats it on every other benchmark. This may be because Gemini has been updated several times since these benchmark results were published. It will be interesting to see whether GPT-4o also beats Gemini at coding, which we test below.
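
If you want to see the margins rather than just the winners, the short sketch below recomputes them from the table. The scores are copied from the table above; the dictionary layout and the `score_gaps` helper are our own illustration.

```python
# Scores from the benchmark table: (Gemini 1.5 Pro, ChatGPT-4o).
BENCHMARKS = {
    "MMLU": (85.9, 88.7),
    "GPQA": (46.2, 53.6),
    "HumanEval": (82.6, 90.2),
    "MATH": (76.6, 70.2),
    "MGSM": (88.7, 90.5),
    "DROP (F1)": (78.9, 83.4),
}

def score_gaps(benchmarks: dict) -> dict:
    """Return GPT-4o score minus Gemini score for each benchmark."""
    return {name: round(gpt - gemini, 1) for name, (gemini, gpt) in benchmarks.items()}

if __name__ == "__main__":
    for name, gap in score_gaps(BENCHMARKS).items():
        leader = "ChatGPT-4o" if gap > 0 else "Gemini 1.5 Pro"
        print(f"{name}: {leader} leads by {abs(gap)} points")
```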

Practical tests

We're thoroughly testing AI in challenging, realistic situations. We'll use prompts related to chat, coding, and image analysis to evaluate each model's performance.

Some of the outputs have been shortened for brevity or clarity.

Prompt 1. Just Chatting:

In 50-75 words, explain what an anagram is and give two examples - 
one word and one phrase.

Gemini 1.5 output

GPT-4o output

An anagram is a word or phrase formed 
by rearranging the letters of another 
word or phrase. 
It is like a word puzzle where you 
shuffle the letters to create 
something new!

Word Example:
`Listen - Silent`

Phrase Example: 
`School master - The classroom`

An anagram is a rearrangement 
of the letters in a word or phrase 
to form a new word or phrase
...
For example, the word "listen" 
can be rearranged to form "silent." 

As a phrase, 
"the eyes" can be rearranged to 
"they see."

Correct Answer ✅

Correct Answer ✅
Both models provided accurate and concise responses to the prompt, successfully explaining what an anagram is and giving the required examples.
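
The examples both models gave can be checked mechanically. The sketch below is one simple way to do it: compare letter counts while ignoring case, spaces, and punctuation. The function name `is_anagram` is ours.

```python
from collections import Counter

def is_anagram(first: str, second: str) -> bool:
    """True if both strings use exactly the same letters, ignoring case and non-letters."""
    def letters(text: str) -> Counter:
        return Counter(ch for ch in text.lower() if ch.isalpha())
    return letters(first) == letters(second)

if __name__ == "__main__":
    pairs = [
        ("Listen", "Silent"),
        ("School master", "The classroom"),
        ("the eyes", "they see"),
    ]
    for a, b in pairs:
        print(f"{a!r} / {b!r}: {is_anagram(a, b)}")
```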

Prompt 2. Language comprehension:

Read the following sentence: 
"The complex houses married and single soldiers and their families." 

In 40-60 words, explain why this sentence is grammatically correct 
but confusing. Then, rephrase it to make its meaning clear.

Gemini 1.5 output

GPT-4o output

The sentence is grammatically correct 
because it follows subject-verb 
agreement, but it is confusing 
due to the ambiguous modifier 
"married and single soldiers." 

It sounds like the complex houses both 
groups and their families, which is 
illogical since single soldiers
would not have families in the 
traditional sense. 

Rephrased: 
"The complex houses soldiers, 
both married and single,
and provides accommodation 
for their families."

The sentence is grammatically correct, 
but its structure creates ambiguity. 
The word "complex" can be interpreted 
as either an adjective describing 
the houses or as a noun. 

Additionally, 
"married and single soldiers 
and their families" lacks clarity 
about who lives there.

Rephrased: 
"The housing complex accommodates 
soldiers, whether married or single, 
along with their families."

Correct Answer: ❌
The core reason for the ambiguity is not identified, but the word limit is met and the rephrasing is correct.

Correct Answer: ✅
Correct interpretation, adhered to word limit.

Prompt 3. Logical reasoning:

1111 = 0;
1234 = 0;
9876 = 4;
5656 = 2;
0011 = 2;
0743 = 1;
8911 = 3;
8888 = ?

Gemini 1.5 output

GPT-4o output

This is a classic pattern recognition 
puzzle. The answer lies in counting 
the number of closed circles in each 
number. 

Let us break it down:
1111 = 0: The number '1' has no 
closed circles.
…
8911 = 3: The '8' has two closed 
circles, and the '9' has one.

Following this pattern:
Since each '8' has two closed 
circles, the total is 
2 + 2 + 2 + 2 = 8.
Therefore, "8888 = 8"

This puzzle is based on counting 
the number of closed loops in each 
of the digits. Let us break it down:
- 0 has 1 closed loop.
- 1 has 0 closed loops.
…
Finally, for 8888: 
Each '8' has 2 loops. 
Total = 2 + 2 + 2 + 2 = 8.
So, the answer is "8".

Correct Answer: ✅
Correct conclusion with clear, step-by-step reasoning.

Correct Answer: ✅
Also completely correct.
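
The rule both models found is easy to verify in code. In the sketch below, the loop count per digit is inferred from the puzzle itself (note that "4" must count as 0 for `1234 = 0` and `0743 = 1` to hold); the names `CLOSED_LOOPS` and `count_loops` are ours.

```python
# Closed loops per digit, as implied by the puzzle's examples.
CLOSED_LOOPS = {
    "0": 1, "1": 0, "2": 0, "3": 0, "4": 0,
    "5": 0, "6": 1, "7": 0, "8": 2, "9": 1,
}

def count_loops(number: str) -> int:
    """Sum the closed loops of every digit in the string."""
    return sum(CLOSED_LOOPS[digit] for digit in number)

if __name__ == "__main__":
    examples = {"1111": 0, "1234": 0, "9876": 4, "5656": 2,
                "0011": 2, "0743": 1, "8911": 3}
    for number, expected in examples.items():
        assert count_loops(number) == expected
    print("8888 =", count_loops("8888"))
```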

Prompt 4. Math:

Every morning Aya goes for a "9"-kilometer-long walk and stops at a coffee shop 
afterwards. When she walks at a constant speed of "s" kilometers per hour, the 
walk takes her "4 hours", including "t" minutes spent in the coffee shop. 

When she walks "s+2" kilometers per hour, the walk takes her "2 hours and 
24 minutes", including "t" minutes spent in the coffee shop. 

Suppose Aya walks at "s + 1/2" kilometers per hour. 
Find the number of minutes the walk takes her, including the "t" minutes 
spent in the coffee shop.

Gemini 1.5 output

GPT-4o output

Let us break this problem down 
step-by-step:
...
We know: Distance = Speed * Time
So: 
Scenario 1:
9 = s * (240 - t) / 60
...
Scenario 2:
9 = (s + 2) * (144 - t) / 60
...
540 = 240 * s - s * (144 - 48 * s)
Simplify and solve the quadratic 
equation for s. 
"Error happens on the next line"
You will find s = 3 km/h. 
...
Answer: "154.2 minutes"

To solve this problem, we need to 
determine the walking speed of Aya, 
denoted as "s", 
and the time she spends in the 
coffee shop, denoted as "t".

Equation for the first scenario:
9/s + t/60 = 4
Equation for the second scenario:
9/(s+2) + t/60 = 2.4

Solve the system of equations:
...
s = 2.5 km/h.
t = 24 minutes

Finally, calculate the total time
when Aya walks at 
s + 0.5 = 3 km/h: 9/3 + 24/60 = 
3 + 0.4 = 3.4 hours
Thus, the walk, 
including the coffee shop time, 
takes Aya "204 minutes".

Correct Answer: ❌
Incorrect solution. A good start, followed by a wrong root for the quadratic equation.

Correct Answer: ✅
Correct conclusion, without missing a beat.
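
For readers who want to check the arithmetic, here is a minimal sketch that solves GPT-4o's two equations numerically. Subtracting them eliminates t and leaves a quadratic in s; the function name `solve_walk` and the use of the quadratic formula are our choices, not something either model produced.

```python
import math

DISTANCE_KM = 9

def solve_walk() -> tuple:
    """Solve 9/s + t/60 = 4 and 9/(s+2) + t/60 = 2.4, then time the walk at s + 0.5."""
    # Subtracting the equations: 9/s - 9/(s+2) = 1.6
    # Multiplying by s(s+2):     18 = 1.6*s*(s+2)  ->  1.6*s^2 + 3.2*s - 18 = 0
    a, b, c = 1.6, 3.2, -18.0
    s = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # keep the positive root
    t = (4 - DISTANCE_KM / s) * 60                      # coffee shop minutes
    total_minutes = DISTANCE_KM / (s + 0.5) * 60 + t
    return s, t, total_minutes

if __name__ == "__main__":
    s, t, total = solve_walk()
    print(f"s = {s:.2f} km/h, t = {t:.0f} min, total = {total:.0f} min")
```

The positive root gives s = 2.5 km/h and t = 24 minutes, which matches GPT-4o's 204-minute answer.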

Prompt 5. Let's get coding:

Objective: Create a Tetris game using Python and the Pygame library. 
The game should include classic Tetris gameplay, 
with blocks of different shapes falling from the top of the screen. 

The player can move, rotate, and place the blocks, 
aiming to complete horizontal lines to score points.

Requirements:
Game Board:
Tetrominoes:
Game Logic:
Scoring:
Game Over:
Controls:
UI Elements:

For tasks like this, very detailed requirements are essential. You can draft them with the help of an AI model.

Gemini 1.5 output

GPT-4o output

Correct Answer: ✅
Excellent result. All conditions are met.

Correct Answer: ❌
Bad result. The game crashes instantly with an error. The outcome was the same in all 4 attempts we gave the model.

Prompt 6. Image analysis:

Analyze the following image:

By Lennart Sikkema - Imported from 500px (archived version) by the Archive Team. (detail page), CC BY 3.0, Link

LLama 3 70B output

ChatGPT-3.5 output

Clever trick!
You still have 4 marbles, 
but they're no longer in the cup 
because you turned it upside down! 

They're probably scattered 
around on the floor or counter now!

You still have 4 marbles in the cup, 
even though it is now upside down and
in the freezer

Correct Answer: ✅
Trick question deserves a trick answer!
Good understanding of nuance.

Correct Answer: ❌
Even Zero-shot Chain of Thought couldn't save it in tests.

Conclusion

These are the hardest tests we've thrown at any model, especially the coding task. As the results show, at least one of the two models solved every task we set. We'd recommend ChatGPT-4o for language comprehension and Gemini 1.5 Pro for coding.

Pricing

GPT-4o and Gemini 1.5 Pro have the same input price, but Gemini's output price is lower.

Price per 1M tokens	Gemini 1.5 Pro	GPT-4o (6th Aug)
Input price	$2.625	$2.625
Output price	$7.875	$10.50
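
To see what the difference means for a real workload, the sketch below estimates the cost of a request from its token counts. It reads the table's figures as US dollars per 1M tokens (2.625, 7.875, and 10.5); the `request_cost` helper and the example token counts are ours.

```python
# (input price, output price) in USD per 1M tokens, from the pricing table.
PRICES_PER_1M = {
    "Gemini 1.5 Pro": (2.625, 7.875),
    "GPT-4o (6th Aug)": (2.625, 10.5),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request (hypothetical helper)."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

if __name__ == "__main__":
    # Example workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_1M:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

Because the input prices are identical, the gap grows only with the number of output tokens.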

Compare for yourself

You've seen these models in action. Now it's your turn to test them for your specific needs. Copy the code below into Google Colab or your preferred coding environment, add your API key, and start experimenting!

from openai import OpenAI

def main():
    client = OpenAI(
      api_key='<YOUR_API_KEY>',
      base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'gpt-4o-2024-08-06'
    model2 = 'gemini-1.5-pro' 
    selected_models = [model1, model2]

    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'
    results = {}

    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'system', 'content': system_prompt},
                    {'role': 'user', 'content': user_prompt},
                ],
            )

            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:\n')
    print(f"{model1}:\n{results.get(model1, 'No response')}")
    print('\n')
    print(f"{model2}:\n{results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()

Conclusion

While both models are competent across a range of tasks, Gemini 1.5 Pro demonstrates stronger overall performance, especially in more complex tasks like coding and problem-solving. It is also, surprisingly, cheaper. GPT-4o, even after the price cut for its newer snapshot, remains expensive compared with most competitors. It also failed our coding test, although its language skills are as good as they get.

You can access both Gemini 1.5 Pro and the latest ChatGPT-4o snapshot through our API, or see our full model lineup here. Try them for yourself and get a feel for frontier AI.

Also check these other comparisons with the models:

ChatGPT-4o mini vs Llama 3.1 8B
ChatGPT-4o vs LLama 405B
ChatGPT-4o vs Sonnet 3.5
ChatGPT-4o vs Qwen2