This is a comparison of two frontier models, both performing at the top of the technical benchmarks. o1-mini is built specifically to excel at STEM, and on numbers alone it stands in a class of its own, outclassing the rest in this regard. Still, GPT-4o and OpenAI's o1-mini need to be tested neck and neck to really see the difference. This comparison explores that matchup, looking in detail at each model's performance to see which one better suits your needs.
The key differences between o1-mini and GPT-4o lie in their maximum output tokens and processing speed. o1-mini supports a significantly higher output of 65k tokens, while GPT-4o is limited to 16k tokens. However, GPT-4o excels in speed, generating 103 tokens per second compared to o1-mini’s 73.9. Both models share the same input context window, knowledge cutoff, and similar parameter sets. Reportedly, the newer model performs much better in Arabic and Korean than its predecessor. The real distinction will come from testing these models to see how they perform in practice.
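As a rough back-of-the-envelope check (assuming 65k means 65,536 tokens and 16k means 16,384, and taking the throughput figures above at face value), here is how long each model would need to fill its maximum output:

```python
# Rough estimate: time for each model to emit its maximum output
# at its reported throughput. Assumes 65k = 65,536 tokens and
# 16k = 16,384 tokens.
models = {
    "o1-mini": {"max_output_tokens": 65_536, "tokens_per_sec": 73.9},
    "gpt-4o":  {"max_output_tokens": 16_384, "tokens_per_sec": 103.0},
}

for name, spec in models.items():
    seconds = spec["max_output_tokens"] / spec["tokens_per_sec"]
    print(f"{name}: ~{seconds:.0f} s to fill its maximum output")
# → o1-mini: ~887 s, gpt-4o: ~159 s
```

In other words, o1-mini's larger output window comes at a real latency cost when you actually use it.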
This benchmark combines the official release notes for both models with several open benchmarks.
GPT-4o holds its ground in undergraduate-level reasoning, but o1-mini captures the rest of the spectrum, leading in coding, math, and other technical tasks.
We put each model through difficult, real-world scenarios, using questions that cover chat, coding, and image analysis to assess each model's ability.
Some outputs have been edited down for brevity or clarity.
Alice has N sisters and M brothers.
How many sisters does Andrew, the brother of Alice have?
Alice has N sisters and M brothers.
Since Andrew is one of Alice's
brothers, he would have the same
number of sisters and brothers
as Alice,
except he would count himself
among the brothers.
#Answer
So, Andrew has N sisters.
To determine how many sisters Andrew
has, let's break down the information
provided:
...
- The total number of sisters
in the family is **N + 1**
(Alice plus her N sisters).
...
##Answer
So, Andrew has N + 1 sisters.
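A quick sanity check with concrete numbers (say N = 2 and M = 3; the sibling names are made up for illustration) confirms the N + 1 count:

```python
# Model one family: Alice, her N sisters, and her M brothers.
# From a brother's point of view, every female sibling counts,
# so Andrew sees Alice's N sisters plus Alice herself: N + 1.
N, M = 2, 3  # arbitrary example values
sisters = ["Alice"] + [f"sister_{i}" for i in range(N)]
brothers = [f"brother_{i}" for i in range(M)]  # Andrew is one of these

andrews_sisters = sisters  # all female siblings, including Alice
print(len(andrews_sisters))  # → 3, i.e. N + 1
```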
How many 'r's are there in the word 'strawberry'?
#Answer
The word "strawberry" contains 2 'r's
Assistant: The word "strawberry"
contains three letter 'r's.
Here's the breakdown:
- S
- T
- R (1st 'r')
- A
- W
- B
- E
- R (2nd 'r')
- R (3rd 'r')
- Y
#Answer
So, there are 3 'r's in "strawberry."
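The three-'r' count is easy to verify programmatically:

```python
word = "strawberry"
r_count = word.count("r")  # count occurrences of the letter 'r'
print(r_count)  # → 3
```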
Alice and Bob play the following game.
A stack of n tokens lies before them.
The players take turns with Alice going first.
On each turn, the player removes either 1 token or 4 tokens from the stack.
Whoever removes the last token wins.
Find the number of positive integers n <= 2024 for which
there exists a strategy for Bob that guarantees
that Bob will win the game regardless of Alice's play.
A position is a winning position
for a player if they can force a win
no matter what the opponent does.
We will analyze a few cases to derive
a pattern.
We start by analyzing small values of n
to identify losing positions for Alice.
...
n = 1 + 4k.
Now, 1, 5, 9, 13, ... ≤ 2024
define these losing positions.
Solve
4k + 1 ≤ 2024 ⇒ 4k ≤ 2023 ⇒ k ≤ 505
Therefore, the sequence is k = 0 to 505
which represents 506 valid terms and thus,
such values of n.
'Answer'
506
To determine the number of positive
integers n we analyze the game using
combinatorial game theory.
...
'Winning and Losing Positions:'
- Winning Position (N-position):
A position from which the player
can force a win.
- Losing Position (P-position):
A position from which every move
leads to a winning position
for the opponent.
'Identifying P-positions:'
- By analyzing small values of n,
we observe a pattern
where the P-positions satisfy
n ≡ 0 or n ≡ 2 (mod 5).
Counting P-positions up to 2024
Multiples of 5 (n ≡ 0 (mod 5)): 404.
Numbers congruent to 2 modulo 5
(n ≡ 2 (mod 5)): 405.
'Answer'
404 + 405 = 809
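A brute-force check settles the question. The sketch below marks each n as a win or loss for the player to move (who removes 1 or 4 tokens; taking the last token wins) and counts the positions where Bob, the second player, can force a win:

```python
def count_bob_wins(limit: int) -> int:
    # win[n] is True if the player to move wins with n tokens left.
    # n = 0 means the previous player took the last token, so the
    # player to move has already lost.
    win = [False] * (limit + 1)
    for n in range(1, limit + 1):
        # Winning if some legal move leaves the opponent losing.
        win[n] = (not win[n - 1]) or (n >= 4 and not win[n - 4])
    # Bob wins exactly when Alice starts from a losing position.
    return sum(1 for n in range(1, limit + 1) if not win[n])

print(count_bob_wins(2024))  # → 809
```

The losing positions turn out to be exactly n ≡ 0 or n ≡ 2 (mod 5), giving 404 + 405 = 809 values of n ≤ 2024.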
Objective: Create a Tetris game using Python and the Pygame library.
The game should include classic Tetris gameplay,
with blocks of different shapes falling from the top of the screen.
The player can move, rotate, and place the blocks,
aiming to complete horizontal lines to score points.
Requirements:
- Game Board
- Tetrominoes
- Game Logic
- Scoring
- Game Over
- Controls
- UI Elements
With tasks like this, it is paramount to have very detailed requirements, which you can draft with an AI model itself.
Write a slider for images (image1.png, image2.png, image3.png)
that the user can control using only HTML and CSS
Analyze the following image:
Clever trick!
You still have 4 marbles,
but they're no longer in the cup
because you turned it upside down!
They're probably scattered
around on the floor or counter now!
You still have 4 marbles in the cup,
even though it is now upside down and
in the freezer
These tests are the hardest we've thrown at any model, especially the coding part. And as you can see, one of the two models is always ready to solve the task at hand. We'd recommend ChatGPT-4o for more straightforward tasks, and o1-mini for convoluted STEM work or tasks requiring long outputs.
This time o1-mini is the more expensive model, which isn't easy to achieve when compared against GPT-4o. Input prices are similar, while o1-mini's output price is higher by around 20%.
You've seen these models in action. Now it's your turn to test them for your specific needs. Copy the code below into Google Colab or your preferred coding environment, add your API key, and start experimenting!
from openai import OpenAI

# Paste your API key here
aiml_api_key = 'YOUR_API_KEY'

def main():
    client = OpenAI(
        api_key=aiml_api_key,
        base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'gpt-4o-2024-08-06'
    model2 = 'o1-mini'
    selected_models = [model1, model2]

    user_prompt = 'Why is the sky blue?'
    results = {}

    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'user', 'content': user_prompt}
                ],
                max_tokens=2000,
            )
            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:\n')
    print(f"{model1}:\n{results.get(model1, 'No response')}")
    print('\n')
    print(f"{model2}:\n{results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()
O1-mini proves to be the more capable model for tasks requiring complex reasoning, math problem-solving, and precise coding. It consistently performs better across benchmarks and practical tests. However, GPT-4o shows strength in coding tasks with less complex designs, excelling in tasks like HTML/CSS sliders and simpler coding scenarios. The choice between these models depends on the nature of the task: o1-mini for advanced logic and problem-solving, and GPT-4o for general knowledge and simpler coding tasks.
You can access both o1-mini and the latest snapshot of the ChatGPT-4o API, or see our full model lineup here. Try them for yourself and get a feel for frontier AI power!
Also check other comparisons with these models:
ChatGPT-4o vs Llama 405B
ChatGPT-4o vs Sonnet 3.5
ChatGPT-4o vs Qwen2
ChatGPT-4o mini vs Llama 8B