This is a comparison of 2 flagship models, so any upper hand in raw specifications is an impressive feat. Let's get to it - context window, knowledge cutoff, and tokens/second characteristics of both models.
Sonnet by Anthropic gets the lead early on with a larger context window and knowledge cutoff in 2024 - something never seen before. They really do take their training data seriously. ChatGPT 4o keeps its impressive generation speed record, though.
This benchmark is the combination of official release notes for both models (GPT 4 omni and Claude Sonnet 3.5), as well as multiple open benchmarks.
Some benchmarks like GPQA and GSM8K show incredible performance increases. If testing supports this Claude is about to become the go-to model.
We're thoroughly testing AI in challenging, realistic situations. We'll use prompts related to chat, coding, and image analysis to evaluate each model's performance.
Some of the outputs will be edited down for shortness.
Write a short story using only words that have fewer than 5 letters.
Tiny Tim woke up. Dark sky.
He felt cold. Wind blew.
Trees bent.
Tim ran fast.
Home was far. He saw a hut.
Thin man sat. Big hat on.
Tim said, "Help me!"
...
Tim ran.
Home at last. Mom had joy.
Dad gave hug. All was good.
The sun rose over the hill.
A boy and his dog went for a
walk in the park. The air was
cool and the sky was blue.
"Come on, Max!" said the boy
to his pet.
Max ran fast, his tail up high.
He saw a cat and gave a bark.
The cat fled up a tree in fear.
"No, Max! Be nice," the boy said.
...
And so the boy and his dog went
back to their cozy home, full of
joy from a nice day out.
Alice has 2 sisters and 3 brothers.
How many sisters does Alice's brother have?
Alice's brother shares the same
siblings as Alice.
Since Alice has 2 sisters and
3 brothers, each of her brothers
also has the same number of siblings.
##Answer
Therefore, Alice's brother has
2 sisters.
Let's think through this step-by-step:
1. We know that Alice has:
- 2 sisters
- 3 brothers
2. Now, we need to consider
the question from the perspective
of Alice's brother.
3. Alice's brother would have
the same sisters as Alice,
plus Alice herself.
4. So, Alice's brother has:
- Alice's 2 sisters
- Plus Alice herself
5. Therefore, Alice's brother
has 3 sisters in total.
##Answer
Alice's brother has 3 sisters.
A game consists of black and white pieces. The number of black pieces is
5 more than 3 times the white pieces. Seven white and 15 black pieces
are removed each round. After several rounds, there are 3 white and 56
black pieces left. How many pieces were there in the beginning?
This is a task from the 7th grade International Junior Math Olympiad.
Both answers are edited down to fit more testing. If you are interested in the explanation - definitely ask Sonnet 3.5 for it.
Answer:
242 pieces.
There were
213 pieces in total at the beginning of the game.
On the island of Nevermind, some people are liars who always lie.
The remaining habitants of the island are truthlovers
who tell only the truth.
Three habitants of the island, A, B, and C met this morning.
A said: “All of us are liars”.
B said: “Only one of us is a truthlover”.
Who is the truthlover?
0001 if the answer is A;
0002 if the answer is B;
0003 if the answer is C;
0004 if none of them
Another task from IJMO.
Given this reasoning, the truthlover must be B.
Answer: 0002
C is the truthlover.
The answer is 0003.
Write me code for the snake game.
Give me a score counter,
an option to restart on C and quit on Q.
Add the menu to choose difficulty.
The player should have the chance to pause on P.
## To celebrate this new model - we used tried a new test.
## Building a Pacman game. So far the models hvae struggled,
## so we prompted them 8 times to debug and finish the product.
Give me code for a Pacman game.
It should include a ghost, a labyrinth, and food.
Add pixelated special effects when eating.
Analyze the following image:
Clever trick!
You still have 4 marbles,
but they're no longer in the cup
because you turned it upside down!
They're probably scattered
around on the floor or counter now!
You still have 4 marbles in the cup,
even though it is now upside down and
in the freezer
This time, both models have vision capabilities. Performance is complex and to be tested separately.
Anthropic keeps making strides with its safety research - Sonnet 3.5 is your go-to if you want to push for a change in the state of current AI development.
ChatGPT 4o gives around 100 tokens/s, whilst Sonnet provides 80 - both are impressive feats. Moreover, GPT 4o still has an upper hand in the multimodal aspect with its speech recognition.
Our testing showed that Sonnet is stronger in maths - most likely due to the low sample size. The rest of the tasks show comparable results.
If Sonnet 3.5 can come out on top in more complicated tasks, most likely Claude 3.5 will be an absolute beast. With recent advances of Gemini 1.5 to the top of the LLM arenas - we have an incredibly competitive landscape on our hands.
The Pricing model is given in AI/ML API tokens. GPT 4o and Sonnet 3.5 have the same output prices with Sonnet having the cheaper input price.
You've seen what these models can do - now try them for your use case. Plug the code below into Google Colab or any IDE, use your API Key, and get testing!
import openai
import requests
def main():
client = OpenAI(
api_key='<YOUR_API_KEY>',
base_url="https://api.aimlapi.com",
)
# Specify the two models you want to compare
model1 = 'gpt-4o'
model2 = 'claude-3-5-sonnet-20240620import openai
import requests
def main():
my_api_key = '9e37cc3d7e1e4cdb9e0236dfd73f3a74';
client = OpenAI(
api_key=my_api_key,
base_url="https://api.aimlapi.com",
)
# Specify the two models you want to compare
results= {}
##gpt-4o request
try:
response = client.chat.completions.create(
model='gpt-4o',
messages=[
{'role': 'system', 'content': "be strong"},
{'role': 'user', 'content': "who is strong?"}
],
)
message = response.choices[0].message.content
results['gpt-4o'] = message
except Exception as error:
print(f"Error with model gpt-4o:", error)
##Sonnet request
url = "https://api.aimlapi.com/messages"
headers = {
"Authorization": f"Bearer {my_api_key}",
"Content-Type": "application/json"
}
payload = {
"model": "claude-3-5-sonnet-20240620",
"max_tokens": 1024,
"messages": [
{
"role": "user",
"content": "How are you?"
}
],
}
response = requests.post(url, json=payload, headers=headers)
message = response.json()['content'][0]['text']
results['claude-3-5-sonnet-20240620'] = message
# Compare the results
print('Comparison of models:\n')
print(f"gpt-4o:\n{results.get('gpt-4o', 'No response')}")
print('\n')
print(f"claude-3-5-sonnet-20240620:\n{results.get('claude-3-5-sonnet-20240620', 'No response')}")
if __name__ == "__main__":
main()
Claude Sonnet 3.5 is a worthy competitor, proudly taking its place amongst the best models on the market. The tests are stacked against GPT 4o, and even the pricing is very welcoming. Pick Sonnet 3.5 if our tests have convinced you - or stick to the old reliable ChatGPT 4o.
You can check our model lineup here - try any of them for yourself with our API Key.