Llama 3.1 8B
VS
ChatGPT-4o mini

Small but proud: Llama 3.1 8B and GPT-4o mini - which one will prove more effective?

Benchmarks and specs

Specs

We're comparing two top-tier lightweight AI models, where even a small edge in basic specs can affect performance. We cover the key aspects: context window size, maximum output length, knowledge cutoff, and tokens per second.

Specification	Llama 3.1 8B	ChatGPT-4o mini
Context Window	128K	128K
Output Tokens	4K	16K
Number of parameters in the LLM	> 8B	unknown
Knowledge cutoff	December 2023	October 2023
Release Date	July 23, 2024	July 18, 2024
Tokens per second	~147	~99

Although GPT-4o mini offers a longer maximum output (16K vs 4K tokens), Llama 3.1 8B appears to have the edge in generation speed (~147 vs ~99 tokens per second) and a slightly more recent knowledge cutoff. The choice between the two depends on your specific usage requirements.
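To put the speed difference in practical terms, here is a minimal sketch that converts the approximate tokens-per-second figures from the table into generation time. It assumes throughput stays constant and ignores network latency and time to first token; the function name `seconds_to_generate` is ours, for illustration only.

```python
# Approximate throughput figures copied from the specs table above.
TOKENS_PER_SECOND = {
    "Llama 3.1 8B": 147,
    "GPT-4o mini": 99,
}


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time, assuming constant throughput (an assumption)."""
    return tokens / tokens_per_second


if __name__ == "__main__":
    reply_length = 1000  # hypothetical reply length in tokens
    for name, tps in TOKENS_PER_SECOND.items():
        secs = seconds_to_generate(reply_length, tps)
        print(f"{name}: ~{secs:.1f} s for {reply_length} tokens")
```

With these figures, a 1,000-token reply takes roughly 6.8 seconds on Llama 3.1 8B and roughly 10.1 seconds on GPT-4o mini.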

Benchmarks

The figures below combine the official release notes for both models (Llama 3.1 8B and GPT-4o mini) with multiple open benchmarks.

Benchmark	Llama 3.1 8B	ChatGPT-4o mini
Undergraduate level knowledge MMLU	73.0	82.0
Graduate level reasoning GPQA	32.8	40.2
Code HumanEval	72.6	87.2
Math problem-solving MATH	51.9	70.2
Multilingual Math MGSM	68.9	87.0
Reasoning DROP, F1	59.5	79.7

GPT-4o mini outperforms Llama 3.1 8B in every category listed, with the widest margins in reasoning (DROP, F1: 79.7 vs 59.5), advanced mathematics (MATH: 70.2 vs 51.9), and multilingual math (MGSM: 87.0 vs 68.9). It also leads in undergraduate and graduate-level knowledge (MMLU and GPQA) and in coding tasks (HumanEval). Overall, these results suggest it may be more versatile for diverse applications.
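The size of the gap differs a lot between benchmarks. A minimal sketch that computes the point difference for each row of the table above; the scores are copied from the table, while the dictionary layout and the function name `score_gaps` are our own illustration.

```python
# (Llama 3.1 8B score, GPT-4o mini score), copied from the benchmark table.
SCORES = {
    "MMLU": (73.0, 82.0),
    "GPQA": (32.8, 40.2),
    "HumanEval": (72.6, 87.2),
    "MATH": (51.9, 70.2),
    "MGSM": (68.9, 87.0),
    "DROP (F1)": (59.5, 79.7),
}


def score_gaps(scores: dict) -> dict:
    """Return GPT-4o mini's lead over Llama 3.1 8B in points, per benchmark."""
    return {name: round(gpt - llama, 1) for name, (llama, gpt) in scores.items()}


if __name__ == "__main__":
    gaps = score_gaps(SCORES)
    # Print benchmarks from the widest gap to the narrowest.
    for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: +{gap} points for GPT-4o mini")
```

On these numbers the lead ranges from 7.4 points (GPQA) to 20.2 points (DROP, F1).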


Practical tests

We test both models in challenging, realistic situations, using prompts related to chat, coding, and image analysis.

Some of the outputs are shortened for brevity.

Prompt 1. Just Chatting:

In 50-75 words, explain what a PALINDROME is and give two examples:
one word and one phrase.


Llama 3.1 8B output

A palindrome is a word, phrase,
or sequence that reads the same
backward as it does forward.
...
A classic example of a word palindrome is
'madam'.
A phrase palindrome is
'A man, a plan, a canal, Panama!'.

Both exhibit this unique property of symmetry.

Correct Answers: ✅
A classic phrase choice, and a proper explanation.

GPT-4o mini output

A palindrome is a word, phrase,
number, or sequence that reads
the same backward as forward.
...
A palindromic word is "radar".
A classic palindromic phrase is
"A man, a plan, a canal, Panama!"

Both examples showcase the playful
nature of language and the intrigue
of patterns within it.

Correct Answers: ✅
Another accurate and concise response.
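The examples both models gave can be verified mechanically. A minimal sketch, assuming the usual convention for phrase palindromes of ignoring case, spaces, and punctuation; the function name `is_palindrome` is ours.

```python
def is_palindrome(text: str) -> bool:
    """Check whether text reads the same in both directions.

    Only letters and digits are compared, case-insensitively, so phrase
    palindromes with spaces and punctuation are accepted.
    """
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


if __name__ == "__main__":
    for example in ["madam", "radar", "A man, a plan, a canal, Panama!"]:
        print(f"{example!r}: {is_palindrome(example)}")
```

All three examples from the model outputs pass this check.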

Prompt 2. Logical reasoning:

Read the following sentence: "The old man the boat." 

In 40-60 words, explain why this sentence is grammatically correct but confusing. 
Then, rephrase it to make its meaning clear.

Llama 3.1 8B output

...
Here is a revised version of the sentence
to clarify its meaning:

"The old man owned the boat."

This rephrased sentence makes it clear that "the old man" is the owner of the boat, and not the person operating it. Alternatively, if the intended meaning is that the old man is operating the boat, the sentence could be rephrased as:

"The old man was operating the boat."

This revised sentence makes it clear that the old man is the person controlling the boat, rather than its owner.

Correct Answer: ❌
Misinterpreted the sentence structure, exceeded word limit.

GPT-4o mini output

...
"man" serves as a verb meaning
"to crew or operate."
However, the confusion arises from
the unconventional use of "old" as a
noun rather than an adjective.

A clearer rephrasing would be:
"Old people crew the boat."

Correct Answer: ✅
Correct interpretation, adhered to word limit.
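Both verdicts above depend on the requested 40-60 word range. A minimal sketch of how such a check could be automated, assuming words are simply whitespace-separated tokens; the helper names `count_words` and `within_word_limit` are hypothetical and not part of the original test setup.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; punctuation stays attached to words."""
    return len(text.split())


def within_word_limit(text: str, low: int, high: int) -> bool:
    """Return True if the word count falls inside the inclusive range."""
    return low <= count_words(text) <= high


if __name__ == "__main__":
    # Hypothetical usage: paste a full model reply here instead of this excerpt.
    reply = "Old people crew the boat."
    print(count_words(reply), within_word_limit(reply, 40, 60))
```

Other tokenization rules (for example, treating hyphenated words as two) would give slightly different counts, so borderline cases deserve a manual look.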

Prompt 3. International olympiad:

In 50-70 words, solve this logic puzzle: "If all Zorks are Yorks, and some
Yorks are Sporks, can we conclude that some Zorks are definitely Sporks?" 

Explain your reasoning step by step.

Llama 3.1 8B output

Response:
 A delightful logic puzzle!
Let's break it down:

1. All Zorks are Yorks
(a subset relationship).

2. Some Yorks are Sporks
 (not all Yorks are Sporks,
just a subset of them are).

3. Since all Zorks are Yorks,
and some Yorks are Sporks,
it means some Zorks must also
be Sporks (by transitive reasoning).

#answer
Therefore, yes, we can conclude
that some Zorks are definitely Sporks.

Correct Answer: ❌
The model's reasoning is flawed in its final step, so it reaches an incorrect answer.

GPT-4o mini output

No, we cannot conclude that some Zorks are
definitely Sporks. Here's the reasoning:

1. All Zorks are included in the category
of Yorks, meaning Zorks are a subset of Yorks.

2. Some Yorks are Sporks, indicating there's
an overlap between the two categories, but not
all Yorks are Sporks.

3. Since we only know that some Yorks are Sporks,
we cannot definitively assert that any Zorks
(which are a subset of Yorks) are Sporks.

#answer
Therefore, the conclusion does not hold.

Correct Answer: ✅
Correct conclusion with clear, step-by-step reasoning. Minor word limit overage doesn't negate the correct solution.
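The correct answer is easy to confirm with a counterexample: one set-up where both premises hold but the conclusion fails is enough to show the conclusion does not follow. A minimal sketch using Python sets; the member names are made up for illustration.

```python
# Hypothetical members chosen so that both premises hold.
zorks = {"z1", "z2"}
yorks = {"z1", "z2", "y1", "y2"}   # contains every Zork
sporks = {"y1", "s1"}              # overlaps Yorks only through "y1"


def all_are(a: set, b: set) -> bool:
    """'All A are B' means A is a subset of B."""
    return a <= b


def some_are(a: set, b: set) -> bool:
    """'Some A are B' means A and B share at least one member."""
    return bool(a & b)


if __name__ == "__main__":
    print("All Zorks are Yorks:  ", all_are(zorks, yorks))
    print("Some Yorks are Sporks:", some_are(yorks, sporks))
    print("Some Zorks are Sporks:", some_are(zorks, sporks))
```

Both premises evaluate to True while the conclusion evaluates to False, so "some Zorks are definitely Sporks" cannot be concluded from the premises alone.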

Prompt 4:

In a convex quadrilateral ABCD, the diagonals AC and BD intersect at P. 
Suppose that in triangle APC there exists a point K such that:
PK bisects angle APC
AK bisects angle PAC
K lies inside ABCD.

Prove that ABCD is cyclic (can be inscribed in a circle).

Llama 3.1 8B output

**Step 1: Establish the given
conditions**
...

**Step 2: Analyze the angle
relationships**
...

**Step 3: Apply the Angle Bisector
Theorem**
...

**Step 4: Relate the angle bisector
to the quadrilateral**
...

**Step 5: Use the cyclic quadrilateral
theorem**
...

**Conclusion**

We have reached a contradiction,
which means that our initial assumption
that ABCD is not cyclic must be false.

Therefore, we conclude that ABCD
is indeed cyclic.

Correct Answer: ✅
On the 5th step, the solution starts getting a little convoluted, yet still remains true and reaches the expected result.

GPT-4o mini output

**Step 1: Understand the implications of
the angle bisector conditions.**
...

**Step 2: Relate the angles in triangle
\(APC\).**
...

**Step 3: Examine the angles around
point \(P\).**
...

**Step 4: Use the cyclic quadrilateral
property.**

\angle APB + \angle CPD = 180
which shows that opposite angles sum to 180.

**Conclusion:**
...
We have shown that \(ABCD\) can be
inscribed in a circle, completing the proof.

Correct Answer: ✅
The model provides a complete and logically structured proof.

Prompt 5. Let's get coding:

Create a simple, colorful Arkanoid game using Python and the Pygame library. 
The game should be implemented in a single module with the following features:

1. A colorful graphical interface with a resolution of 800x600 pixels. 
2. A paddle that the player can move left and right using arrow keys. 
...
10. Implement a simple "Game Over" screen

Provide the complete, well-commented code ...

Llama 3.1 8B output

Correct Answer: ❌
An overall failure, although some design features are present.

GPT-4o mini output

Correct Answer: ✅
The game runs as intended, meeting all specified requirements.

Prompt 6. Image analysis:

Analyze the following image:

Image by Lennart Sikkema, imported from 500px (archived version) by the Archive Team, CC BY 3.0

Llama 3 70B output

Clever trick!
You still have 4 marbles,
but they're no longer in the cup
because you turned it upside down!

They're probably scattered
around on the floor or counter now!

Correct Answer: ✅
Trick question deserves a trick answer!
Good understanding of nuance.

ChatGPT-3.5 output

You still have 4 marbles in the cup,
even though it is now upside down and
in the freezer

Correct Answer: ❌
Even Zero-shot Chain of Thought couldn't save it in tests.

Conclusion

Based on the benchmarks and practical tests above, GPT-4o mini performs better than Llama 3.1 8B across a wide range of tasks. Llama 3.1 8B has advantages in processing speed and a slightly more recent knowledge cutoff, but GPT-4o mini leads in every benchmark listed and in most of the practical tests.


Pricing

Prices are given per 1,000 AI/ML API tokens. The two models have similar input prices, while Llama 3.1 8B's output price is roughly 4x lower.

1k AI/ML Tokens	Llama 3.1 8B	GPT-4o mini
Input price	$0.000234	$0.000195
Output price	$0.000234	$0.0009
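To see how these prices translate into the cost of a single request, here is a minimal sketch. The prices are copied from the table above; the function name `request_cost` and the example token counts are hypothetical, and actual billing may differ from this simple per-token arithmetic.

```python
# USD per 1,000 AI/ML API tokens, copied from the pricing table above.
PRICES_PER_1K = {
    "Llama 3.1 8B": {"input": 0.000234, "output": 0.000234},
    "GPT-4o mini": {"input": 0.000195, "output": 0.0009},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


if __name__ == "__main__":
    # Hypothetical chat turn: 1,000 input tokens and 1,000 output tokens.
    for model in PRICES_PER_1K:
        print(f"{model}: ${request_cost(model, 1000, 1000):.6f}")
```

For that example turn, the estimate is about $0.000468 for Llama 3.1 8B and about $0.001095 for GPT-4o mini, so the gap grows with the share of output tokens.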


Compare for yourself

You've seen these models in action. Now it's your turn to test them for your specific needs. Copy the code below into Google Colab or your preferred coding environment, add your API key, and start experimenting!

from openai import OpenAI

def main():
    # Client for the OpenAI-compatible AI/ML API endpoint
    client = OpenAI(
        api_key='<YOUR_API_KEY>',
        base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo'
    model2 = 'gpt-4o-mini'
    selected_models = [model1, model2]

    # The same prompts are sent to both models
    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'
    results = {}

    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'system', 'content': system_prompt},
                    {'role': 'user', 'content': user_prompt},
                ],
            )

            # Keep only the text of the first choice
            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:\n')
    print(f"{model1}:\n{results.get(model1, 'No response')}")
    print('\n')
    print(f"{model2}:\n{results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()


Conclusion

Overall, GPT-4o mini appears to be the more capable and versatile model, particularly for tasks requiring advanced reasoning, complex problem-solving, detailed explanations, and practical coding implementations. Its superior performance in the Arkanoid game coding task demonstrates its ability to translate complex requirements into functional code, a crucial skill for real-world applications. Still, Llama 3.1 8B can cut output costs, which matters for chat applications.

You can check our model lineup and try any of the models yourself with our API key.
