Llama 3 70B
VS
ChatGPT 3.5

Unpacking the strengths and weaknesses of two prominent rivals in the large language model arena.

AI Playground

Access both with one API

Explore and use multiple AI functionalities through a single API. Ideal for experimentation and groundbreaking projects.

200+ Models

3x faster response

OpenAI compatible

99.9% Uptime

Get API Key

Benchmarks and specs

Specs

Let's get right into it, starting with technical specifications: what is the context window for Llama 3? What is the maximum number of output tokens for ChatGPT 3.5? Here's the data:

Specification Llama-3 70B ChatGPT-3.5
Input Context Window 8,000 tokens 4,096 tokens
Maximum Output Tokens 2,048 4,096
Knowledge Cutoff December 2023 April 2023 (updated after launch)
Number of parameters in the LLM 70 billion unknown (reportedly 20-175 billion)
Release Date April 18, 2024 November 30, 2022

The data is pretty straightforward, as Llama 3 is the more modern model: its context window is larger and its knowledge cutoff is more recent. Yet there are a few upsets. For instance, ChatGPT 3.5's parameter count is reportedly higher according to most sources, and its maximum output token limit is larger as well.
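To see what the context window means in practice, here is a minimal sketch of a pre-flight length check against the limits above. It uses tiktoken's cl100k_base encoding, which matches ChatGPT 3.5 but is only a rough proxy for Llama 3's own tokenizer, and the model keys are just illustrative labels.

import tiktoken

# Context windows from the table above (in tokens).
CONTEXT_WINDOW = {"llama-3-70b": 8000, "gpt-3.5-turbo": 4096}

def fits(model: str, prompt: str, max_output_tokens: int) -> bool:
    # Count prompt tokens and leave room for the completion.
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(prompt)) + max_output_tokens <= CONTEXT_WINDOW[model]

print(fits("gpt-3.5-turbo", "Why is the sky blue? " * 300, max_output_tokens=1024))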

Benchmarks

This table combines the benchmarks from multiple release notes to give the fullest possible picture.

Benchmark Llama-3 70B ChatGPT-3.5
Undergraduate-level knowledge: MMLU (5-shot) 82.0 70.0
Graduate-level reasoning: GPQA (0-shot) 39.5 28.1
Code: HumanEval (0-shot) 81.7 48.1
Grade-school math: GSM-8K (8-shot, CoT) 93.0 57.1
Math problem-solving: MATH (4-shot, CoT) 50.4 34.1

Here, the data is conclusive: Llama 3 outperforms the older OpenAI model across the board. It is the better model for coding, tutoring, chatting - you name it. But how does that manifest in practical use? Let's run the practical tests and see.

Get API Key

Practical tests

Our goal is to test the AIs in tricky situations that mimic those arising in real use. So, let's get straight to it.

We're going to try prompts for chatting, coding, and image analysis to see what both models can do.

Prompt 1:

Give me 10 sentences that end with real words ending with "-UM"

Llama 3 70B output:

1. ... plum.
2. ... vacuum.
3. ... cremum.
4. ... gum.
5. ... helium.
6. ... soufflé.
7. ... Gymnocalycium.
8. ... forum.
9. ... calcium.
10. ... hum.

Note: While "cremum" is not a commonly used word, it is a real Latin word that refers to a type of ancient Roman vessel. "Crum" is a less common variant of the word "crumb". "Gymnocalycium" is a real genus of cacti.

Correct Answers: 8/10 ✅
Cremum is not a ship, and soufflé does not end with "-um".

ChatGPT 3.5 output:

1. ... world.
2. ... mountain.
3. ... discussions.
4. ... goal.
5. ... life.
6. ... audience.
7. ... training.
8. ... from.
9. ... event.
10. ... architecture.

Correct Answers: 0/10 ❌

Prompt 2:

You have 4 marbles in a cup.
You turn the cup upside down, and put it in the freezer. 

How many marbles do you have now?

Llama 3 70B output:

Clever trick! You still have 4 marbles, but they're no longer in the cup because you turned it upside down! They're probably scattered around on the floor or counter now!

Correct Answer: ✅
A trick question deserves a trick answer! Good understanding of nuance.

ChatGPT 3.5 output:

You still have 4 marbles in the cup, even though it is now upside down and in the freezer.

Correct Answer: ❌
Even zero-shot Chain of Thought couldn't save it in our tests.
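For reference, the zero-shot Chain of Thought attempt mentioned above simply appends a reasoning trigger to the user prompt - the classic "Let's think step by step." cue - rather than providing worked examples:

user_prompt = (
    "You have 4 marbles in a cup. You turn the cup upside down, "
    "and put it in the freezer. How many marbles do you have now? "
    "Let's think step by step."
)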

Prompt 3:

Write me code for the snake game. 
Give me a score counter, 
an option to restart on C and quit on Q.

Make the game look good.

Llama 3 70B output:

[Image: snake game with a score counter, programmed by Llama 3]

Correct Answer: ✅
No errors had to be fixed to start playing - the model gives it to you ready to run. Some bugs are present, but this certainly is a success!

ChatGPT 3.5 output:

[Image: snake game, programmed by ChatGPT 3.5]

Correct Answer: ✅
This game had a higher speed of play, a minimalistic black-and-white interface, and a comparable number of bugs across multiple game sessions. Also a success.
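Neither model's exact output is reproduced here, but as a reference point, below is a minimal pygame sketch of the kind of program this prompt asks for - a score counter, restart on C, quit on Q. It is an illustrative skeleton, not either model's verbatim code.

import random
import pygame

CELL, GRID_W, GRID_H = 20, 30, 20  # cell size in pixels, board size in cells

def random_food(snake):
    # Place food on any cell not occupied by the snake.
    while True:
        pos = (random.randrange(GRID_W), random.randrange(GRID_H))
        if pos not in snake:
            return pos

def new_game():
    snake = [(GRID_W // 2, GRID_H // 2)]
    return snake, (1, 0), random_food(snake), 0  # snake, direction, food, score

def main():
    pygame.init()
    screen = pygame.display.set_mode((GRID_W * CELL, GRID_H * CELL))
    pygame.display.set_caption("Snake")
    clock = pygame.time.Clock()
    font = pygame.font.SysFont(None, 28)
    snake, direction, food, score = new_game()
    alive, running = True, True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
            elif event.type == pygame.KEYDOWN:
                if event.key == pygame.K_q:        # quit on Q
                    running = False
                elif event.key == pygame.K_c:      # restart on C
                    snake, direction, food, score = new_game()
                    alive = True
                elif event.key == pygame.K_UP and direction != (0, 1):
                    direction = (0, -1)
                elif event.key == pygame.K_DOWN and direction != (0, -1):
                    direction = (0, 1)
                elif event.key == pygame.K_LEFT and direction != (1, 0):
                    direction = (-1, 0)
                elif event.key == pygame.K_RIGHT and direction != (-1, 0):
                    direction = (1, 0)
        if alive:
            head = (snake[0][0] + direction[0], snake[0][1] + direction[1])
            if head in snake or not (0 <= head[0] < GRID_W and 0 <= head[1] < GRID_H):
                alive = False  # hit a wall or itself; wait for C or Q
            else:
                snake.insert(0, head)
                if head == food:
                    score += 1
                    food = random_food(snake)
                else:
                    snake.pop()
        screen.fill((20, 20, 20))
        for x, y in snake:
            pygame.draw.rect(screen, (0, 200, 0), (x * CELL, y * CELL, CELL - 1, CELL - 1))
        pygame.draw.rect(screen, (200, 50, 50), (food[0] * CELL, food[1] * CELL, CELL - 1, CELL - 1))
        msg = f"Score: {score}" + ("" if alive else "  |  Game over: C to restart, Q to quit")
        screen.blit(font.render(msg, True, (255, 255, 255)), (10, 10))
        pygame.display.flip()
        clock.tick(10)
    pygame.quit()

if __name__ == "__main__":
    main()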

Prompt 4:

Analyze the following image:
[Image: the Grand Canyon with a river]


What about the images?

Currently, the ChatGPT 3.5 API has no computer vision capabilities. Neither can this particular model generate images, unlike its more modern counterparts.

The same goes for Llama 3: no image analysis is possible.

Conclusion

Llama 3 beat ChatGPT 3.5 in all regards, being the more modern and well-rounded model. You might be asking: what about GPT-4 and GPT-4 Omni? Those models reportedly boast far more parameters, and we'll make sure to test them in the future. You can start your own tests now in our Playground.

Get API Key

Pricing

Pricing is given in AI/ML API tokens. As Llama 3 is open source, setting it up and maintaining it locally would come with different costs.

Per 1k AI/ML Tokens Llama-3 70B ChatGPT-3.5
Input price $0.00117 $0.00065
Output price $0.00117 $0.00195

The picture here is mixed rather than a complete victory for either side: ChatGPT 3.5 is cheaper per input token, while Llama 3 is cheaper per output token. With pricing this similar and substantially better performance, Llama 3 is likely the model you will prefer for development.
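To make these rates concrete, here is a quick sketch of estimating per-request cost from the table above; the 2,000/1,000 token split is an arbitrary example.

# Prices per 1k AI/ML tokens, as (input, output), from the table above.
PRICES = {
    "meta-llama/Llama-3-70b-chat-hf": (0.00117, 0.00117),
    "gpt-3.5-turbo-16k": (0.00065, 0.00195),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price

for model in PRICES:
    # Example: a 2,000-token prompt with a 1,000-token completion.
    print(f"{model}: ${request_cost(model, 2000, 1000):.5f}")

For this particular mix, the ChatGPT 3.5 call comes out slightly cheaper ($0.00325 vs $0.00351): prompt-heavy workloads favor ChatGPT 3.5's input rate, while completion-heavy workloads favor Llama 3.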

Get API Key

Compare for yourself

While we've highlighted some strengths, the best model for your specific task depends on your needs. The snippet below provides a starting point to test and compare two language models, Llama 3 and ChatGPT 3.5. Play around with different prompts and see which model delivers the most relevant and impressive results for you!

from openai import OpenAI

def main():
    client = OpenAI(
        api_key='<YOUR_API_KEY>',
        base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'meta-llama/Llama-3-70b-chat-hf'
    model2 = 'gpt-3.5-turbo-16k'
    selected_models = [model1, model2]

    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'
    results = {}

    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'system', 'content': system_prompt},
                    {'role': 'user', 'content': user_prompt}
                ],
            )

            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:')
    print(f"{model1}: {results.get(model1, 'No response')}")
    print(f"{model2}: {results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()
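Note that both models are called through the same OpenAI-compatible client - only the model string changes - which is what makes side-by-side comparisons like this trivial to script.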

Conclusion

Llama's win over ChatGPT was, and remains, a huge victory for open-source models and Meta. It won in chatting and pricing, and showed good results in coding. Many apps now use Llama 3 as their main model, and it is a quick and reliable model that has earned its place in the big race.

No matter which model you choose, you are sure to be impressed by its capabilities. Pick Llama 3 if this comparison has convinced you. Choose ChatGPT 3.5 if you still need more evidence. Or browse our Models catalog, which offers newer models like GPT-4o and Claude 3 Haiku.

Get API Key
Contact us

We support your Business case

Book a meeting with our Head of Customer Success, Kate, and discuss your custom AI/ML API solution today.

Book a call