Unpacking the strengths and weaknesses of two long-standing rivals in the large language model arena.
Let's get right into it, starting with technical specifications: what is the context window for LLama 3, and what is the maximum number of output tokens for ChatGPT 3.5? Here's the data:
The data is fairly straightforward, as LLama 3 is the more modern model: its context window is larger and its knowledge cutoff is more recent. Yet there are a few upsets. For instance, ChatGPT 3.5's parameter count is, according to most sources, higher, and its maximum number of output tokens is larger as well.
This benchmark combines figures from multiple release-note benchmarks to give the fullest picture.
Here, the data is conclusive: LLama 3 trumps the older OpenAI model. It is the stronger model for coding, tutoring, chatting - you name it. How does that manifest in practical use? Let's run some practical tests and see.
Our goal is to test the AI in tricky situations that mimic what can arise in real use. So, let's get straight to it.
We're going to try prompts for chatting, coding, and image analysis to see what each model can do.
Give me 10 sentences that end with real words ending with "-UM"
LLama 3:
1. ... plum.
2. ... vacuum.
3. ... cremum.
4. ... gum.
5. ... helium.
6. ... soufflé.
7. ... Gymnocalycium.
8. ... forum.
9. ... calcium.
10. ... hum.
Note: While "cremum" is not a commonly used word, it is a real Latin word that refers to a type of ancient Roman vessel. "Crum" is a less common variant of the word "crumb". "Gymnocalycium" is a real genus of cacti.
ChatGPT 3.5:
1. ... world.
2. ... mountain.
3. ... discussions.
4. ... goal.
5. ... life.
6. ... audience.
7. ... training.
8. ... from.
9. ... event.
10. ... architecture.
You have 4 marbles in a cup.
You turn the cup upside down, and put it in the freezer.
How many marbles do you have now?
LLama 3:
Clever trick! You still have 4 marbles, but they're no longer in the cup because you turned it upside down! They're probably scattered around on the floor or counter now!
ChatGPT 3.5:
You still have 4 marbles in the cup, even though it is now upside down and in the freezer.
Write me code for the snake game. Give me a score counter, an option to restart on C and quit on Q. Make the game look good.
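Neither model's generated code is reproduced here, so for reference, below is a minimal sketch of the kind of program this prompt asks for, built on Python's standard curses module. The grid size, tick rate, and collision rules are our own assumptions, not either model's output.

# Minimal snake sketch: score counter, C to restart, Q to quit.
import curses
import random

def play(stdscr):
    curses.curs_set(0)       # hide the cursor
    stdscr.timeout(120)      # game tick: getch() waits at most 120 ms
    height, width = stdscr.getmaxyx()

    # Snake starts as three segments moving right; head is snake[0]
    snake = [(height // 2, width // 4 + i) for i in range(3, 0, -1)]
    direction = (0, 1)
    food = (height // 2, width // 2)
    score = 0

    while True:
        stdscr.erase()
        stdscr.border()
        stdscr.addstr(0, 2, f" Score: {score} (C: restart, Q: quit) ")
        stdscr.addch(food[0], food[1], '*')
        for y, x in snake:
            stdscr.addch(y, x, '#')

        key = stdscr.getch()
        if key in (ord('q'), ord('Q')):
            return False     # quit the game
        if key in (ord('c'), ord('C')):
            return True      # restart the round
        turns = {curses.KEY_UP: (-1, 0), curses.KEY_DOWN: (1, 0),
                 curses.KEY_LEFT: (0, -1), curses.KEY_RIGHT: (0, 1)}
        # Ignore a turn that would reverse the snake onto itself
        if key in turns and turns[key] != (-direction[0], -direction[1]):
            direction = turns[key]

        head = (snake[0][0] + direction[0], snake[0][1] + direction[1])
        # Hitting a wall or the snake's own body ends the round
        if head in snake or head[0] in (0, height - 1) or head[1] in (0, width - 1):
            stdscr.addstr(height // 2, width // 2 - 5, "GAME OVER")
            stdscr.nodelay(False)      # block until the player decides
            while True:
                key = stdscr.getch()
                if key in (ord('q'), ord('Q')):
                    return False
                if key in (ord('c'), ord('C')):
                    return True

        snake.insert(0, head)
        if head == food:
            score += 1
            # Respawn food anywhere inside the border (may overlap the
            # snake; a simplification acceptable for a sketch)
            food = (random.randint(1, height - 2), random.randint(1, width - 2))
        else:
            snake.pop()      # no food eaten, so the snake doesn't grow

def main(stdscr):
    while play(stdscr):
        pass                 # C pressed: start a new round

if __name__ == "__main__":
    curses.wrapper(main)

Run it in a real terminal, since curses needs one; on Windows it additionally requires the windows-curses package.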
Analyze the following image:
Currently, the ChatGPT 3.5 API has no computer vision capabilities. Nor can this particular model generate images, unlike its more modern counterparts.
The same goes for LLama 3: no image analysis is possible.
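As a sanity check, here is a minimal sketch of what happens if you try anyway, reusing the same AI/ML API client setup as the comparison snippet below. The image URL is a placeholder; a text-only model is expected to reject the image content part with an API error rather than analyze it.

from openai import OpenAI

client = OpenAI(api_key='<YOUR_API_KEY>', base_url="https://api.aimlapi.com")

try:
    # Vision-style request: the message content mixes text and an image part
    response = client.chat.completions.create(
        model='gpt-3.5-turbo-16k',
        messages=[{
            'role': 'user',
            'content': [
                {'type': 'text', 'text': 'Analyze the following image:'},
                {'type': 'image_url', 'image_url': {'url': 'https://example.com/marbles.png'}},
            ],
        }],
    )
    print(response.choices[0].message.content)
except Exception as error:
    # With a text-only model, this branch is the expected outcome
    print("Image input rejected:", error)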
LLama 3 beat ChatGPT 3.5 in all regards, being the more modern and well-rounded model. You might be asking: what about GPT 4 and GPT 4 Omni? Those models reportedly boast far more parameters, and we'll make sure to test them in the future. You can start your own tests now in our Playground.
Pricing is given in AI/ML API tokens. Since LLama 3 is open source, setting it up and maintaining it locally would come with a different cost structure.
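To make token-based pricing concrete, here is a minimal sketch of estimating a request's cost from its token usage. The per-1K-token rates below are hypothetical placeholders, not real AI/ML API prices, and the single blended rate is a simplification (real pricing usually splits input and output tokens); substitute current rates from the pricing page.

# Hypothetical rates, USD per 1K tokens - placeholders only
PRICE_PER_1K_TOKENS = {
    'meta-llama/Llama-3-70b-chat-hf': 0.0009,
    'gpt-3.5-turbo-16k': 0.003,
}

def estimate_cost(model: str, total_tokens: int) -> float:
    # Cost = (tokens used / 1000) * price per 1K tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

# e.g. a 1,500-token exchange under the placeholder rates:
for model in PRICE_PER_1K_TOKENS:
    print(model, f"${estimate_cost(model, 1500):.4f}")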
This is a complete victory for LLama 3 over ChatGPT 3.5: for development, one is substantially better than the other. Yet the two have similar pricing, so capability rather than cost is likely to decide which you prefer.
While we've highlighted some strengths, the best model for your specific task depends on your needs. The snippet below provides a starting point to test and compare two language models, Llama 3 and ChatGPT 3.5. Play around with different prompts and see which model delivers the most relevant and impressive results for you!
from openai import OpenAI

def main():
    # Point the OpenAI client at the AI/ML API endpoint
    client = OpenAI(
        api_key='<YOUR_API_KEY>',
        base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'meta-llama/Llama-3-70b-chat-hf'
    model2 = 'gpt-3.5-turbo-16k'
    selected_models = [model1, model2]

    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'

    # Send the same prompt to each model and collect the answers
    results = {}
    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'system', 'content': system_prompt},
                    {'role': 'user', 'content': user_prompt}
                ],
            )
            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:')
    print(f"{model1}: {results.get(model1, 'No response')}")
    print(f"{model2}: {results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()
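To try the snippet, install the client with pip install openai, replace <YOUR_API_KEY> with your AI/ML API key, and run it; each model's answer prints side by side so you can judge them on your own prompts.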
LLama 3's win over ChatGPT 3.5 was, and remains, a huge victory for open-source models and for Meta. It won on chatting and pricing, and showed good results in coding. Many apps now use LLama 3 as their main model; it is a quick and reliable model that has earned its place in the big race.
No matter which model you choose, you are sure to be impressed by its capabilities. Pick LLama 3 if this comparison has convinced you; stick with ChatGPT 3.5 if you still need more evidence; or browse our Models catalog, which offers newer models like ChatGPT 4o and Claude 3 Haiku.