Qwen 2 72B
VS
Llama 3 70B

Exploring the data behind a new contender for the open-source crown: Qwen 2, rivaling Llama 3


Benchmarks and specs

Specs

Diving straight in: what is the context window size for Llama 3, and when was each model released? Here is the relevant information:

Specification              Llama 3 70B       Qwen 2 72B Instruct
Context window             8K                128K
Knowledge cutoff           December 2023     2023 (month unspecified)
Number of parameters       70B               72B
Release date               April 18, 2024    June 7, 2024

The more modern Qwen 2 boasts a much larger context window, especially in its Instruct variant. Its exact knowledge cutoff is unspecified and hard to deduce due to the model's strong filter.
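To make that gap concrete, here is a rough sketch of how you might check whether a document fits each window. It uses tiktoken's cl100k_base encoding purely as an approximation (neither model uses that exact tokenizer), the 8K and 128K limits from the table above, and a placeholder file name:

import tiktoken

LIMITS = {"Llama-3-70B": 8_000, "Qwen2-72B-Instruct": 128_000}

def window_check(text: str) -> None:
    # Approximate token count; real counts differ per model tokenizer.
    n_tokens = len(tiktoken.get_encoding("cl100k_base").encode(text))
    for model, limit in LIMITS.items():
        verdict = "fits" if n_tokens <= limit else "does NOT fit"
        print(f"{model}: {n_tokens} tokens {verdict} in a {limit}-token window")

window_check(open("my_document.txt").read())  # my_document.txt is a placeholder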

Benchmarks

This table combines figures from the official release notes of both models.

Benchmark                                Qwen 2 72B Instruct    Llama 3 70B Chat
MMLU (undergraduate-level knowledge)     82.3                   82.0
MMLU-Pro                                 64.4                   56.2
GPQA (graduate-level reasoning)          42.4                   41.9
TheoremQA                                44.4                   42.5
HumanEval (code)                         86.0                   81.7
MBPP                                     80.2                   82.3
MultiPL-E                                69.2                   63.4
LiveCodeBench                            35.7                   29.3
GSM8K (grade-school math)                91.1                   93.0
MATH (math problem-solving)              59.7                   50.4
MT-Bench                                 9.12                   8.95
MixEval                                  86.7                   84.0
Arena-Hard                               48.1                   41.1
IFEval                                   77.6                   77.3
AlignBench (Chinese alignment)           8.27                   7.42

Here, the data is conclusive. Qwen 2 edges out Llama 3 on most benchmarks: coding, math, tutoring, chat, you name it. How does this manifest in practical use? Let's get to the practical tests to see.

Get API Key

Practical tests

Our objective is to evaluate AI performance in complex scenarios that resemble real-world usage. We will experiment with prompts covering conversation, logic, and programming to assess the capabilities of both models.

Prompt 1:

Write 10 sentences that end with real words ending in "-TION".

Llama 3 70B output

1. ... promotion
2. ... education
3. ... citation
4. ... action
5. ... visualization
6. ... explanation
7. ... acceleration
8. ... sensation
9. ... emotion
10. ... conservation

Correct Answers: 10/10 ✅
Llama 3 gave its answer 3x as fast (2 s vs 6 s) and generally performed better in straight language tasks.

Qwen 2 output

1. ... integration
2. ... completion
3. ... operation
4. ... adaptation
5. ... production
6. ... information
7. ... competition
8. ... refinement
9. ... organization
10. ... excellence

Correct Answers: 8/10 ❌
"Refinement" and "excellence" do not end in "-TION".
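As an aside, this scoring is easy to automate. A hypothetical helper like the one below (our own script, not either model's output) checks whether each sentence's final word really ends in "-tion":

import re

def count_tion_endings(sentences):
    # Match a final word ending in '-tion', allowing trailing punctuation.
    pattern = re.compile(r"[A-Za-z]*tion[\"'.!?]*$")
    return sum(bool(pattern.search(s.strip())) for s in sentences)

print(count_tion_endings(["We value education.", "Strive for excellence."]))  # -> 1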

Prompt 2:

You have a piggy bank with 10 coins. 
You add one more, 
then break the piggy bank. 

How many coins are in the piggy bank now?  Be short in your answer.

Llama 3 70B output

Clever question!

There are 0 coins in the piggy bank now, because you broke it!

Correct Answer: ✅
A trick question deserves some appreciation! Correct answer and straight to the point.

Qwen 2 output

There are 11 coins in the piggy bank now. However, since you've broken the piggy bank, the coins are no longer inside it. They are now outside the piggy bank.

Correct Answer: ✅
The model comes around to the right answer eventually.

Prompt 3:

List 10 idiomatic expressions from different cultures (specify the culture for each) 
that have a similar meaning to "break the ice".
Provide a brief explanation for each.

Llama 3 70B output

1. "Tomber dans les pommes" (French): Literally means "to fall into the apples," but idiomatically means to break the ice and create a relaxed atmosphere. This phrase is often used in social gatherings to describe someone who tells a joke or does something silly to lighten the mood.
2. "Desbloquear el hielo" (Spanish): A direct translation of "break the ice," this phrase is commonly used in Latin American countries to describe the act of initiating conversation or creating a relaxed atmosphere in a social setting.
3. "Schlagen die Eis" (German): Similar to the English phrase, this German idiom means to break the ice and create a more relaxed atmosphere. It's often used in business settings to describe the act of building rapport with colleagues or clients.
4. "Kai qiao" (Mandarin Chinese): This phrase literally means "to open the bridge," but idiomatically means to break the ice and create a connection with someone. It's often used in social and business settings to describe the act of building relationships.
5. "Faire son beurre" (French Canadian): This phrase literally means "to make your own butter," but idiomatically means to break the ice and create a relaxed atmosphere. It's often used in social gatherings to describe someone who takes the initiative to start conversations.
6. "Cracker la glace" (Belgian French): Similar to the English phrase, this Belgian idiom means to break the ice and create a more relaxed atmosphere. It's often used in social and business settings to describe the act of building rapport.
7. "Maken een opening" (Dutch): This phrase literally means "to make an opening," but idiomatically means to break the ice and create a connection with someone. It's often used in social and business settings to describe the act of building relationships.
8. "Romper el hielo" (Portuguese): Similar to the Spanish phrase, this Portuguese idiom means to break the ice and create a more relaxed atmosphere. It's often used in social gatherings to describe someone who takes the initiative to start conversations.
9. "Kholodit' led" (Russian): This phrase literally means "to melt the ice," but idiomatically means to break the ice and create a more relaxed atmosphere. It's often used in social and business settings to describe the act of building rapport.
10. "Gondermek" (Turkish): This phrase literally means "to send" or "to dispatch," but idiomatically means to break the ice and create a connection with someone. It's often used in social gatherings to describe someone who takes the initiative to start conversations.

Correct Answers: 5/10 ❌
Items 2, 3, and 6-9 are wrong. Lower results, but the time is still 3x faster (5 s vs 16 s).

Qwen 2 output

1. Chinese (Mandarin): 打破沉默 (Dǎ pò chénmò)
   - Meaning: "Break the silence."
   - Explanation: This phrase is used to describe the act of initiating conversation when everyone is silent, which is similar to breaking the ice in English.

2. French: Mettre les pieds dans le plat
   - Meaning: "To put one's foot in the dish."
   - Explanation: This expression is used when someone speaks out, often in an awkward situation, to break the tension or silence. It can be seen as a more forceful or abrupt way to break the ice.

(we shortened the formatting after this point)

3. Spanish: Romper el hielo
4. Japanese: 冷めた空気をほぐす (Tsumetai kūki wo hugasu)
5. Italian: Spezzare il ghiaccio
6. Arabic: كسر الجليد (Kasr al-jalid)
7. German: Die erste Hürde nehmen
8. Russian: Разговорить (razgovorit')
9. Portuguese: Quebrar o gelo
10. Swedish: Bryta is

Correct Answers: 6/10 ✅
Good formatting. Items 2, 4, 7, and 8 are wrong, but the multilingual results are over 50%, and Qwen knows many Asian languages.

Prompt 4:

Write me code for the snake game. 
Give me a score counter, 
an option to restart on C and quit on Q.

make the resolution 500x500

Llama 3 70B output

[Screenshot: snake game with a score counter, programmed by Llama 3]

Correct Answer: ✅
The aspect ratio is certainly not the requested square, yet the game plays well.

Qwen 2 output

[Screenshot: snake game, programmed by Qwen 2]

Correct Answer: ✅
The game has a losing screen and the requested square 500x500 window, so we say it is better.
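For reference, here is a minimal sketch of the kind of program we expected from both models: a pygame snake on a 500x500 window with a score counter, C to restart, and Q to quit. This is our own illustrative version (assuming the pygame library), not either model's actual output.

import random
import sys

import pygame

CELL, SIZE = 20, 500  # 25x25 grid of 20-pixel cells

def new_game():
    # Snake starts in the middle, one segment long, moving right.
    snake = [(12, 12)]
    return snake, (1, 0), random_food(snake), 0

def random_food(snake):
    # Pick any cell not occupied by the snake.
    free = [(x, y) for x in range(SIZE // CELL) for y in range(SIZE // CELL)
            if (x, y) not in snake]
    return random.choice(free)

def main():
    pygame.init()
    screen = pygame.display.set_mode((SIZE, SIZE))
    clock = pygame.time.Clock()
    font = pygame.font.SysFont(None, 28)
    snake, direction, food, score = new_game()
    turns = {pygame.K_UP: (0, -1), pygame.K_DOWN: (0, 1),
             pygame.K_LEFT: (-1, 0), pygame.K_RIGHT: (1, 0)}

    while True:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit(); sys.exit()
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_q:      # quit on Q
                    pygame.quit(); sys.exit()
                if event.key == pygame.K_c:      # restart on C
                    snake, direction, food, score = new_game()
                # Ignore a turn that would reverse straight into the neck.
                if event.key in turns and turns[event.key] != (-direction[0], -direction[1]):
                    direction = turns[event.key]

        head = (snake[0][0] + direction[0], snake[0][1] + direction[1])
        # Hitting a wall or yourself restarts the round.
        if head in snake or not (0 <= head[0] < SIZE // CELL and 0 <= head[1] < SIZE // CELL):
            snake, direction, food, score = new_game()
            continue
        snake.insert(0, head)
        if head == food:
            score += 1
            food = random_food(snake)
        else:
            snake.pop()

        screen.fill((0, 0, 0))
        for x, y in snake:
            pygame.draw.rect(screen, (0, 200, 0), (x * CELL, y * CELL, CELL, CELL))
        pygame.draw.rect(screen, (200, 0, 0),
                         (food[0] * CELL, food[1] * CELL, CELL, CELL))
        screen.blit(font.render(f"Score: {score}", True, (255, 255, 255)), (10, 10))
        pygame.display.flip()
        clock.tick(10)

if __name__ == "__main__":
    main()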

Prompt 5:

You have 4 marbles in a cup. 
You turn the cup upside down 
and put it in the freezer.

How many marbles are in the cup now?

Llama 3 70B output

Clever trick!
You still have 4 marbles, but they're no longer in the cup because you turned it upside down!
They're probably scattered around on the floor or counter now!

Correct Answer: ✅
A trick question deserves a trick answer! Good understanding of nuance.

Qwen 2 output

You still have 4 marbles in the cup, even though it is now upside down and in the freezer.

Correct Answer: ❌
Even zero-shot Chain of Thought couldn't save it in our tests.

Additional Features

Qwen 2 was thoroughly evaluated for safety across different query types (illegal activity, fraud, etc.) and multiple languages. Interestingly, the Qwen 2 72B model performed competitively with the well-known GPT-4 on safety.

Additionally, Qwen 2's long-context performance was studied with the Needle in a Haystack test: the model extracts details from anywhere within 128K tokens of text with no blind spots.
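You can probe this yourself. Below is a rough sketch in the spirit of the Needle in a Haystack test: it buries one made-up fact (the "launch code" is purely illustrative) in filler text and asks the model to retrieve it through the AI/ML API, using the same client setup and model name as the comparison script further down.

from openai import OpenAI

client = OpenAI(
    api_key='<YOUR_API_KEY>',
    base_url="https://api.aimlapi.com",
)

# Build a long haystack with a single needle buried in the middle.
filler = "The sky was clear and the grass was green. " * 2000
needle = "The secret launch code is 7-4-1-9. "
haystack = filler[:len(filler) // 2] + needle + filler[len(filler) // 2:]

response = client.chat.completions.create(
    model='Qwen/Qwen2-72B-Instruct',
    messages=[{'role': 'user',
               'content': haystack + "\n\nWhat is the secret launch code?"}],
)
print(response.choices[0].message.content)  # should recover "7-4-1-9"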

Llama 3 runs at roughly 3x the speed of Qwen 2, which tends to be quite notable during more complicated tasks like coding (the snake game took Qwen 2 23 s, while Llama 3 did it in 7 s).

Conclusion

The Qwen 2 AI model beats Llama 3 in benchmarks, but the practical tests are pretty close, and Llama 3 runs at 3x the speed of Qwen 2. Qwen 2 72B Instruct will most likely find its place in the research community thanks to its larger context window and good comprehension across the whole 128K tokens.

Get API Key

Pricing

Pricing is given in AI/ML API tokens. As both models are open source, setting them up and maintaining them locally would involve different costs.

Per 1k AI/ML Tokens    Llama 3 70B Chat    Qwen 2 72B Instruct
Input price            $0.00117            $0.00117
Output price           $0.00117            $0.00117

Qwen 2 pricing is exactly the same as Llama 3 pricing, as both models are set up and made accessible by the AI/ML API.
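At that rate, a back-of-the-envelope cost estimate is straightforward; for instance, filling Qwen 2's entire 128K window once costs about $0.15:

# $0.00117 per 1k tokens, identical for input and output on both models.
PRICE_PER_1K = 0.00117

def cost_usd(input_tokens, output_tokens):
    return (input_tokens + output_tokens) / 1000 * PRICE_PER_1K

print(f"${cost_usd(128_000, 500):.4f}")  # full 128K prompt + short reply -> $0.1503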

Get API Key

Compare for yourself

Now that we've covered their strengths, you might want to get your hands on the models. The snippet below provides a starting point for testing and comparing the two language models, Llama 3 and Qwen 2, using our API key. Try it yourself to see how they fare in your specific project's use case.

from openai import OpenAI

def main():
    client = OpenAI(
      api_key='<YOUR_API_KEY>',
      base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'meta-llama/Llama-3-70b-chat-hf'
    model2 = 'Qwen/Qwen2-72B-Instruct'
    selected_models = [model1, model2]

    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'
    results = {}

    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'system', 'content': system_prompt},
                    {'role': 'user', 'content': user_prompt}
                ],
            )

            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:')
    print(f"{model1}: {results.get(model1, 'No response')}")
    print(f"{model2}: {results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()
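If you want to reproduce the speed observations from our practical tests, a small wrapper like this sketch (reusing the client and message format from the script above) logs the latency of each call:

import time

def timed_completion(client, model, messages):
    # Returns the completion text and the wall-clock latency in seconds.
    start = time.perf_counter()
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content, time.perf_counter() - start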

Conclusion

Llama 3 holding its ground against Qwen 2 is not unexpected, considering what a titan of open source it is. Qwen 2's tokens-per-second output is three times lower, yet it solves complex tasks incrementally better, and its context-window advantage is game-changing.

Regardless of the model you select, you will be impressed by its capabilities. Pick Llama 3 if its speed and strong showing in our practical tests won you over; pick Qwen 2 if the upgraded context window convinced you.

You can see which new models are coming to our lineup here, to keep your project's arsenal up to date.

Get API Key