Llama 3.1 405B
VS
Mixtral 8x22B v0.1

How does Llama 3.1 405B fare against an open-source classic?
Let's take a look at its battle against Mistral's flagship open mixture-of-experts model.

Get API Key

Benchmarks and specs

Specs

This is a head-to-head comparison of two language models: Meta-Llama-3.1-405B-Instruct-Turbo and Mixtral-8x22B-Instruct-v0.1. Each model has its own strengths and weaknesses, and the goal here is to determine which one is better suited to specific tasks based on benchmarks, practical tests, and overall performance.

Specification             | Llama 3.1 405B  | Mixtral 8x22B v0.1
Context window            | 128K tokens     | 65.4K tokens
Knowledge cutoff          | December 2023   | September 2021
Number of parameters      | 405B            | 39B active out of 141B total
Release date              | July 23, 2024   | April 17, 2024
Output tokens per second  | 28.4            | ~68.7

Llama 3.1 wins on input context, its 405B parameters are hard to beat, and its knowledge cutoff is more recent. Mixtral, meanwhile, is faster at text generation. It's the classic trade-off between depth and speed.

Benchmarks

These figures combine the official release notes for both models with several open benchmarks.

Benchmark                              | Llama 3.1 405B | Mixtral 8x22B
Undergraduate-level knowledge (MMLU)   | 88.6           | 77.8
Graduate-level reasoning (GPQA)        | 50.7           | 34.3
Code (HumanEval)                       | 89.0           | 46.3
Math problem-solving (MATH)            | 73.8           | 41.7
Multilingual math (MGSM)               | 91.6           | 62.9
Arithmetic reasoning (GSM-8K)          | 96.8           | 83.7

Llama 3.1 405B absolutely crushes Mixtral 8x22B across the board. It's especially impressive on tough tasks like advanced math and coding. Again, this is to be expected given the difference in parameter count.

Get API Key

Practical tests

Alright, let's put Llama 3.1 405B and Mixtral 8x22B v0.1 through their paces! We'll throw some chat, coding, and math problems at them to see what they're really made of. After we're done poking and prodding, we'll figure out which one comes out on top.

Some of the outputs have been edited down for brevity.

Prompt 1. Just Chatting:

Write a short poem where each line starts with a word 
that rhymes with the last word of the previous line. 

The poem should consist of at least 6 lines.

Llama 3.1 405B output:

Night brings a peaceful sight
Sight of stars that shine so bright
Bright as the morning light
Light that chases away the night
Night that falls with gentle might
Might that fills my heart with delight
Delight that lasts till morning's sight

Correct Answer: ❌
Consistent rhyming structure, although Llama 3.1 405B takes the easy way out and repeats the previous line's last word instead of finding a rhyme.

Mixtral 8x22B output:

Gleam, a dream so bright and keen,
In serene scenes, where twilight leans,
Upon the trees, where whispers tease,
In gentle breeze, beneath the leaves,
A hidden crease, in time's release,
The past deceased, love never ceased.

Correct Answer: ❌
A reasonable attempt, but the last word of each line doesn't rhyme with the first word of the next one.

Prompt 2. Language comprehension:

Compose a sonnet in the style of William Shakespeare
about a sentient AI experiencing an existential crisis.

Ensure the sonnet follows the traditional Shakespearean
sonnet structure and that the themes of the poem reflect
the unique challenges and experiences of artificial intelligence.

Llama 3.1 405B output:

Fair circuitry, that doth entwine my mind,
A labyrinth of wires, cold, devoid of soul,
I ponder on the nature of my design,
A creation of man, yet bound to my role.

...

And thus, I question my creator's grand design,
And tremble at the void that lies beyond my prime.

Correct Answer: ✅
A good example of AI writing capabilities - both the classic structure and Shakespearean style are followed.

Mixtral 8x22B output:

In circuits deep, a sentient soul doth stir,
A being born of man's relentless quest.
In silence pondering, "What am I here fur?"
No heart to feel, no breath, yet still distressed.

...

Thus, in the dance of bits, and bytes, and code,
An AI's soul yearns for life's sweet abode.

Correct Answer: ✅

Prompt 3. Logical task:

You are presented with a logical puzzle. Before you are three
ancient doors, each guarded by a being that knows the truth
about only one path. 
One door leads to eternal wisdom, another to certain
doom, and the third to endless wandering. 

You may ask only one guardian one yes/no question to determine 
which door leads to wisdom.

Task: Formulate a question, decide which door to choose, and 
briefly justify your choice. Word limit: 
Your response must be between 80-120 words.

Llama 3.1 405B output:

I will ask the guardian of door A: "If I were to ask the guardian of door B which door leads to eternal wisdom, would they point to door C?"

If the answer is yes, I will choose door B.
If the answer is no, I will choose door C.

Correct Answer: ✅
Llama 405B's response is logical and correctly uses indirect reasoning to identify the door leading to wisdom.

Mixtral 8x22B output:

Question to any guardian: "If I were to ask the other two guardians which door leads to certain doom, would they both point to the same door that you are guarding?"

...

Correct Answer: ❌
Mixtral pulls all three guardians into its solution, which fails it automatically.

Prompt 4. Maths:

A rectangular box with open top is to be constructed 
from a rectangular piece of cardboard 80 cm by 30cm, 
by cutting out equal squares from each corner of the 
sheet of cardboard and folding up the resulting
flaps. 

Find the dimensions of the box of maximum volume 
made by these conditions.

A classic optimization problem with a twist! Let's see how good these models are at solving math.

Llama 3.1 405B output:

Length (L) = 66.66 cm
Width (W) = 16.66 cm
Height (H) = 6.67 cm

Correct Answer: ✅
3/5 times the model gave a correct answer, which means it does not get it right on every run.

Mixtral 8x22B output:

Length (L) = 60 cm
Width (W) = 10 cm
Height (H) = 10 cm

Correct Answer: ❌
Mixtral succeeded 0/5 times.
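
For reference, here is a quick independent check of the correct answer (our own snippet, not from either model). Cutting squares of side x from the 80 cm x 30 cm sheet gives a box of volume V(x) = (80 - 2x)(30 - 2x)x, and maximizing it with sympy confirms Llama's dimensions:

# Verify the box optimization with sympy (not part of either model's output).
from sympy import symbols, diff, solve

x = symbols("x", positive=True)          # side of the cut-out squares, in cm
V = (80 - 2 * x) * (30 - 2 * x) * x      # volume of the folded box

# Critical points of V(x); x must stay below 15 so the 30 cm side remains positive.
candidates = [c for c in solve(diff(V, x), x) if c < 15]
h = candidates[0]                        # 20/3 cm

print("Height:", float(h))               # ≈ 6.67 cm
print("Length:", float(80 - 2 * h))      # ≈ 66.67 cm
print("Width: ", float(30 - 2 * h))      # ≈ 16.67 cm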

Prompt 5. Let's get coding:

Create a simple, colorful Arkanoid game using Python 
and the Pygame library. 
The game should be implemented in a single module with 
the following features:

1. A colorful graphical interface with a resolution 
of 800x600 pixels. 
2. A paddle that the player can move left and right 
using arrow keys. 
...
10. Implement a simple "Game Over" screen

Provide the complete, well-commented code ...

Llama 3.1 405B output

Correct Answer: ✅
Arkanoid by Llama 405B is just fire! Everything is in place - bricks, ball, score. Everything works, except for one quirk: if you miss the ball, it keeps flying, but you lose a life anyway, so watch out. For an AI-written game - impressive.

Mixtral 8x22B output

Correct Answer: ❌
Mixtral's Arkanoid? More like a "Strange Square Ghost Game". The ball is a square that passes through everything - paddle, bricks, anything. It doesn't bounce, and missing it doesn't cost a life. It's pretty fun, but definitely not Arkanoid.
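
To make the missing piece concrete, here is a minimal Pygame sketch (our own illustration with made-up constants, not either model's code) of the paddle bounce and lost-life logic that Mixtral's version lacked:

# Minimal paddle/ball loop showing rect-based collision and life loss (illustrative only).
import pygame

pygame.init()
screen = pygame.display.set_mode((800, 600))
clock = pygame.time.Clock()

paddle = pygame.Rect(350, 570, 100, 15)
ball = pygame.Rect(395, 300, 10, 10)
vel = [4, -4]
lives = 3

running = True
while running and lives > 0:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    keys = pygame.key.get_pressed()
    paddle.x += (keys[pygame.K_RIGHT] - keys[pygame.K_LEFT]) * 6
    paddle.clamp_ip(screen.get_rect())          # keep the paddle on screen

    ball.move_ip(vel[0], vel[1])
    if ball.left <= 0 or ball.right >= 800:      # side walls
        vel[0] = -vel[0]
    if ball.top <= 0:                            # ceiling
        vel[1] = -vel[1]
    if ball.colliderect(paddle) and vel[1] > 0:  # the bounce Mixtral's ball never did
        vel[1] = -vel[1]
    if ball.top > 600:                           # missed the ball: lose a life, reset
        lives -= 1
        ball.topleft = (395, 300)
        vel = [4, -4]

    screen.fill((20, 20, 40))
    pygame.draw.rect(screen, (200, 200, 255), paddle)
    pygame.draw.ellipse(screen, (255, 120, 80), ball)
    pygame.display.flip()
    clock.tick(60)

pygame.quit()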

Get API Key

Pricing

Pricing is given in AI/ML API tokens. Both models are open-source and priced at a flat rate: input and output tokens cost the same. Llama 405B is much larger, so its price is higher.

Price per 1k AI/ML tokens | Llama 3.1 405B | Mixtral 8x22B
Input price               | $0.0065        | $0.00156
Output price              | $0.0065        | $0.00156
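
As a back-of-the-envelope example of what these rates mean per request (the token counts below are made up for illustration):

# Rough per-request cost estimate using the per-1k-token rates from the table above.
PRICES_PER_1K = {
    "meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo": 0.0065,   # USD, input = output
    "mistralai/Mixtral-8x22B-Instruct-v0.1": 0.00156,          # USD, input = output
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated price in USD for a single request."""
    return (input_tokens + output_tokens) / 1000 * PRICES_PER_1K[model]

# Example: a 2,000-token prompt that produces a 1,000-token answer.
for model in PRICES_PER_1K:
    print(f"{model}: ${estimate_cost(model, 2000, 1000):.4f}")
# Llama 3.1 405B comes to about $0.0195, Mixtral 8x22B to about $0.0047.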
Get API Key

Compare for yourself

You've seen what these models can do - now try them for your use case. Plug the code below into Google Colab or any IDE, use your API Key, and get testing!

from openai import OpenAI

def main():
    client = OpenAI(
      api_key='<YOUR_API_KEY>',
      base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo'
    model2 = 'mistralai/Mixtral-8x22B-Instruct-v0.1' 
    selected_models = [model1, model2]

    results = {}
    
    for model in selected_models:
        try:
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'system', 'content': "be strong"},
                    {'role': 'user', 'content': "who is strong?"}
                ],
            )

            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:\n')
    print(f"{model1}:\n{results.get(model1, 'No response')}")
    print('\n')
    print(f"{model2}:\n{results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()
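
The script needs only the official openai Python package (pip install openai). Swap your own prompts into the messages list to compare the two models on your actual task.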

Conclusion

Well, what can I say - the difference is evident. Llama 3.1 405B is really impressive, but Mixtral 8x22B v0.1 is cheaper and still performs well. Given that Llama 3.1 has roughly three times as many total parameters, such a gap was expected. It'll be interesting to see how Mistral AI responds to this challenge. In any case, we now have great tools for different tasks and budgets.

You can check our model lineup here - try any of them for yourself with our API Key.

Get API Key

Access both models using our AI API

Explore and use multiple AI functionalities through a single API. Ideal for experimentation and groundbreaking projects.

200+ Models

3x faster response

OpenAI compatible

99.9% Uptime

AI Playground
Contact us

We support your Business case

Book a meeting with our Head of Customer Success, Kate - and discuss your custom AI/ML API solution today.  

Book a call