Deepseek R1 vs GPT o1-preview

Benchmarks and specs

Specs

Deepseek R1 marks a bold advancement in AI technology, delivering innovative features and a focus on precision. Meanwhile, GPT o1-preview represents OpenAI’s next leap forward, showcasing significant improvements in its versatility and reliability.


In this evolving AI landscape, Deepseek is positioning itself as a strong contender, challenging OpenAI's established presence by refining niche functionalities and emphasizing user adaptability. To help you decide which AI best meets your requirements, let’s delve into the strengths and weaknesses of Deepseek R1 and GPT o1-preview.

Specification	GPT o1-preview	Deepseek R1
Input Context Window	128K	128K
Maximum Output Tokens	65K	8K
Number of parameters in the LLM	Not disclosed	671B
Knowledge cutoff	October 2023	Not disclosed
Release Date	September 12, 2024	January 21, 2025
Output tokens per second	144	37.2

The key differences between GPT o1-preview and Deepseek R1 lie in output capacity, speed, and release date. Both models support a 128K input context window, but GPT o1-preview can generate up to 65K output tokens at about 144 tokens/sec, while Deepseek R1 is limited to 8K output tokens and generates about 37.2 tokens/sec. GPT o1-preview has a knowledge cutoff of October 2023 and was released on September 12, 2024, whereas Deepseek R1 has no published knowledge cutoff and launched more recently, on January 21, 2025.

The choice between these models depends on specific needs, such as token output capacity, processing speed, and recency of release for cutting-edge applications.
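To make the throughput figures more concrete, the short Python sketch below estimates how long each model would take to stream a response of a given length. It assumes a constant generation rate equal to the tokens-per-second values in the table and ignores time-to-first-token, network latency, and any reasoning tokens a model produces before its visible answer. The `generation_seconds` helper and the 8,000-token workload are illustrative assumptions.

```python
# Throughput figures (tokens per second) taken from the spec table above.
THROUGHPUT_TPS = {
    "GPT o1-preview": 144.0,
    "Deepseek R1": 37.2,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate time to stream `output_tokens` at a steady rate.

    Ignores time-to-first-token, network latency, and reasoning tokens,
    so real requests will usually take longer.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical workload: an 8,000-token answer (close to Deepseek R1's 8K cap).
for model, tps in THROUGHPUT_TPS.items():
    print(f"{model}: ~{generation_seconds(8_000, tps):.1f} s")
# GPT o1-preview: ~55.6 s
# Deepseek R1: ~215.1 s
```

At these rates, an answer of the same length takes Deepseek R1 roughly four times longer to stream than GPT o1-preview.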

Benchmarks

The benchmark figures below combine the official release notes for both models with results from several open benchmarks.

Benchmark	GPT o1-preview	Deepseek R1
Undergraduate level knowledge MMLU	90.8	90.8
Graduate level reasoning GPQA	67	71.5
Code Human Eval	96	96.3
Cybersecurity CTFs	43.0	-
Math problem-solving MATH-500	92	97.3

Deepseek R1 outperforms GPT o1-preview in math (MATH-500: 97.3 vs. 92) and graduate reasoning (GPQA: 71.5 vs. 67), while both excel equally in undergraduate knowledge (MMLU: 90.8).

In coding, Deepseek R1 leads by a negligible margin (HumanEval: 96.3 vs. 96). GPT o1-preview reports a score of 43.0 on cybersecurity CTFs, but no Deepseek R1 result is available, so that category cannot be compared directly.

Each model's strengths suit specific tasks, such as math for Deepseek R1; for cybersecurity, GPT o1-preview is the only one of the two with a published result.
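The score gaps quoted above can be reproduced with a few lines of Python. This sketch subtracts the GPT o1-preview score from the Deepseek R1 score for each row of the benchmark table and skips rows where one model has no result, such as the cybersecurity CTFs. The `score_gaps` helper is a hypothetical name used only for illustration.

```python
# Scores copied from the benchmark table above: (GPT o1-preview, Deepseek R1).
# None marks a missing result.
BENCHMARKS = {
    "MMLU": (90.8, 90.8),
    "GPQA": (67.0, 71.5),
    "HumanEval": (96.0, 96.3),
    "Cybersecurity CTFs": (43.0, None),
    "MATH-500": (92.0, 97.3),
}


def score_gaps(table):
    """Return {benchmark: Deepseek R1 score minus GPT o1-preview score}.

    Rows where either score is missing are skipped, because no
    head-to-head comparison is possible for them.
    """
    gaps = {}
    for name, (o1_preview, r1) in table.items():
        if o1_preview is None or r1 is None:
            continue
        gaps[name] = round(r1 - o1_preview, 1)  # round away float noise
    return gaps


for name, gap in score_gaps(BENCHMARKS).items():
    print(f"{name}: {gap:+.1f} points for Deepseek R1")
# MMLU: +0.0, GPQA: +4.5, HumanEval: +0.3, MATH-500: +5.3
```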


Practical tests

We tested both models in challenging, real-world scenarios. Our evaluation focuses on conversational ability, logical reasoning, mathematical problem-solving, and coding performance to assess the strengths and weaknesses of each model.

Some outputs have been shortened for brevity or clarity.

Prompt 1. Just Chatting:

Let’s begin with a familiar challenge: AI models often struggle with tasks like counting specific letters in a word. Let’s see how GPT o1-preview and Deepseek R1 handle it. (I intentionally misspelled the word.)

How many letters 'r' are in the word strawbery?

o1-preview output

Deepseek R1 output

There are **2** instances of the letter 'r' in "strawbery."

The word "strawbery" contains the letter 'r' **2 times**.

Correct Answer:✅

Correct Answer: ✅
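For reference, the ground truth for this prompt is easy to check in code. The minimal Python sketch below counts occurrences of a letter case-insensitively; `count_letter` is an illustrative helper, not part of either model's output. It confirms that the misspelled "strawbery" contains 2 r's, while the correct spelling "strawberry" contains 3.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawbery", "r"))   # 2 (the misspelled word from the prompt)
print(count_letter("strawberry", "r"))  # 3 (the correct spelling)
```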

Prompt 2. Logical reasoning:

The model will be presented with a statement and several possible answers. Its task is to select the correct answer.

The medieval Arabs had many manuscripts of the ancient Greek. 
When needed, they translate them into Arabic. 
Medieval Arab philosophers were very interested in Aristotle's Theory of Poetry, 
which was obviously not shared by Arab poets, 
because a poet interested in it must want to read Homer's poems. 
Aristotle himself often quotes Homer's poems. 
However, Homer's poems were not translated into Arabic until modern times.

Which of the following options, if true, strongly supports the above argument? 
Please only return the letter in front of your predict correct option, A, B, C or D.

A. Some medieval Arab translators have manuscripts of Homer poems in ancient Greek.

B. Aristotle's Theory of Poetry is often quoted and commented by modern Arab poets.

C. In Aristotle's Theory of Poetry, most of the content is related to drama, 
and medieval Arabs also wrote plays and performed them.

D. A series of medieval Arab stories, such as Arab Night, 
are very similar to some parts of Homer's epic.

o1-preview output

Deepseek R1 output

Correct Answer: ❌
GPT was unable to complete this task successfully

Correct Answer:✅
Deepseek successfully handled this task

Prompt 3. Creativity:

Write a short poem about friendship

o1-preview output

Deepseek R1 output

Life is a journey, a winding road,
A dance of moments, stories untold.
Echoes of laughter, shadows of tears,
Weaving together the days and the years.

Sunrise to sunset, paths intertwine,
Hearts find direction, souls seek a sign.
Through trials and triumphs, joy and strife,
We wander, we wonder—this mosaic is life.

A thread of dawn, a twilight sigh,  
The breath between a blink and sky—  
A dance of shadows, fleeting, warm,  
A tempest woven into form.  

We trace our maps in rivers’ veins,  
Through valleys carved by joy and pains,  
Each scar a star, each loss a tune,  
In chaos, life begins to bloom.  

The seasons turn, yet roots dig deep,  
Where storms and quiet cradle sleep.  
A firefly’s flicker, brief, unplanned—  
The universe cupped in a hand.  

So tread the edge where mysteries pour,  
For grace exists in what’s unsure.  
A heartbeat’s hum, the dark, the dawn—  
This borrowed light goes on… goes on.

Correct Answer: (4/5)✅
A solid, heartfelt poem but slightly predictable in its execution.

Correct Answer: (4.5/5) ✅
A vivid and thought-provoking poem that rewards deeper reflection, though slightly less approachable.

It’s difficult to determine which poem is better, as each has its own unique strengths. Additionally, opinions on poetry are highly subjective and can vary from person to person. However, my ratings above provide a personal perspective on each piece.

Prompt 4. Math:

I tested both models on several math problems, and both solved all of them except this one, which GPT o1-preview got wrong.

The expression 2⋅3⋅4⋅5+1=121, as multiplication is performed before addition. 
However, other values can be obtained by inserting parentheses into the expression. 
For example:
(2⋅(3⋅4))⋅(5+1)=144
How many distinct values can be obtained by inserting parentheses into 
2⋅3⋅4⋅5+1, without rearranging the terms?

o1-preview output

Deepseek R1 output

Here are the possible valid ways to parenthesize 
the expression and their corresponding values:

1. **Original expression**:
   \((2 \cdot 3 \cdot 4 \cdot 5) + 1 = 121\)

...

5. **Grouping \(2 + 3\) first**:
   \(((2 + 3) \cdot 4 \cdot 5) + 1 = 101\)

...

So, there are **7 different values** obtainable.

\(\boxed{7}\)

...

3. **Adding 1 to the product of the last two terms (4 and 5)**:
   \[
   (2 \cdot 3) \cdot (4 \cdot 5 + 1) = 6 \cdot 21 = 126
   \]
   This can also be grouped in different ways, but the result remains 126.

4. **Adding 1 to the product of the last three terms (3, 4, and 5)**:
   \[
   2 \cdot (3 \cdot 4 \cdot 5 + 1) = 2 \cdot 61 = 122
   \]

By systematically analyzing all possible ways to insert parentheses, 
we find that the different values obtained are 121, 144, 126, and 122. 
Each of these values corresponds to adding 1 at different stages 
of the multiplication process. 

Thus, the total number of distinct values that can be obtained is \(\boxed{4}\).

Correct Answer: ❌
GPT o1-preview introduced an addition that is not in the original expression (such as (2 + 3)) and arrived at 7 instead of 4

Correct Answer: ✅
Deepseek completed the task successfully
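Because the search space is small, the answer to this problem can be verified by brute force. The Python sketch below uses our own illustrative encoding (the `NUMS`/`OPS` tuples and the `reachable` function are not from either model): it tries every operator as the last one evaluated and recurses on both sides, which covers every way of parenthesizing the expression without reordering terms or adding operators. It finds the same four values Deepseek R1 reported.

```python
from functools import lru_cache

# Operands and the operators between them, in their original order:
# 2 * 3 * 4 * 5 + 1
NUMS = (2, 3, 4, 5, 1)
OPS = ("*", "*", "*", "+")


@lru_cache(maxsize=None)
def reachable(lo: int, hi: int) -> frozenset:
    """Return every value NUMS[lo..hi] can take under some parenthesization."""
    if lo == hi:
        return frozenset({NUMS[lo]})
    results = set()
    # OPS[k] sits between NUMS[k] and NUMS[k + 1]; treat it as the last
    # operation performed and combine every value of the left and right parts.
    for k in range(lo, hi):
        for left in reachable(lo, k):
            for right in reachable(k + 1, hi):
                results.add(left * right if OPS[k] == "*" else left + right)
    return frozenset(results)


values = sorted(reachable(0, len(NUMS) - 1))
print(values)       # [121, 122, 126, 144]
print(len(values))  # 4
```

In every grouping, the "+ 1" attaches to some suffix of the product (5, 4·5, 3·4·5, or 2·3·4·5), which is why exactly four values are possible: 24·6 = 144, 6·21 = 126, 2·61 = 122, and 120 + 1 = 121.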

Coding Block: In this section, we will explore several tasks that programmers commonly encounter.

Prompt 5. Algorithms:

Given two sorted arrays nums1 and nums2 of size m and n respectively,
return the median of the two sorted arrays. 
The overall run time complexity should be O(log (m+n))

Example:
Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

o1-preview output

Deepseek R1 output

def findMedianSortedArrays(nums1, nums2):
    m, n = len(nums1), len(nums2)
    
    # Ensure nums1 is the smaller array to minimize the binary search range
    if m > n:
        nums1, nums2, m, n = nums2, nums1, n, m

    # Handle the case where both arrays are empty
    if n == 0:
        raise ValueError("Both arrays are empty.")

    imin, imax = 0, m
    half_len = (m + n + 1) // 2

    while imin <= imax:
        i = (imin + imax) // 2  # Partition index for nums1
        j = half_len - i        # Partition index for nums2

        # Edge cases where partition is at the extreme ends
        maxLeftX = nums1[i - 1] if i > 0 else float('-inf')
        minRightX = nums1[i] if i < m else float('inf')

        maxLeftY = nums2[j - 1] if j > 0 else float('-inf')
        minRightY = nums2[j] if j < n else float('inf')

        # Check if we have found the correct partition
        if maxLeftX <= minRightY and maxLeftY <= minRightX:
            # Compute the median based on the total length parity
            if (m + n) % 2 == 0:
                return (max(maxLeftX, maxLeftY) + min(minRightX, minRightY)) / 2.0
            else:
                return max(maxLeftX, maxLeftY)
        elif maxLeftX > minRightY:
            # Too far on the right side for partitionX, move left
            imax = i - 1
        else:
            # Too far on the left side for partitionX, move right
            imin = i + 1

    # If we reach here, the input arrays were not sorted as expected
    raise ValueError("Input arrays are not sorted or invalid.")

def findMedianSortedArrays(nums1, nums2):
    # Ensure nums1 is the smaller array
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    low, high = 0, m
    total = m + n
    while low <= high:
        i = (low + high) // 2
        j = (total + 1) // 2 - i
        
        max_left_A = nums1[i-1] if i > 0 else float('-inf')
        min_right_A = nums1[i] if i < m else float('inf')
        max_left_B = nums2[j-1] if j > 0 else float('-inf')
        min_right_B = nums2[j] if j < n else float('inf')
        
        if max_left_A <= min_right_B and max_left_B <= min_right_A:
            if total % 2 == 1:
                return max(max_left_A, max_left_B)
            else:
                return (max(max_left_A, max_left_B) + min(min_right_A, min_right_B)) / 2.0
        elif max_left_A > min_right_B:
            high = i - 1
        else:
            low = i + 1
    return 0.0  # This return is a fallback and should theoretically never be reached


Correct Answer: (5/5)✅
GPT delivered an excellent solution, ranking among the best

Correct Answer: (4.5/5) ✅
Deepseek R1 provided a near-perfect solution
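One practical way to compare the two solutions is to test them against a slow but obviously correct reference. The Python sketch below is our own test harness, not part of either model's answer: `median_reference` merges the arrays with `heapq.merge` in O(m + n) time (so it does not meet the prompt's O(log(m + n)) requirement), and `check_against_reference` compares any candidate function with it on random sorted inputs.

```python
import heapq
import random


def median_reference(nums1, nums2):
    """Test oracle: merge the two sorted lists, then take the middle.

    Runs in O(m + n), so it does NOT satisfy the prompt's
    O(log(m + n)) requirement; it is only meant for testing.
    """
    merged = list(heapq.merge(nums1, nums2))
    if not merged:
        raise ValueError("Both arrays are empty.")
    mid = len(merged) // 2
    if len(merged) % 2 == 1:
        return float(merged[mid])
    return (merged[mid - 1] + merged[mid]) / 2.0


def check_against_reference(candidate, trials=1000, seed=0):
    """Compare candidate(nums1, nums2) with the oracle on random sorted inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 8)))
        b = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 8)))
        if not a and not b:
            continue  # the prompt assumes at least one element overall
        expected = median_reference(a, b)
        actual = candidate(a, b)
        if actual != expected:
            raise AssertionError(f"{a}, {b}: expected {expected}, got {actual}")
    return True


print(median_reference([1, 3], [2]))     # 2.0
print(median_reference([1, 2], [3, 4]))  # 2.5
```

For example, after pasting either solution above into the same file, `check_against_reference(findMedianSortedArrays)` returns `True` or raises an `AssertionError` describing the first mismatch.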

Prompt 6. Refactoring and optimization:

This function, form_valid, processes a valid form 
submission for a purchase in a Django view. Optimize and refactor it.

def form_valid(self, form):
    email = form.cleaned_data['email']
    tel = form.cleaned_data['tel']
    country = form.cleaned_data['country']
    city = form.cleaned_data['city']
    street = form.cleaned_data['street']
    user = self.request.user
    products = Basket.objects.values_list(
        'product',
        flat=True
    ).filter(user=self.request.user)
    total_price = Basket.objects.filter(
        user=self.request.user
    ).aggregate(
        total_price=Sum(F('product__price') * F('quantity'))
    )
    purchase = Purchase.objects.create(
        user=user,
        total_price=total_price.get('total_price'),
        email=email,
        tel=tel,
        country=country,
        city=city,
        street=street
    )
    purchase.products.add(*products)
    Basket.objects.filter(user=self.request.user).delete()
    return redirect('history_purchases')

o1-preview output

Deepseek R1 output

from django.db import transaction
from django.db.models import F, Sum

def form_valid(self, form):
    user = self.request.user
    # Extract form data
    data = {field: form.cleaned_data[field] for field in ('email', 'tel', 'country', 'city', 'street')}
    
    # Fetch basket items with related products in a single query
    basket_items = Basket.objects.filter(user=user).select_related('product')
    
    # Calculate total price in the database
    total_price = basket_items.aggregate(
        total_price=Sum(F('product__price') * F('quantity'))
    )['total_price'] or 0  # Handle None total_price
    
    with transaction.atomic():
        # Create the purchase record
        purchase = Purchase.objects.create(
            user=user,
            total_price=total_price,
            **data
        )
        # Add products to the purchase
        purchase.products.add(*[item.product for item in basket_items])
        # Clear the user's basket
        basket_items.delete()
    
    return redirect('history_purchases')

from django.db import transaction
from django.db.models import Sum, F
from django.shortcuts import redirect

def form_valid(self, form):
    user = self.request.user
    with transaction.atomic():
        # Get the user's basket items in a single queryset
        user_baskets = Basket.objects.filter(user=user)
        
        # Extract product IDs and calculate total price using the same queryset
        products = user_baskets.values_list('product', flat=True)
        total_price = user_baskets.aggregate(
            total_price=Sum(F('product__price') * F('quantity'))
        )['total_price'] or 0  # Default to 0 if no items to avoid None
        
        # Create the Purchase instance with form data
        purchase = Purchase.objects.create(
            user=user,
            total_price=total_price,
            **{field: form.cleaned_data[field] for field in [
                'email', 'tel', 'country', 'city', 'street'
            ]}
        )
        
        # Add products and clear the basket
        purchase.products.add(*products)
        user_baskets.delete()
    
    return redirect('history_purchases')

Correct Answer: (4/5)✅
Pluses:
- select_related('product') fetches the related Product objects in the same query, so building the list of products later does not trigger an extra query per basket item.
- The calculation of total_price is kept outside the transaction block, reducing complexity within the transaction.

Minuses:
- [item.product for item in basket_items] builds a list of Product objects in memory, which is unnecessary since the product IDs could be fetched directly.
- Repeated logic: extracting form data and collecting basket products could be organized more compactly.

Correct Answer: (4/5) ✅
Pluses:
- Using values_list('product', flat=True) directly fetches product IDs from the basket without loading full objects into memory
- The transaction block encapsulates all database operations, making it easier to identify the critical section of the code.
- Extracting and passing form data is concise and integrated with the Purchase.objects.create statement
Minuses:
- The code doesn't use select_related, so any later access to the Product objects in Python (as opposed to the database-side price aggregation) would trigger additional queries.
- The compact nature sacrifices a bit of clarity, especially for beginners, since all logic is within the transaction block.

Prompt 7. HTML, CSS:

Create an adaptive image slider using HTML and CSS. 
Ensure that the slider is fully responsive, 
adjusts to different screen sizes, and includes basic
navigation controls (such as arrows or dots). 
Avoid using JavaScript for this task.

o1-preview output

Deepseek R1 output

Correct Answer: (5/5)✅
GPT provided a solid, functional solution with no bugs

Correct Answer: (4/5) ✅
Deepseek R1 added navigation buttons, but they did not work reliably


Pricing

Price per 1K AI/ML Tokens	GPT o1-preview	Deepseek R1
Input price	$0.01575	$0.00061
Output price	$0.063	$0.00241
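To turn the per-1K-token prices into a per-request estimate, multiply each token count by its rate. The Python sketch below does this for a hypothetical request of 2,000 input tokens and 1,000 output tokens; the `request_cost` helper is illustrative. Reasoning models may also bill their internal reasoning tokens as output tokens, so real output counts can exceed the length of the visible answer.

```python
# Per-1K-token prices (USD) from the pricing table above.
PRICES_PER_1K = {
    "GPT o1-preview": {"input": 0.01575, "output": 0.063},
    "Deepseek R1": {"input": 0.00061, "output": 0.00241},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-1K-token prices."""
    price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


# Hypothetical request: 2,000 prompt tokens and 1,000 completion tokens.
for model in PRICES_PER_1K:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.5f}")
# GPT o1-preview: $0.09450
# Deepseek R1: $0.00363
```

For this mix of tokens, Deepseek R1 comes out roughly 26 times cheaper per request.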

Compare for yourself

While we've highlighted some strengths, the best model for your specific task depends on your needs. The snippet below provides a starting point to test and compare two language models, GPT o1-preview and Deepseek R1. Play around with different prompts and see which model delivers the most relevant and impressive results for you!

from openai import OpenAI

def main():
    # The client class must be imported explicitly; `import openai` alone
    # leaves the name `OpenAI` undefined.
    client = OpenAI(
        api_key='<YOUR_API_KEY>',
        base_url="https://api.aimlapi.com",
    )

    # Specify the two models you want to compare
    model1 = 'o1-preview'
    model2 = 'deepseek/deepseek-r1'
    selected_models = [model1, model2]

    system_prompt = 'You are an AI assistant that only responds with jokes.'
    user_prompt = 'Why is the sky blue?'
    results = {}

    for model in selected_models:
        try:
            # o1-preview did not accept 'system' messages at launch, so the
            # instructions are prepended to the user message for both models.
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {'role': 'user', 'content': f"{system_prompt}\n\n{user_prompt}"}
                ],
            )

            message = response.choices[0].message.content
            results[model] = message
        except Exception as error:
            print(f"Error with model {model}:", error)

    # Compare the results
    print('Comparison of models:')
    print(f"{model1}: {results.get(model1, 'No response')}")
    print(f"{model2}: {results.get(model2, 'No response')}")

if __name__ == "__main__":
    main()


Conclusion

Test	GPT o1-preview	Deepseek R1
1. Chatting	✅	✅
2. Logical reasoning	❌	✅
3. Creativity	(4/5) ✅	(4.5/5) ✅
4. Math	❌	✅
5. Algorithms	(5/5) ✅	(4.5/5) ✅
6. Refactoring and optimization	(4/5) ✅	(4/5) ✅
7. HTML, CSS	(5/5) ✅	(4/5) ✅

Strengths and Weaknesses of Each Model

GPT o1-preview

Strengths:

Logical Reasoning: Demonstrates strong capabilities on complex reasoning problems, although it failed the logical-reasoning prompt in our test.
Creativity: Produces accessible, heartfelt creative writing, though its poem in our test was somewhat predictable.
Coding: Delivers high-quality algorithmic solutions and efficient refactoring with attention to database optimization.
HTML, CSS: Provides functional, bug-free designs for HTML and CSS tasks, offering reliable solutions.

Weaknesses:

Math Problem-Solving: Struggles with certain advanced math problems, leading to incorrect conclusions.
Refactoring: Built an unnecessary in-memory list of product objects where fetching product IDs directly would have sufficed.

Deepseek R1

Strengths:

Math Problem-Solving: Excels in handling advanced mathematical tasks, showcasing strong reasoning skills.
Graduate-Level Logic: Outperforms in tasks requiring nuanced logical reasoning, such as GPQA benchmarks.
Coding Efficiency: Uses memory-optimized approaches in database interactions, offering clean and effective solutions.
Creativity: Generates thought-provoking, reflective creative outputs, emphasizing depth and originality.

Weaknesses:

Web Development: Adds features like buttons but occasionally fails in their implementation or functionality.

Best Use Cases

When to Use GPT o1-preview:

Logical and Reasoning Tasks: Suited to riddles, logical challenges, and complex reasoning problems, though it missed the logical-reasoning question in our test.
Algorithmic Development: Best suited for high-performance coding tasks requiring optimized, functional solutions.
Creative Writing: Produces clear, accessible content for storytelling, poetry, or descriptive tasks.
Web Design: Reliable for producing clean, bug-free web solutions.

When to Use Deepseek R1:

Math Problem-Solving: A top choice for advanced mathematical reasoning tasks and problem-solving.
Graduate-Level Logic: Perfect for nuanced tasks requiring deep logical reasoning.
Creative Outputs: Ideal for generating reflective, thought-provoking content with depth.
Efficient Coding: Suitable for tasks requiring memory-optimized and concise implementations.

GPT o1-preview shines in algorithmic tasks and web development, making it a strong choice for complex, performance-critical coding projects. Deepseek R1, on the other hand, leads in advanced math, graduate-level logic, and memory-efficient coding, though it occasionally falls short in web development polish.

Choosing the right model ultimately depends on your project’s priorities, whether it’s creative output, logical challenges, or coding efficiency.
