Ideal for coding, web navigation, command-line automation, and other agentic workflows, it blends powerful model capacity with flexible inference modes.
DeepSeek V3.1 Terminus is a state-of-the-art large-scale hybrid reasoning AI model designed for complex tasks involving chain-of-thought reasoning.
Model Overview
DeepSeek V3.1 Terminus is a state-of-the-art large-scale hybrid reasoning AI model designed for complex tasks involving chain-of-thought reasoning. This update refines DeepSeek's V3 line, focusing on enhanced stability, improved agent/tool workflows, and reliable multi-step reasoning capabilities. Ideal for coding, web navigation, command-line automation, and other agentic workflows, it blends powerful model capacity with flexible inference modes.
Technical Specifications
Model Type: Hybrid Mixture-of-Experts (MoE) large language model
Total Parameters: 671 billion
Active Parameters per Forward Pass: 37 billion
Hybrid Reasoning Modes: Supports "thinking" mode (complex internal reasoning with tool planning) and "non-thinking" mode (faster direct responses) within the same network
Context Window Size: Up to 128,000 tokens, enabling very long context and extended chain of thought
Agent Capabilities: Specialized integrated agents include Code Agent, Search Agent, Browse Agent, and Terminal Agent
Performance Benchmarks
MMLU-Pro Reasoning: Improved from 84.8 to 85.0 (Terminus)
GPQA-Diamond: 80.7
Humanity’s Last Exam: Significant jump from 15.9 to 21.7
LiveCodeBench: Marginal gain to 74.9
Codeforces Score: Slight variation around 2046
Software Engineering Verification: Improved from 66.0 to 68.4
Performance Benchmarks
Key Features
Huge Context Window: Supports up to 128,000 tokens of context, enabling the processing of extremely long documents and extended complex reasoning.
Enhanced Agent Integration: Optimized multi-step agent workflows with improved reliability in tool calling, including specialized agents for code, search, browsing, and terminal operations.
Efficiency Improvements: Reduced average token consumption by 20%–50% in reasoning mode, maintaining or improving output quality compared to previous versions.
Improved Language Consistency: Dramatically reduced issues with random character insertions and unwanted language mixing (Chinese/English), resulting in smoother output.
Superior Coding Performance: Advanced code generation capabilities, including creating complex applications and interacting with game engines like Godot without build errors.
Research synthesis and long-document summarization
Code Sample
Comparison with Other Models
vs GPT-4: GPT-4 is well-known for versatility and creativity, strong in general reasoning and dialogue quality; DeepSeek Terminus excels in agentic workflows and multi-step tool invocation efficiency with lower token costs.
vs Claude 4.1: Claude 4.1 leads in intuitive, creative multi-step reasoning and excels in smooth chain-of-thought tasks; DeepSeek Terminus matches closely in complex agentic workflows where tool integration and explicit planning are critical.
Vs DeepSeek R1: Terminus achieves comparable reasoning quality with faster response times and lower output token consumption.
Vs DeepSeek V3.1: Terminus improves language stability, reduces character glitches, and boosts agent/tool coordination.