Qwen3-Next-80B-A3B Thinking

Chat model · Active · 256K context window · Input $0.195 / Output $1.56 per 1M tokens

Qwen3-Next-80B-A3B Thinking supports multi-token prediction and large context windows (up to 1 million tokens), enabling efficient real-time reasoning and interactive applications. It integrates seamlessly into modern AI workflows with flexible deployment options, including serverless, on-demand dedicated, and reserved monthly instances.

Overview

Qwen3-Next-80B-A3B Thinking is a cutting-edge reasoning-focused chat model designed for complex multi-step problem solving and chain-of-thought tasks. It outputs structured “thinking” traces by default and excels at tasks requiring deep analytical reasoning such as math proofs, code synthesis, logic, and agentic planning.

Technical Specifications

Qwen3-Next-80B-A3B Thinking is a sparse Mixture-of-Experts (MoE) language model with 80 billion total parameters, of which only about 3 billion are active per token. It comprises 48 layers with a hidden dimension of 2048 and uses a hybrid design that combines gating mechanisms with modern normalization techniques such as RMSNorm. The model supports a native context window of 262K tokens, extensible to roughly 1 million tokens with context-scaling methods, enabling strong long-context understanding. Trained with resource-efficient hybrid strategies, it performs well on complex reasoning, mathematics, coding, and multi-step problem solving, while keeping inference costs low and throughput high, particularly on tasks that demand deep analytical capability.
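
For teams running the open weights locally, a minimal sketch using Hugging Face transformers is shown below. The repository id Qwen/Qwen3-Next-80B-A3B-Thinking is an assumption; adjust it to the checkpoint you actually deploy. Note that the full 80B weights must still fit in GPU memory even though only ~3B parameters are active per token.

# Minimal sketch: loading the checkpoint with Hugging Face transformers.
# The repo id below is assumed, not confirmed by this page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Next-80B-A3B-Thinking"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard across available GPUs
)

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens (thinking trace plus final answer).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))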

Performance Benchmarks

  • MMLU (General Knowledge): 78.5%
  • HumanEval (Code Generation): 82.1%
  • GSM8K (Mathematics): 91.2%
  • MT-Bench (Instruction Following): 84.3%

API Pricing

Input: $0.195 per 1M tokens

Output: $1.56 per 1M tokens
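
As a rough guide, per-request cost can be estimated from token counts. The helper below is a hypothetical illustration that assumes the prices above are quoted per million tokens.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = 0.195, output_price: float = 1.56) -> float:
    """Estimate request cost in USD, assuming prices are per 1M tokens."""
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price

# Example: a 20K-token prompt with an 8K-token reasoning-heavy reply
print(f"${estimate_cost(20_000, 8_000):.4f}")  # ~$0.0164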

Key Features

  • Thinking Mode Optimization: Tailored for chain-of-thought and complex problem solving with longer, more detailed output traces
  • Sparse Activation: Only a fraction (3B of 80B) of parameters activated per token, enabling rapid inference and cost efficiency
  • Multi-token Prediction: Accelerates decoding by predicting multiple tokens at a time
  • Stable Long-form Reasoning: Designed for stability across long chains of reasoning and complex instructions
  • Agent Integration: Supports function calling and integration into agent frameworks requiring step-by-step analytic solutions (see the sketch after this list)
  • Multilingual Support: Strong multilingual understanding for diverse reasoning tasks across languages
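
The sketch below shows a typical function-calling flow through an OpenAI-compatible endpoint; the base URL, model identifier, and tool schema are illustrative assumptions rather than part of the model's own API.

from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint; substitute your provider's base URL and key.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # illustrative tool, defined by your application
        "description": "Look up the latest closing price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen3-Next-80B-A3B-Thinking",  # assumed model id on the provider
    messages=[{"role": "user", "content": "Should I rebalance if NVDA moved 5% today?"}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)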

Use Cases

  • Scientific research requiring deep hypothesis generation and data analysis
  • Engineering and mathematical problem solving, proofs, and complex code synthesis/debugging
  • Legal case analysis and detailed argument construction
  • Financial risk modeling and strategic business planning with transparent decision steps
  • Medical diagnosis assistance with reasoning transparency and detailed explanations
  • Long-context document analysis and retrieval-augmented workflows

Code Sample
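
A minimal sketch of calling the model through an OpenAI-compatible chat completions endpoint. The base URL, API key, and exact model identifier below are placeholders; substitute the values for whichever provider hosts the model.

from openai import OpenAI

# Placeholder endpoint and key: point these at the provider hosting the model.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="Qwen3-Next-80B-A3B-Thinking",  # assumed model identifier; check your provider's catalog
    messages=[
        {"role": "system", "content": "You are a careful mathematical assistant."},
        {"role": "user", "content": "A train leaves at 14:05 averaging 96 km/h. "
                                    "When does it arrive after 348 km, including a 12-minute stop?"},
    ],
    max_tokens=2048,   # leave room for the thinking trace plus the final answer
    temperature=0.6,
)

# Depending on the provider, the thinking trace may appear inline (e.g. inside <think> tags)
# or in a separate reasoning field; the final answer is in message.content.
print(response.choices[0].message.content)

Streaming works the same way: pass stream=True and iterate over the returned chunks.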

Comparison with Other Models

vs Qwen3-32B: Qwen3-Next-80B-A3B activates only 3 billion parameters per token compared to Qwen3-32B's full activation, making it about 10 times more efficient in training and inference cost. It also delivers over 10 times faster output speed in long-context scenarios (beyond 32K tokens) while achieving higher accuracy on reasoning and complex tasks.

vs Qwen3-235B: Despite having fewer active parameters, Qwen3-Next-80B-A3B approaches the performance levels of the much larger 235 billion parameter Qwen3-235B, especially in instruction following and long-context reasoning. It offers a favorable balance of compute efficiency and high model quality suitable for production use.

vs Google Gemini-2.5-Flash-Thinking: The Qwen3-Next-80B-A3B Thinking variant outperforms Google Gemini-2.5-Flash-Thinking in chain-of-thought reasoning and multi-turn instruction tasks while maintaining substantially lower operational costs due to sparse activation and multi-token prediction capabilities.

vs Llama 3.1-70B: Qwen3-Next-80B-A3B offers better long-range context understanding and reasoning stability at much larger context windows (scalable up to 1 million tokens) compared to Llama 3.1-70B's shorter native window. The sparse MoE architecture also gives it better efficiency at scale.
