Qwen Max 2025-09-23, developed by Alibaba, is a large-scale Mixture-of-Experts (MoE) language model optimized for chat, coding, and complex problem-solving tasks. It employs a Transformer-based MoE architecture with expert routing, activating a subset of parameters per token to balance performance and compute efficiency. Pre-trained on over 20 trillion tokens from diverse multilingual and domain-specific corpora, Qwen Max supports deep reasoning and advanced text generation.
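To make the expert-routing idea concrete, here is a minimal NumPy sketch of generic top-k gating as used in MoE Transformers. Qwen Max's actual router, expert count, and hidden sizes are not public, so every shape, weight, and name below is illustrative only.

```python
import numpy as np

def moe_route(token_hidden, gate_weights, experts, top_k=2):
    """Route a single token's hidden state to its top-k experts.
    Illustrative only: Qwen Max's real routing internals are not published."""
    logits = token_hidden @ gate_weights              # (n_experts,) gating scores
    top = np.argsort(logits)[-top_k:]                 # indices of the k highest-scoring experts
    probs = np.exp(logits[top] - logits[top].max())   # softmax over the selected experts only
    probs /= probs.sum()
    # Only the selected experts run; the rest stay idle, which is how MoE
    # activates just a subset of parameters per token.
    return sum(p * experts[i](token_hidden) for p, i in zip(probs, top))

# Toy demo: hidden size 8, 4 experts, each expert a small linear map.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate = rng.normal(size=(d, n_experts))
expert_mats = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_mats]
print(moe_route(rng.normal(size=d), gate, experts).shape)  # (8,)
```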
Technical Specifications
Architecture: Transformer-based Mixture-of-Experts (MoE) with selective expert activation per input token
Total Parameters: Not publicly disclosed; in line with other large-scale MoE designs
Pre-training Data: More than 20 trillion tokens, with extensive English and Chinese text alongside other major world languages
Context Window: 262,144 tokens (262K) native; see the length-check sketch after this list
Language Support: Primarily English and Chinese
Training Methods: Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF)
Intended Use: Chat interactions, code generation, and complex question answering
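Even with the 262K-token window above, long prompts need budgeting. Since the Qwen Max tokenizer is not publicly available, the sketch below uses the rough 4-characters-per-token heuristic as a pre-flight estimate; the function name and threshold logic are illustrative, not part of any official SDK.

```python
def fits_in_context(prompt: str, max_output_tokens: int,
                    context_window: int = 262_144) -> bool:
    """Rough pre-flight check that prompt + planned output fit the window.

    Qwen Max's tokenizer is not public, so we estimate with the common
    ~4 characters-per-token heuristic; treat the result as approximate.
    """
    estimated_prompt_tokens = len(prompt) / 4
    return estimated_prompt_tokens + max_output_tokens <= context_window

# ~150K estimated prompt tokens + 8K planned output still fits the 262K window
print(fits_in_context("hello " * 100_000, max_output_tokens=8_192))  # True
```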
Key Capabilities
Efficient MoE architecture enabling strong performance with reduced computational cost
Bilingual proficiency in English and Chinese for a broad range of application scenarios
Excels at coding, problem-solving, and conversational AI, including code generation and multi-step reasoning
Employs advanced fine-tuning strategies (SFT and RLHF) to improve generation quality and alignment with human preferences; a sketch of the standard reward-model objective follows this list
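The RLHF stage named above is typically built around a reward model trained on human preference pairs. Alibaba has not published Qwen Max's exact recipe, so the snippet below only illustrates the standard Bradley-Terry preference objective such reward models commonly optimize.

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss used to train RLHF reward models:
    the reward for the human-preferred response should exceed the reward
    for the rejected one. Loss = -log(sigmoid(r_chosen - r_rejected))."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks as the margin between preferred and rejected grows.
print(round(reward_model_loss(2.0, 0.5), 4))   # ~0.2014: correct ordering, small loss
print(round(reward_model_loss(0.5, 2.0), 4))   # ~1.7014: inverted ordering, large loss
```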
Performance Metrics
Usage
The model is available on the AI/ML API platform as "Qwen Max 2025-09-23".
Code Samples:
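A minimal chat-completion call is sketched below, assuming the platform exposes an OpenAI-compatible endpoint. The base URL and the model ID string are assumptions for illustration; confirm both against the AI/ML API model list before use.

```python
from openai import OpenAI

# Assumed OpenAI-compatible endpoint for the AI/ML API platform;
# verify the base URL and model ID against the platform's documentation.
client = OpenAI(
    base_url="https://api.aimlapi.com/v1",
    api_key="YOUR_AIML_API_KEY",
)

response = client.chat.completions.create(
    model="qwen-max-2025-09-23",  # assumed ID for "Qwen Max 2025-09-23"
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    temperature=0.7,
    max_tokens=512,
)

print(response.choices[0].message.content)
```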