Overview
Qwen Text Embedding v4 is a 4B-parameter dual-encoder model from the Qwen3 family, optimized specifically for dense embeddings and ranking tasks rather than general chat. It supports over 100 languages (including major programming languages) and is tuned for semantic search, retrieval, classification, clustering, and bitext mining in a single shared embedding space.
Technical Specifications
- Architecture: Dense transformer-based encoder with symmetric contrastive fine-tuning
- Context Length: 8,192 tokens
- Capabilities: Multilingual semantic encoding, query/document alignment, cross-lingual retrieval, similarity ranking
- Training Data: Curated corpus spanning technical documentation, academic papers, conversational logs, and web-scale multilingual text with rigorous deduplication and bias mitigation
Performance Benchmarks
- MTEB (Massive Text Embedding Benchmark): Achieves top-tier performance among open and closed models, excelling in retrieval, classification, and clustering subtasks
- Multilingual Alignment: Maintains >92% cross-lingual similarity fidelity on aligned sentence pairs across major language families
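Cross-lingual similarity fidelity is typically measured as the cosine similarity between embeddings of aligned sentence pairs. A minimal sketch of that computation, using short illustrative vectors rather than real model outputs (actual embeddings are far higher-dimensional):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors standing in for the embeddings of an aligned
# English/German sentence pair (values are made up for demonstration).
en = [0.8, 0.1, 0.3]
de = [0.75, 0.15, 0.35]
pair_similarity = cosine_similarity(en, de)  # close to 1.0 for aligned pairs
```

In an aligned-pair evaluation, a pair "passes" when this score exceeds a chosen threshold; the >92% figure above is the fraction of pairs that do.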
Quality Improvements
- Semantic Density: Embeddings exhibit tighter intra-cluster cohesion and sharper inter-cluster separation compared to v3
- Noise Resilience: Robust to input perturbations, formatting inconsistencies, and moderate grammatical errors
- Bias Control: Integrated fairness-aware training reduces spurious correlations in gender, region, and domain-sensitive dimensions
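The cohesion/separation claim above can be quantified directly from pairwise distances: mean intra-cluster distance (lower means tighter cohesion) versus mean inter-cluster distance (higher means sharper separation). A minimal sketch on toy 2-D points standing in for embeddings of two topics:

```python
import math
from itertools import combinations, product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cohesion_and_separation(cluster_a, cluster_b):
    """Mean intra-cluster distance (cohesion) and mean
    inter-cluster distance (separation) for two clusters."""
    intra = [euclidean(p, q)
             for pts in (cluster_a, cluster_b)
             for p, q in combinations(pts, 2)]
    inter = [euclidean(p, q) for p, q in product(cluster_a, cluster_b)]
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Toy 2-D points standing in for embeddings of two distinct topics.
sports = [(0.9, 0.1), (0.85, 0.15), (0.95, 0.05)]
finance = [(0.1, 0.9), (0.15, 0.85), (0.05, 0.95)]
intra, inter = cohesion_and_separation(sports, finance)
```

A lower intra/inter ratio corresponds to the tighter cohesion and sharper separation described above; the same computation applies unchanged to real embedding vectors.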
New Features & Technical Upgrades
Qwen Text Embedding v4 introduces a suite of innovations focused on semantic fidelity, efficiency, and multilingual equity:
Key Features
- High Semantic Fidelity: Captures fine-grained semantic relationships, even in complex or domain-specific phrasing.
- Long-Context Awareness: Handles inputs up to 8K tokens—ideal for embedding full documents or detailed user queries.
- Multilingual Robustness: Unified embedding space across languages enables cross-lingual retrieval without translation.
- Optimized for Retrieval: Trained with contrastive and in-batch negative sampling for superior performance in similarity search.
- Low Latency & High Throughput: Efficient inference pipeline suitable for real-time applications at scale.
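The contrastive, in-batch-negative objective mentioned above can be sketched in plain Python: each query's paired document is the positive, and every other document in the batch serves as a negative. The similarity values and temperature below are illustrative, not the model's actual training configuration.

```python
import math

def info_nce_loss(sim_matrix, temperature=0.05):
    """In-batch negative contrastive (InfoNCE-style) loss.

    sim_matrix[i][j] is the similarity between query i and document j;
    the positive document for query i sits on the diagonal, and the
    other in-batch documents act as negatives.
    """
    losses = []
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum - logits[i])  # -log softmax at the positive
    return sum(losses) / len(losses)

# Toy similarity matrix for a batch of 3 query/document pairs.
sims = [
    [0.9, 0.2, 0.1],
    [0.3, 0.8, 0.2],
    [0.1, 0.2, 0.95],
]
loss = info_nce_loss(sims)
```

Minimizing this loss pulls each query toward its paired document while pushing it away from the rest of the batch, which is what produces the retrieval-friendly geometry described above.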
Practical Impact
These enhancements translate into stronger RAG accuracy, more coherent document clustering, and reduced false positives in semantic search, especially in multilingual support portals, research knowledge bases, and cross-border enterprise analytics.
Code Sample
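A minimal end-to-end semantic-search sketch. The `embed` function below is a deterministic bag-of-words stand-in, not the real model; in a real deployment you would replace it with a call to the Qwen embedding model (hosted endpoint or local checkpoint, with client libraries varying by setup). The ranking logic around it is unchanged either way.

```python
import math
import zlib

def embed(text: str) -> list[float]:
    """Hypothetical stand-in for the real embedding call: a hashed
    bag-of-words vector, L2-normalized. Replace with an actual call
    to the Qwen embedding model in production."""
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[zlib.crc32(token.encode()) % 64] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def search(query: str, docs: list[str], top_k: int = 3):
    """Rank documents by cosine similarity to the query embedding.

    Vectors are already unit-length, so cosine reduces to a dot product.
    """
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

docs = [
    "Qwen supports multilingual semantic search",
    "Recipe for sourdough bread with rye flour",
    "Cross-lingual retrieval in a shared embedding space",
]
results = search("multilingual semantic search", docs)
```

With the real model, the second and third documents would also score meaningfully against the query (the model captures semantics, not token overlap); the stub here only illustrates the retrieval plumbing.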
Comparison with Other Models
vs OpenAI text-embedding-3-large: Qwen v4 matches or exceeds OpenAI’s performance on MTEB with the same 8K context window, while offering lower cost and greater data residency flexibility. Unlike OpenAI, Qwen provides transparent licensing for commercial deployment and does not use customer data for model training.
vs Google’s textembedding-gecko: Qwen v4 provides better zero-shot retrieval scores on BEIR and avoids vendor lock-in through open-weight availability. Gecko integrates tightly with Vertex AI, while Qwen offers greater deployment flexibility across clouds and on-prem.