High-performance embedding model with flexible dimensions and superior accuracy.

The text-embedding-3-large API provides text embeddings with a configurable number of dimensions, aimed at applications that need high accuracy on complex retrieval and classification tasks.


Model Overview Card: text-embedding-3-large

Basic Information

  • Model Name: text-embedding-3-large
  • Developer/Creator: OpenAI
  • Release Date: January 25th, 2024
  • Version: text-embedding-3-large
  • Model Type: Text Embedding
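
The model name above is what a client passes when calling OpenAI's embeddings endpoint. The following is a minimal sketch, not an official client: it assumes the public REST endpoint at https://api.openai.com/v1/embeddings, an API key in the OPENAI_API_KEY environment variable, and two illustrative helper names (build_embedding_request, fetch_embeddings) that do not come from this card.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/embeddings"
MODEL = "text-embedding-3-large"


def build_embedding_request(texts, dimensions=None, api_key=None):
    """Build (but do not send) a POST request for the embeddings endpoint.

    `dimensions` is optional; when omitted, the model returns full-size vectors.
    """
    payload = {"model": MODEL, "input": list(texts)}
    if dimensions is not None:
        payload["dimensions"] = dimensions
    # Assumption: the key is read from OPENAI_API_KEY unless passed explicitly.
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )


def fetch_embeddings(texts, dimensions=None):
    """Send the request and return one vector per input text (needs a valid key)."""
    request = build_embedding_request(texts, dimensions)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Each item carries an "index" that matches the position of its input text.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

Separating request construction from sending keeps the payload easy to inspect without network access; in practice the official openai Python package would usually replace both helpers.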


  • Overview: text-embedding-3-large is a next-generation embedding model that offers superior performance and flexibility. It converts text into high-dimensional numerical vectors that can be compared, searched, and fed into downstream machine learning tasks.
  • Key Features:
    • Top Performance: OpenAI's highest-performing embedding model at release, with significant improvements over predecessors such as text-embedding-ada-002.
    • Flexible Embedding Size: Embeddings can be requested at reduced sizes, from 256 up to the full 3072 dimensions, trading accuracy against storage and compute cost.
    • Native Support for Shortening Embeddings: Developers can shorten embeddings through the dimensions API parameter without significant loss in conceptual representation.
  • Intended Use:
    • High-Performance Search: Optimal for applications requiring precise and fast search results.
    • Advanced Clustering: Suitable for sophisticated data analysis and clustering tasks.
    • Enhanced Recommendations: Provides accurate recommendations by understanding text similarities.
    • Robust Anomaly Detection: Efficiently identifies outliers in large datasets.
    • Detailed Diversity Measurement: Analyzes the diversity of large text corpora.
    • Accurate Classification: Highly effective in classifying complex text data.
  • Language Support: Offers improved support for multiple languages, making it suitable for global applications.
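
The embedding-shortening and search features listed above can be illustrated with a small sketch. It assumes a full-size vector has already been obtained and shortens it manually by keeping the leading components and re-normalizing to unit length, then ranks candidate documents by cosine similarity. The function names and the toy vectors are hypothetical, and real embeddings would have far more components.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` components, then re-normalize."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    return l2_normalize(vector[:dimensions])


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query, doc) for doc in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


# Toy 4-dimensional "embeddings" shortened to 2 dimensions before ranking.
query = shorten_embedding([0.9, 0.1, 0.3, 0.2], 2)
docs = [
    shorten_embedding(v, 2)
    for v in ([0.1, 0.9, 0.2, 0.1], [0.8, 0.2, 0.1, 0.4], [-0.7, 0.1, 0.5, 0.3])
]
ranking = rank_by_similarity(query, docs)
```

Shorter vectors reduce storage and comparison cost roughly in proportion to their length, which is the trade-off the flexible embedding size is meant to expose.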

Technical Details

  • Architecture: Advanced transformer-based architecture designed for high-dimensional embeddings and superior performance.
  • Training Data: Trained on an extensive and diverse dataset to capture a wide array of linguistic nuances.
  • Data Source and Size: Includes billions of text entries, ensuring a comprehensive understanding of language.
  • Diversity and Bias: Ensures high diversity in training data to mitigate biases and enhance reliability.

Performance Metrics

  • Comparison to Other Models:
    • MIRACL (multilingual retrieval) average score: Increased from 31.4% with text-embedding-ada-002 to 54.9%.
    • MTEB (English tasks) average score: Improved from 61.0% with text-embedding-ada-002 to 64.6%.
  • Accuracy: Delivers top-tier accuracy across multiple benchmarks.
  • Speed: Optimized for faster processing times despite larger dimensionality.
  • Robustness: Maintains high performance across a variety of input types and contexts.
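
To put the benchmark comparison in perspective, the short calculation below derives the absolute gain (in percentage points) and the relative gain from the scores listed above. It only restates arithmetic on those four numbers; the helper name score_gain is illustrative.

```python
# Scores as (text-embedding-ada-002, text-embedding-3-large), taken from the list above.
scores = {
    "MIRACL": (31.4, 54.9),
    "MTEB": (61.0, 64.6),
}


def score_gain(old, new):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


for name, (old, new) in scores.items():
    absolute, relative = score_gain(old, new)
    print(f"{name}: +{absolute:.1f} points ({relative:.1f}% relative)")
# MIRACL: +23.5 points (74.8% relative)
# MTEB: +3.6 points (5.9% relative)
```

The much larger MIRACL gain indicates that the improvement is concentrated in multilingual retrieval, while the gain on English tasks is more modest.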

