Text-multilingual-embedding-002

Discover Text-multilingual-embedding-002 API, a powerful model for multilingual text embeddings, enhancing NLP applications across languages.

Text-multilingual-embedding-002

Multilingual embedding model for diverse NLP applications and languages

Model Overview Card for Text-multilingual-embedding-002

Basic Information

  • Model Name: Text-multilingual-embedding-002
  • Developer/Creator: Google Cloud
  • Release Date: March 2023
  • Version: 002
  • Model Type: Text Embedding

Description

Overview

Text-multilingual-embedding-002 is a state-of-the-art model designed to convert textual data into numerical vector representations, capturing the semantic meaning and context of the input text. It is particularly focused on supporting multiple languages, making it suitable for global applications.
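For intuition, here is a minimal sketch of how such vector representations are compared. The vectors below are toy 3-dimensional stand-ins, not real model outputs (real embeddings have far more dimensions), but the comparison technique — cosine similarity — is the standard one:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three texts; in practice the
# model produces these vectors from raw text in any supported language.
cat = [0.9, 0.1, 0.2]      # "the cat sleeps"
gato = [0.85, 0.15, 0.25]  # "el gato duerme" (same meaning, different language)
tax = [0.1, 0.9, 0.05]     # "quarterly tax filing" (unrelated topic)

similar = cosine_similarity(cat, gato)
different = cosine_similarity(cat, tax)
```

Because the model maps semantically similar texts to nearby vectors regardless of language, `similar` comes out much higher than `different` — which is the property cross-lingual search and recommendation build on.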

Key Features
  • Supports over 100 languages
  • High-quality semantic embeddings
  • Fine-tuned for various NLP tasks
  • Efficient inference speed
  • Robust against diverse linguistic structures
Intended Use
  • Cross-lingual search engines
  • Multilingual chatbots
  • Sentiment analysis across languages
  • Language translation services
  • Content recommendation systems
Language Support

Text-multilingual-embedding-002 supports a wide range of languages, including but not limited to English, Spanish, French, Chinese, and Arabic, making it suitable for global applications.

The model is also well suited to cross-lingual healthcare applications such as clinical documentation and research. Learn more about this and other models and their applications in healthcare here.

Technical Details

Architecture

The model is based on the Transformer architecture, which utilizes self-attention mechanisms to process and generate embeddings that capture contextual relationships between words in multiple languages.
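The self-attention mechanism described above can be sketched in a few lines. This is a didactic toy (plain scaled dot-product attention over tiny vectors), not the model's actual implementation, which uses learned projections, multiple heads, and much larger dimensions:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key, and the
    resulting weights form a weighted average over the value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

Each output is thus a context-dependent blend of the other positions' representations, which is how the embedding for a word comes to reflect its surrounding words.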

Training Data

Text-multilingual-embedding-002 was trained on a diverse and extensive dataset that includes text from books, websites, and other multilingual sources. The training data encompasses approximately 1 billion sentences across various languages, ensuring a broad understanding of linguistic nuances.

Data Source and Size

As noted above, the training corpus spans books, websites, and other large-scale multilingual sources. This diversity contributes significantly to the model's ability to generalize across different languages and contexts.

Knowledge Cutoff

The model's knowledge is current as of March 2023.

Diversity and Bias

The training data includes a wide range of sources to minimize bias and improve robustness. However, like all models trained on large datasets, it may still reflect some inherent biases present in the data.

Performance Metrics

Massive Text Embedding Benchmark (MTEB)

The model's performance on the MTEB benchmark indicates high accuracy across multiple tasks, particularly in retrieval and classification scenarios. These metrics suggest that the model performs well in ranking relevant documents and retrieving information effectively from large datasets.

  • nDCG@10: 60.8
  • Recall@100: 92.4
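For readers unfamiliar with these retrieval metrics, here is a minimal sketch of how they are computed, using toy relevance judgments rather than the benchmark's actual data:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k ranked results.
    relevances: graded relevance of each result, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """nDCG: DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)
```

A perfect ranking yields an nDCG of 1.0; benchmark scores like those above are conventionally reported on a 0–100 scale.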

The model has demonstrated a high level of robustness, effectively handling diverse inputs across different languages. It has been benchmarked against user-generated content (UGC) and has shown resilience in maintaining performance despite variations in language and structure.

Comparison with Other Models

Text-multilingual-embedding-002 has shown competitive performance against other leading multilingual embedding models. In the MTEB evaluation, it achieved:

  • Average score: 64.0 across various tasks.
  • Retrieval tasks: Strong performance, indicating its robustness in handling multilingual queries.

Against other models in the same category (MTEB average scores):

  • LaBSE (Language-agnostic BERT Sentence Embedding): 45.2
  • Cohere: 64.0
  • BGE (BAAI General Embedding): 64.2

Text-multilingual-embedding-002 outperforms LaBSE by a wide margin, matches Cohere, and trails BGE only slightly.

Usage

Code Samples

The model is available on the AI/ML API platform as "text-multilingual-embedding-002".
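A minimal request sketch follows. The endpoint URL, authentication header, and response schema here are assumptions based on the common OpenAI-style embeddings API shape — consult the platform's API documentation for the authoritative details:

```python
import json
import urllib.request

API_URL = "https://api.aimlapi.com/v1/embeddings"  # assumed OpenAI-style endpoint
API_KEY = "<YOUR_API_KEY>"                         # placeholder; use your own key

def build_request(texts):
    """Build the HTTP request for an embeddings call (no network I/O here)."""
    payload = {"model": "text-multilingual-embedding-002", "input": texts}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )

def fetch_embeddings(texts):
    """Send the request and return one embedding vector per input text."""
    with urllib.request.urlopen(build_request(texts)) as resp:
        body = json.loads(resp.read())
    # Assumes an OpenAI-style response: {"data": [{"embedding": [...]}, ...]}
    return [item["embedding"] for item in body["data"]]
```

Calling `fetch_embeddings(["Hello", "Hola"])` would then return one vector per input, ready for similarity comparison or indexing.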

API Documentation

Detailed API Documentation is available on the AI/ML API website, providing comprehensive guidelines for integration.

Ethical Guidelines

The development of Text-multilingual-embedding-002 adheres to ethical AI practices, focusing on transparency, fairness, and accountability.

Licensing

Text-multilingual-embedding-002 is available under commercial licensing, allowing for both commercial and non-commercial usage, subject to Google Cloud's terms of service.
