textembedding-gecko-multilingual@001

Explore the textembedding-gecko-multilingual@001 model API, its architecture, training data, performance, and applications in NLP tasks.

textembedding-gecko-multilingual@001 is a powerful multilingual text embedding model.

Model Overview Card for Textembedding-gecko-multilingual@001

Basic Information

  • Model Name: textembedding-gecko-multilingual@001
  • Developer/Creator: Google
  • Release Date: April 30, 2024
  • Version: 001
  • Model Type: Text Embedding

Description

Overview

The textembedding-gecko-multilingual@001 model is a state-of-the-art text embedding model developed by Google, designed to convert textual data into numerical vector representations. It captures semantic meanings and relationships within the text, facilitating various natural language processing (NLP) tasks.

Key Features
  • Supports a maximum of 3,072 input tokens.
  • Outputs 768-dimensional vector embeddings.
  • Achieves superior performance on the Massive Text Embedding Benchmark (MTEB).
  • Utilizes a novel fine-tuning dataset (FRet) for enhanced query and passage generation.
  • Designed for multilingual support, covering a wide range of languages.
Intended Use
  • Semantic search
  • Text classification
  • Document retrieval
  • Clustering and recommendation systems
  • Outlier detection
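The semantic-search and retrieval use cases above boil down to one operation: embed a query and a set of documents, then rank documents by cosine similarity of their vectors. A minimal sketch of that ranking step follows; the random 768-dimensional vectors are placeholders standing in for real model output, not embeddings produced by this model.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_documents(query_vec: np.ndarray, doc_vecs: list) -> list:
    """Return document indices sorted by similarity to the query, best first."""
    scores = [cosine_sim(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Placeholder 768-dimensional vectors standing in for real embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=768)
docs = [rng.normal(size=768) for _ in range(3)]
docs[1] = query + 0.05 * rng.normal(size=768)  # make doc 1 nearly identical to the query

print(rank_documents(query, docs))  # doc 1 ranks first
```

The same cosine-similarity scoring underlies the clustering, recommendation, and outlier-detection uses listed above; only the downstream decision rule changes.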
Language Support

The model supports multiple languages, including but not limited to Arabic, Bengali, English, Spanish, French, Hindi, and Chinese.

Technical Details

Architecture

The textembedding-gecko-multilingual@001 model is based on a dense vector representation architecture similar to that used in large language models (LLMs). It employs advanced deep learning techniques to generate embeddings that reflect the semantic context of the input text.

Training Data

The model was trained using a diverse dataset generated through a two-step process involving LLMs. The initial step involves generating queries and relevant passages, while the second step ranks these passages to create a fine-tuning dataset. This approach ensures a broad coverage of tasks and enhances the model's performance.

Data Source and Size

The training data comprises a large corpus of unlabeled passages. The diversity of the training data contributes significantly to the model's ability to understand and generate meaningful embeddings.

Knowledge Cutoff

The model's knowledge is current as of April 2024.

Diversity and Bias

The training data is designed to be diverse, which helps mitigate biases. However, as with any model, ongoing evaluation is essential to identify and address any potential biases that may arise from the training data.

Performance Metrics

The textembedding-gecko-multilingual@001 model exhibits impressive performance metrics, particularly when evaluated against the Massive Text Embedding Benchmark (MTEB). This benchmark is a comprehensive evaluation suite that encompasses seven categories of tasks across 56 individual datasets, allowing for a robust assessment of the model's capabilities.

Average Score on MTEB

The model achieves an average score of 66.31 with 768-dimensional embeddings. This positions it as a leading text embedding model: it outperforms models up to seven times larger, as well as models with embeddings of up to 4,096 dimensions, while remaining compact at only 1.2 billion parameters.

Task-Specific Performance

The model excels in several core NLP tasks, achieving the following average scores:

  • Text Classification: 81.17
  • Semantic Textual Similarity: 85.06
  • Summarization: 32.63
  • Retrieval Tasks: 55.70
Zero-Shot Generalization

Remarkably, the model demonstrates strong zero-shot generalization: even the variant trained solely on the synthetic FRet dataset transfers effectively to unseen tasks, outperforming several competitive baselines without prior exposure to those datasets.

Usage

Code Samples

The model is available on the AI/ML API platform as "textembedding-gecko-multilingual@001".
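A minimal request sketch, assuming the AI/ML API exposes an OpenAI-style `/v1/embeddings` endpoint; the endpoint path, request field names, and the `AIML_API_KEY` environment variable are assumptions here, so check the API documentation below for the exact contract.

```python
import os
import requests

API_URL = "https://api.aimlapi.com/v1/embeddings"  # assumed endpoint path

def build_payload(texts: list) -> dict:
    """Build an embeddings request body (field names follow the common OpenAI-style schema)."""
    return {
        "model": "textembedding-gecko-multilingual@001",
        "input": texts,
    }

def get_embeddings(texts: list, api_key: str = None) -> list:
    """POST the texts and return one vector per input (expected to be 768-dimensional)."""
    api_key = api_key or os.environ.get("AIML_API_KEY", "")
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_payload(texts),
        timeout=30,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

# Example (requires a valid key in AIML_API_KEY):
#   vectors = get_embeddings(["Hola mundo", "Hello world"])
#   len(vectors[0])  # expected: 768
print(build_payload(["Hola mundo"])["model"])
```

Because embeddings for different languages share one vector space, a Spanish query can be matched against English documents with the same similarity scoring shown earlier.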

API Documentation

Detailed API Documentation is available on the AI/ML API website, providing comprehensive guidelines for integration.

Ethical Guidelines

The development and deployment of the textembedding-gecko-multilingual model adhere to ethical guidelines that emphasize responsible AI usage. Developers are encouraged to consider the implications of embedding models in their applications, particularly concerning data privacy and potential biases.

Licensing

License Type: The textembedding-gecko-multilingual@001 model is currently not open-sourced, and its usage is subject to specific licensing agreements defined by Google. Users should review the terms of service and privacy policies associated with the model's deployment.
