DiscoLM Mixtral 8x7b (46.7B)

Advanced text generation model with 46.7B parameters and a sparse mixture-of-experts (MoE) architecture.
Model Overview Card for DiscoLM Mixtral 8x7b

Basic Information
  • Model Name: DiscoLM Mixtral 8x7b
  • Developer/Creator: DiscoResearch, led by Björn Plüster
  • Release Date: December 11, 2023
  • Version: V2
  • Model Type: Text Generation
Description
Overview

DiscoLM Mixtral 8x7b is a state-of-the-art language model designed for advanced text generation tasks. It leverages a sparse mixture of experts (MoE) architecture to optimize performance and efficiency, making it suitable for a wide range of natural language processing (NLP) applications.

Key Features
  • Sparse Mixture of Experts (MoE) Architecture: Uses 8 expert groups totaling 46.7 billion parameters, with only about 12.9 billion parameters active per token for efficiency.
  • High Performance: Achieves top-tier benchmarks on various NLP tasks.
  • Multi-Language Support: Proficient in English, French, Spanish, Italian, and German.
  • Extended Context Length: Supports a context length of up to 32,768 tokens.
Intended Use

DiscoLM Mixtral 8x7b is designed for:

  • Text generation and completion
  • Conversational AI
  • Content creation
  • Language translation
  • Advanced NLP research
Language Support

The model supports multiple languages, including:

  • English
  • French
  • Spanish
  • Italian
  • German

Technical Details

Architecture

DiscoLM Mixtral 8x7b employs a sparse mixture of experts (MoE) architecture. This design allows the model to use only a subset of its total parameters for each token, balancing computational efficiency with high performance. The architecture is based on the Mixtral framework, optimized for causal language modeling.
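The routing idea can be illustrated with a toy sketch: a router scores all 8 experts for each token, only the top-2 experts actually run, and their outputs are blended with softmax gates. This is why only ~12.9B of the 46.7B total parameters are active per token. (This is a simplified illustration of Mixtral-style routing, not the production implementation.)

```python
import numpy as np

def moe_layer(x, router_w, experts, top_k=2):
    """Toy sparse-MoE forward pass for one token.

    x: (d,) token hidden state
    router_w: (n_experts, d) router weight matrix
    experts: list of n_experts callables, each mapping (d,) -> (d,)
    Only the top_k highest-scoring experts are evaluated.
    """
    logits = router_w @ x                       # one routing score per expert
    top = np.argsort(logits)[-top_k:]           # indices of the top-k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over selected experts only
    # Blend only the selected experts' outputs; the rest never run.
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Example: 8 experts on a 4-dim hidden state, each a simple scaling function.
router_w = np.arange(32, dtype=float).reshape(8, 4) / 10
experts = [lambda v, c=i: c * v for i in range(8)]
out = moe_layer(np.ones(4), router_w, experts)
```

In the example only experts 6 and 7 (the two highest-scoring) are evaluated; the other six contribute no compute, which is the efficiency win the card describes.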

Training Data

The model was fine-tuned on a diverse set of datasets, including:

  • Synthia: A synthetic dataset designed for general NLP tasks.
  • MetaMathQA: A dataset focused on mathematical problem-solving.
  • Capybara: A comprehensive dataset for conversational AI.
Data Source and Size

The training data encompasses a wide range of sources to ensure robustness and diversity. The exact size of the training data is not specified, but it includes substantial amounts of text from various domains to enhance the model's generalization capabilities.

Knowledge Cutoff

The model's knowledge is up-to-date as of December 2023.

Diversity and Bias

Efforts were made to include diverse datasets to minimize biases. However, as with any large language model, some biases may still be present due to the nature of the training data.

Performance Metrics
Key Performance Metrics
  • ARC (25-shot): 67.32
  • HellaSwag (10-shot): 86.25
  • MMLU (5-shot): 70.72
  • TruthfulQA (0-shot): 54.17
  • Winogrande (5-shot): 80.72
  • GSM8k (5-shot): 25.09
Comparison to Other Models

DiscoLM Mixtral 8x7b outperforms many contemporary models, including Meta's Llama 2 70B, on several benchmarks.

Speed

The model is optimized for efficient inference, leveraging its MoE architecture to reduce computational overhead.

Robustness

DiscoLM Mixtral 8x7b demonstrates strong generalization across diverse inputs and maintains high performance across different topics and languages.

Usage

Code Samples
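A minimal inference sketch using Hugging Face transformers is shown below. The repository id is an assumption, so check the DiscoResearch organization on the Hugging Face Hub for the exact name; the ChatML prompt format is what DiscoLM-family models are typically tuned on.

```python
# Minimal inference sketch for DiscoLM Mixtral 8x7b via Hugging Face transformers.
# MODEL_ID is an assumed repository id; verify it on the Hub before use.
MODEL_ID = "DiscoResearch/DiscoLM-mixtral-8x7b-v2"

def build_chatml_prompt(system: str, user: str) -> str:
    """Format a prompt in the ChatML style used by DiscoLM models."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a completion.

    Note: the full model has 46.7B parameters, so this requires a
    multi-GPU or high-memory machine (or a quantized variant).
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example call (downloads the full weights on first use):
# prompt = build_chatml_prompt("You are a helpful assistant.",
#                              "Explain mixture-of-experts briefly.")
# print(generate(prompt))
```

The prompt builder is separated from the model call so you can reuse the same ChatML formatting with hosted API endpoints that serve this model.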

Ethical Guidelines

The model should be used responsibly, considering potential biases and ethical implications. It is intended for research purposes and should not be used for harmful activities.

Licensing

DiscoLM Mixtral 8x7b is released under the Apache 2.0 license, allowing for both commercial and non-commercial use.
