Stable Audio

Stable Audio by Stability AI is an audio generation model that creates high-quality tracks from text prompts.

Stable Audio generates high-quality audio from text prompts with innovative features like audio transformation and extensive creative control.

Model Overview Card for Stable Audio

Basic Information

  • Model Name: Stable Audio
  • Developer/Creator: Stability AI
  • Release Date: September 2023
  • Version: 1.0
  • Model Type: Audio Generation Model

Description

Overview:

Stable Audio is an advanced audio generation model designed to create high-quality audio tracks from textual prompts.

Key Features:

  • High-Quality Output: Generates audio at 44.1 kHz stereo, providing professional-grade sound quality.
  • Length Flexibility: Capable of producing tracks with coherent musical structures including intros, developments, and outros.
  • Diverse Sound Creation: Generates melodies, sound effects, and various audio styles, catering to musicians and sound designers.

Intended Use:

The model is intended for musicians, sound designers, and developers looking to create music, sound effects, or ambient sounds for various applications such as games, films, or interactive media.

Language Support:

Stable Audio primarily supports English for text prompts but can process multilingual inputs depending on the context of the prompt.

Technical Details

Architecture:

Stable Audio employs a latent diffusion model architecture optimized for audio generation. It uses a combination of a highly compressed autoencoder for efficient representation of audio waveforms and a diffusion transformer (DiT) that excels in manipulating data over long sequences.
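
To make the role of the compressed autoencoder concrete, the sketch below estimates how long the latent sequence handed to the diffusion transformer would be for a given clip. The sample rate and stereo layout come from this card; the downsampling factor and latent channel count are illustrative assumptions, not published figures for this model.

```python
import math

SAMPLE_RATE_HZ = 44_100   # from the model card
CHANNELS = 2              # stereo, from the model card

# Assumptions for illustration only; the model card does not state these.
DOWNSAMPLE_FACTOR = 1024  # hypothetical time-axis compression of the autoencoder
LATENT_CHANNELS = 64      # hypothetical number of channels per latent frame


def latent_shape(seconds: float) -> tuple[int, int]:
    """Return (frames, channels) of the latent sequence for a clip of this length."""
    samples = round(seconds * SAMPLE_RATE_HZ)
    frames = math.ceil(samples / DOWNSAMPLE_FACTOR)
    return frames, LATENT_CHANNELS


def compression_ratio() -> float:
    """Raw audio values represented by each latent value."""
    return DOWNSAMPLE_FACTOR * CHANNELS / LATENT_CHANNELS


if __name__ == "__main__":
    frames, channels = latent_shape(47)
    print(f"47 s clip -> {frames} latent frames x {channels} channels")
    print(f"compression ratio: {compression_ratio():.0f}x")
```

Under these assumed numbers, a 47-second clip of roughly two million samples per channel shrinks to about two thousand latent frames, a sequence length a transformer can attend over directly.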

Training Data:

The model was trained on a diverse dataset sourced from the AudioSparx music library, which includes over 800,000 audio files encompassing music, sound effects, and single-instrument stems.

  • Data Source and Size: The dataset is large and varied, ensuring a comprehensive understanding of different audio elements and styles.
  • Diversity and Bias: The training data was curated to respect creator rights with an opt-out option for artists. This approach helps minimize bias while ensuring diverse representation in the generated outputs.

Performance Metrics:

Reported output characteristics for Stable Audio:

Metric                        Score
Quality Index                 High
Length of Generated Tracks    Up to 47 s
Sampling Rate                 44.1 kHz
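
As a quick worked example of what these figures imply, the following sketch computes the number of sample frames and the uncompressed size of a maximum-length clip. The duration, sample rate, and stereo layout are taken from this card; the 16-bit sample width is an assumption, since no bit depth is stated.

```python
SAMPLE_RATE_HZ = 44_100   # from the metrics above
MAX_SECONDS = 47          # from the metrics above
CHANNELS = 2              # stereo, per the Key Features list
BYTES_PER_SAMPLE = 2      # assumption: 16-bit PCM; the card does not state a bit depth


def pcm_size_bytes(seconds, sample_rate=SAMPLE_RATE_HZ,
                   channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Size of the raw audio data in bytes, excluding any container header."""
    return round(seconds * sample_rate) * channels * bytes_per_sample


if __name__ == "__main__":
    frames = MAX_SECONDS * SAMPLE_RATE_HZ
    size = pcm_size_bytes(MAX_SECONDS)
    print(f"{frames:,} sample frames per channel")   # 2,072,700
    print(f"{size / 1_000_000:.2f} MB of raw PCM")   # 8.29
```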

Usage

Code Samples

The model is available on the AI/ML API platform as "Stable Audio".
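
A minimal sketch of what a text-to-audio request might look like is shown below. It only builds the HTTP request and does not send it. The endpoint URL, the model identifier, the JSON field names, and the bearer-token header are all placeholders assumed for illustration; the real values should be taken from the API documentation.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the endpoint and model ID
# given in the API documentation.
ENDPOINT_URL = "https://api.example.invalid/generate/audio"
MODEL_ID = "stable-audio"


def build_request(prompt: str, api_key: str,
                  url: str = ENDPOINT_URL,
                  model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a text prompt."""
    payload = {"model": model, "prompt": prompt}  # assumed field names
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Lo-fi hip hop beat, 90 BPM, mellow piano",
                        api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # Once the placeholders are replaced, the request could be sent with:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Keeping request construction separate from sending makes it easy to inspect the payload and headers before spending API credits.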

API Documentation

Detailed API Documentation is available here.

Ethical Guidelines

Stability AI emphasizes ethical considerations in AI development by promoting transparency regarding the model's capabilities and limitations. The organization ensures that all training data respects copyright laws and provides options for artists to opt out of data usage.

Licensing

Stable Audio is available under a commercial license that allows both research and commercial usage rights while ensuring compliance with ethical standards regarding creator rights.
