Stable Audio generates high-quality audio from text prompts with innovative features like audio transformation and extensive creative control.
Stable Audio is an advanced audio generation model designed to create high-quality audio tracks from textual prompts.
Key Features:
- Intended use: musicians, sound designers, and developers creating music, sound effects, or ambient audio for applications such as games, films, or interactive media.
- Language support: text prompts are primarily in English, though the model can process multilingual inputs depending on the context of the prompt.
- Architecture: a latent diffusion model optimized for audio generation, combining a highly compressed autoencoder for efficient representation of audio waveforms with a diffusion transformer (DiT) that excels at modeling long sequences.
- Training data: a diverse dataset sourced from the AudioSparx music library, comprising over 800,000 audio files spanning music, sound effects, and single-instrument stems.
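As a rough illustration of that latent-diffusion pipeline (an autoencoder compresses audio into a compact latent sequence, a denoiser iteratively refines noise into latents, and a decoder reconstructs the waveform), here is a minimal toy sketch. Every function here is a stand-in for illustration only, not Stability AI's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(waveform, factor=64):
    # Toy "encoder": downsample by block-averaging, standing in for the
    # learned, highly compressed latent representation of the autoencoder.
    n = len(waveform) // factor * factor
    return waveform[:n].reshape(-1, factor).mean(axis=1)

def decode(latents, factor=64):
    # Toy "decoder": upsample by repetition.
    return np.repeat(latents, factor)

def denoise_step(x, t, cond):
    # Stand-in for the diffusion transformer's denoising update,
    # nudging the noisy latents toward the text-conditioning signal.
    return x * 0.9 + cond * 0.1

def generate(cond, steps=10, latent_len=256):
    x = rng.standard_normal(latent_len)  # start from pure noise
    for t in reversed(range(steps)):
        x = denoise_step(x, t, cond)
    return decode(x)                     # latents -> waveform

audio = generate(cond=np.zeros(256))
print(audio.shape)  # (16384,)
```

The key design point mirrored here is that diffusion runs in the compact latent space, not on raw samples, which is what makes long audio sequences tractable for the transformer.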
Stable Audio has demonstrated impressive performance metrics.
The model is available on the AI/ML API platform as "Stable Audio".
Detailed API Documentation is available here.
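For orientation, a request to the API might be built as in the sketch below. The endpoint path, model identifier, and parameter names are assumptions for illustration only; consult the API documentation for the actual contract:

```python
import json
import urllib.request

# Assumed endpoint and field names -- check the AI/ML API docs before use.
API_URL = "https://api.aimlapi.com/v2/generate/audio"

def build_request(prompt, duration_seconds=30, api_key="YOUR_API_KEY"):
    payload = {
        "model": "stable-audio",          # assumed model identifier
        "prompt": prompt,                 # the text description of the audio
        "seconds_total": duration_seconds # assumed duration parameter
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("lo-fi hip hop beat with vinyl crackle", 45)
print(req.get_full_url())
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would require a valid API key from the platform.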
Stability AI emphasizes ethical considerations in AI development by promoting transparency regarding the model's capabilities and limitations. The organization ensures that all training data respects copyright laws and provides options for artists to opt out of data usage.
Stable Audio is available under a commercial license that grants both research and commercial usage rights while ensuring compliance with ethical standards regarding creator rights.
Get Stable Audio API here.