Whisper: Multilingual speech recognition model, robust, versatile, open-source.
Model Name: Whisper
Developer/Creator: OpenAI
Release Date: September 2022 (original series), December 2022 (large-v2), and November 2023 (large-v3)
Model Type: Sequence-to-sequence ASR (automatic speech recognition) and speech translation model
Versions: tiny, base, small, medium, large, large-v2, and large-v3; English-only variants (e.g., base.en) are available for the tiny through medium sizes.
The Whisper models are intended primarily for AI research on model robustness, generalization, and bias, and are also effective for English speech recognition. Using them to transcribe recordings made without consent, or in high-risk decision-making contexts, is strongly discouraged due to potential inaccuracies and ethical concerns.
They are also intended for developers and researchers who want to add speech-to-text capabilities to applications, support accessibility features, or conduct linguistic research.
The model uses an encoder-decoder Transformer architecture trained end-to-end on large-scale, weakly supervised audio-transcript pairs.
The models are trained on 680,000 hours of audio and paired transcripts collected from the internet: 65% English audio with English transcripts, 18% non-English audio with English transcripts, and 17% non-English audio with matching non-English transcripts, covering 98 languages in total.
Research indicates that these models outperform many existing ASR systems. They show enhanced robustness to accents, background noise, and technical language, and provide zero-shot translation from many languages into English with near state-of-the-art accuracy.
Performance varies across languages, with notably lower accuracy for low-resource or less commonly studied languages, and accuracy also varies across accents, dialects, and demographic groups. The models may generate repetitive text, a failure mode partly addressable through beam search and temperature scheduling.
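The repetition mitigations above can be sketched with the open-source `openai-whisper` package, whose `transcribe()` function accepts a beam size, a tuple of fallback temperatures, and quality thresholds that trigger re-decoding. The model size, file path, and threshold values below are illustrative choices, not recommendations from this document.

```python
"""Sketch: decoding settings in openai-whisper (pip install openai-whisper)
that reduce repetitive output via beam search plus temperature fallback."""

# Temperatures tried in order; decoding falls back to the next value when
# the output fails the quality checks below.
TEMPERATURE_SCHEDULE = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)

def robust_transcribe(path: str) -> str:
    # Imported lazily so the sketch can be read without the package installed.
    import whisper

    model = whisper.load_model("base")  # illustrative size choice
    result = model.transcribe(
        path,
        beam_size=5,                         # beam search at temperature 0
        temperature=TEMPERATURE_SCHEDULE,    # fallback schedule
        compression_ratio_threshold=2.4,     # flags highly repetitive text
        logprob_threshold=-1.0,              # flags low-confidence decodes
    )
    return result["text"]
```

When the zero-temperature beam search produces gibberish or loops, the sampled higher-temperature retries usually break the repetition at the cost of some determinism.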
The training data does not include audio or text collected after mid-2022.
Code Samples/SDK:
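A minimal usage sketch with the open-source `openai-whisper` package (pip install openai-whisper): load a model checkpoint, transcribe an audio file, or translate non-English speech into English via the `task` parameter. The file path and model size are placeholders.

```python
"""Sketch: transcription and translation with openai-whisper."""

def transcribe_file(path: str, translate: bool = False) -> str:
    # Imported lazily so the sketch can be read without the package installed.
    import whisper

    model = whisper.load_model("base")  # sizes: tiny, base, small, medium, large
    task = "translate" if translate else "transcribe"
    # Language is auto-detected from the first 30 seconds unless specified.
    result = model.transcribe(path, task=task)
    return result["text"]

if __name__ == "__main__":
    print(transcribe_file("audio.mp3"))                  # same-language transcript
    print(transcribe_file("audio.mp3", translate=True))  # English translation
```

The same checkpoints serve both tasks; only the task token passed to the decoder changes.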
Tutorials: Speech-to-text Multimodal Experience in NodeJS
File Size: The maximum audio file size is 2 GB.
Issues and contributions can be made directly through the GitHub repository.