Nova-2: Advanced, versatile ASR model for diverse transcription needs.

Deepgram Nova-2 API features enhanced accuracy, multilingual support, and rapid transcription across various applications.

Model Overview Card for Deepgram Nova-2

Basic Information

Model Name: Nova-2

Developer/Creator: Deepgram


Model Options:

nova-2 or nova-2-general: General-purpose model applicable across domains and scenarios.

nova-2-meeting: Optimized for transcribing meetings.

nova-2-phonecall: Designed specifically for transcribing phone calls.

nova-2-finance: Tailored for finance-related contexts.

nova-2-conversationalai: Ideal for conversational AI applications.

nova-2-voicemail: Suited for transcribing voicemail messages.

nova-2-video: Optimized for video content transcription.

nova-2-medical: Customized for medical transcription needs.

nova-2-drivethru: Developed for use in drive-thru communication systems.

nova-2-automotive: Designed for automotive environments.
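As a rough illustration of how these identifiers might be used, the sketch below maps a use case to one of the model options listed above and encodes it as a `model` query parameter. The `USE_CASE_TO_MODEL` table, the `pick_model` and `model_query` helpers, and the use-case labels are hypothetical names introduced for this example. Passing the identifier as a query parameter is also an assumption; check the API reference for the request format your endpoint expects.

```python
from urllib.parse import urlencode

# Model identifiers are copied from the list above. The use-case keys are
# hypothetical labels chosen for this sketch, not part of any API.
USE_CASE_TO_MODEL = {
    "general": "nova-2-general",
    "meeting": "nova-2-meeting",
    "phonecall": "nova-2-phonecall",
    "finance": "nova-2-finance",
    "conversationalai": "nova-2-conversationalai",
    "voicemail": "nova-2-voicemail",
    "video": "nova-2-video",
    "medical": "nova-2-medical",
    "drivethru": "nova-2-drivethru",
    "automotive": "nova-2-automotive",
}


def pick_model(use_case: str) -> str:
    """Return the specialised model for a use case, or the general model."""
    return USE_CASE_TO_MODEL.get(use_case.lower(), "nova-2-general")


def model_query(use_case: str, **options: str) -> str:
    """Encode the chosen model, plus any extra options, as a query string."""
    return urlencode({"model": pick_model(use_case), **options})
```

Falling back to the general model for unknown use cases keeps the helper usable when no specialised option fits.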

Model Type: Automatic Speech Recognition (ASR)



Description:

Deepgram describes Nova-2 as the world's most powerful speech-to-text model, designed for both pre-recorded and streaming audio in English. According to Deepgram, Nova-2 is 18% more accurate than the previous Nova model and offers a 36% relative WER improvement over OpenAI Whisper (large).

Key Features:
  • Multilingual capabilities.
  • High accuracy and reduced word error rate.
  • Fast inference times.
  • Competitive pricing.

Intended Use:

Intended for a wide range of voice applications including real-time transcription services, media transcription, and automated services requiring speech-to-text functionality.

Technical Details


Architecture:

Nova-2 uses a Transformer-based architecture that delivers substantial improvements over its predecessor, reducing word error rate (WER) by 18.4% relative to Nova-1. The architectural changes have also greatly improved accuracy on entities (such as proper nouns and alphanumerics), punctuation, and capitalization for both pre-recorded and live-streaming audio.

Training Data: 

Trained on Deepgram's largest and most varied dataset to date, Nova-2 was developed using nearly 6 million resources and 47 billion tokens, enriched with a comprehensive collection of high-quality human transcriptions.

Performance Metrics: 

Nova-2 shows significant improvements in word error rate (WER) compared with previous models and competitors. Detailed benchmarking results are available.
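For readers unfamiliar with the metric, the sketch below shows the standard way WER is computed (word-level edit distance divided by the number of reference words) and how a relative WER reduction is derived from two WER values. It is a minimal illustration, not Deepgram's benchmarking code. The function names are hypothetical, and it assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the improved WER is lower than the baseline WER."""
    return (baseline_wer - improved_wer) / baseline_wer
```

As an illustrative case (not a benchmark result), a baseline WER of 10.0% that falls to 8.16% gives `relative_wer_reduction(0.10, 0.0816)` of about 0.184, that is, an 18.4% relative reduction.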

Speed is crucial for many applications:

In speed benchmarks, Nova-2 outperformed all competitors, registering a median inference time of just 29.8 seconds per hour of diarized audio. That makes it 5 to 40 times faster than other vendors with diarization capabilities.
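To put the 29.8-second figure in perspective, the short calculation below converts it to a speed-up over real time and a rough processing estimate for a given audio length. The helper names are hypothetical. The estimate assumes processing time scales linearly with audio duration, which a median benchmark figure does not guarantee.

```python
# Median figure quoted above: seconds of processing per hour of diarized audio.
MEDIAN_SECONDS_PER_AUDIO_HOUR = 29.8


def speedup_over_real_time(
    seconds_per_audio_hour: float = MEDIAN_SECONDS_PER_AUDIO_HOUR,
) -> float:
    """How many times faster than playback speed the audio is processed."""
    return 3600.0 / seconds_per_audio_hour


def estimated_processing_seconds(
    audio_seconds: float,
    seconds_per_audio_hour: float = MEDIAN_SECONDS_PER_AUDIO_HOUR,
) -> float:
    """Rough estimate, assuming time scales linearly with audio length."""
    return audio_seconds / 3600.0 * seconds_per_audio_hour
```

Under these assumptions the quoted median corresponds to roughly 120 times real time, and a 90-minute recording would take about 45 seconds.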


Code Samples/SDK:

Tutorials: Speech-to-text Multimodal Experience in NodeJS
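A minimal Python sketch of a pre-recorded transcription request is shown here for orientation. It assumes Deepgram's hosted REST endpoint (`https://api.deepgram.com/v1/listen`), token-based authorization, and a response that nests the transcript under `results.channels[0].alternatives[0]`. If you access Nova-2 through another provider, the URL, authentication scheme, and response shape may differ. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Deepgram's hosted pre-recorded endpoint. A proxy or reseller
# may expose a different URL and authentication scheme.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_request(
    audio: bytes,
    api_key: str,
    model: str = "nova-2",
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        f"{LISTEN_URL}?model={model}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )


def extract_transcript(response_body: bytes) -> str:
    """Pull the first channel's top alternative out of the JSON response."""
    payload = json.loads(response_body)
    return payload["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe(path: str, api_key: str, model: str = "nova-2") -> str:
    """Send a local audio file and return its transcript (network call)."""
    with open(path, "rb") as audio_file:
        request = build_request(audio_file.read(), api_key, model)
    with urllib.request.urlopen(request) as response:
        return extract_transcript(response.read())
```

Separating request construction from the network call makes the sketch easy to adapt to a different HTTP client or to an official SDK.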

File Size

The maximum file size is 2 GB.
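A client can check this limit before uploading rather than waiting for the request to be rejected. The sketch below is one way to do that. The `check_upload_size` helper is hypothetical, and it reads "2 GB" as 2 × 10^9 bytes, the stricter of the two common interpretations, because the text does not say which one applies.

```python
import os

# Assumption: "2 GB" is read as decimal gigabytes (2 * 10**9 bytes). If the
# service actually means 2 GiB, this check is slightly conservative.
MAX_UPLOAD_BYTES = 2 * 1000**3


def check_upload_size(path: str, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size} bytes, which exceeds the {limit}-byte upload limit"
        )
    return size
```

Files over the limit would need to be split or re-encoded at a lower bitrate before submission.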

Rate Limits

The rate limit is 100 concurrent requests.
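One common way to stay under a concurrency cap on the client side is a semaphore that bounds the number of in-flight requests. The sketch below uses `asyncio.Semaphore` for this. The `run_with_limit` helper is hypothetical, and each job stands in for whatever coroutine actually sends a transcription request.

```python
import asyncio

# Concurrency cap quoted above.
MAX_CONCURRENT_REQUESTS = 100


async def run_with_limit(jobs, limit: int = MAX_CONCURRENT_REQUESTS):
    """Run zero-argument coroutine factories with at most `limit` in flight.

    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Each job waits here until one of the `limit` slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A client-side cap like this only covers a single process. Several workers sharing one account would need to split the 100-request budget between them.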

Ethical Considerations

  • Ethical Guidelines: Adherence to ethical AI development practices, with a focus on reducing bias and ensuring privacy.
  • Bias Mitigation: Continuous efforts to improve the model's fairness and accuracy across diverse speech patterns and accents.
