Deepgram Nova-2

The Deepgram Nova-2 API offers improved accuracy, multilingual support, and fast transcription for a wide range of applications.

Nova-2: Advanced, versatile ASR model for diverse transcription needs.

Model Overview Card for Deepgram Nova-2

Basic Information

Model Name: Nova-2

Developer/Creator: Deepgram

Versions: 

nova-2 or nova-2-general: General model applicable across various domains and scenarios.

nova-2-meeting: Optimized for transcribing meetings.

nova-2-phonecall: Designed specifically for transcribing phone calls.

nova-2-finance: Tailored for finance-related contexts.

nova-2-conversationalai: Ideal for conversational AI applications.

nova-2-voicemail: Suited for transcribing voicemail messages.

nova-2-video: Optimized for video content transcription.

nova-2-medical: Customized for medical transcription needs.

nova-2-drivethru: Developed for use in drive-thru communication systems.

nova-2-automotive: Designed for automotive environments.

Model Type: Automatic Speech Recognition (ASR)
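
The variant identifiers listed under Versions can be kept in a small lookup table so that application code selects a model by use case rather than by hard-coded string. The sketch below is illustrative only: the `NOVA2_MODELS` table and the `pick_model` helper are hypothetical names, and falling back to the general model for an unknown use case is an assumption, not documented behavior.

```python
# Model identifiers copied from the Versions list above.
NOVA2_MODELS = {
    "general": "nova-2-general",
    "meeting": "nova-2-meeting",
    "phonecall": "nova-2-phonecall",
    "finance": "nova-2-finance",
    "conversationalai": "nova-2-conversationalai",
    "voicemail": "nova-2-voicemail",
    "video": "nova-2-video",
    "medical": "nova-2-medical",
    "drivethru": "nova-2-drivethru",
    "automotive": "nova-2-automotive",
}


def pick_model(use_case: str) -> str:
    """Return the Nova-2 identifier for a use case.

    Unknown use cases fall back to the general model (an assumption of
    this sketch, chosen so that callers always get a usable identifier).
    """
    key = use_case.strip().lower()
    return NOVA2_MODELS.get(key, NOVA2_MODELS["general"])
```

For example, `pick_model("Medical")` returns `"nova-2-medical"`, while an unlisted use case returns `"nova-2-general"`.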

Description

Overview: 

Deepgram describes Nova-2 as the world's most powerful speech-to-text model, designed for both pre-recorded and streaming audio in English. According to Deepgram, it is 18% more accurate than the previous Nova model and offers a 36% relative WER improvement over OpenAI Whisper (large).

Key Features:

  • Multilingual capabilities.
  • High accuracy and reduced word error rate.
  • Fast inference times.
  • Competitive pricing.

Intended Use:

Intended for a wide range of voice applications including real-time transcription services, media transcription, and automated services requiring speech-to-text functionality.

The model also performs well in medical settings, with 16% better accuracy on medical terms, and handles speech at 120-180 words per minute.

Technical Details

Architecture: 

Nova-2 uses a Transformer-based architecture that improves substantially on its predecessor, reducing word error rate (WER) by 18.4% compared with Nova-1. The architectural changes also improve accuracy on entities (such as proper nouns and alphanumerics), punctuation, and capitalization for both pre-recorded and live streaming audio.

Training Data: 

Trained on Deepgram's largest and most varied dataset to date, Nova-2 was developed using nearly 6 million resources and 47 billion tokens, enriched with a comprehensive collection of high-quality human transcriptions.

Performance Metrics: 

Nova-2 shows significant improvements in word error rate (WER) over previous models and competitors. Detailed benchmarking results are available.
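
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. A "relative improvement" compares two WER values. The following is a minimal sketch of both calculations. The function names are hypothetical, the tokenization is a plain whitespace split, and it is not the evaluation code behind the figures quoted on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer
```

With illustrative inputs (not measured results), a hypothesis that drops one word from a six-word reference has a WER of 1/6, and moving from a WER of 0.10 to 0.08 is a relative improvement of about 20%.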

Speed is crucial for many applications:

Nova-2 outperformed all competitors, registering a median inference time of just 29.8 seconds per hour of diarized audio. That makes it 5 to 40 times faster than other vendors that offer diarization.
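
To put the 29.8-second figure in perspective, it can be converted into a speed-up over real time and into a rough processing estimate for a file of a given length. The sketch below assumes processing time scales linearly with audio duration, which is a simplification. The helper names are hypothetical.

```python
MEDIAN_SECONDS_PER_AUDIO_HOUR = 29.8  # median figure quoted above
SECONDS_PER_HOUR = 3600


def speedup_over_real_time(seconds_per_audio_hour: float) -> float:
    """How many times faster than playback speed the transcription runs."""
    return SECONDS_PER_HOUR / seconds_per_audio_hour


def estimated_processing_seconds(audio_seconds: float) -> float:
    """Rough estimate, assuming time scales linearly with audio length."""
    return audio_seconds * MEDIAN_SECONDS_PER_AUDIO_HOUR / SECONDS_PER_HOUR


print(round(speedup_over_real_time(MEDIAN_SECONDS_PER_AUDIO_HOUR), 1))  # 120.8
print(round(estimated_processing_seconds(10 * 60), 2))                  # 4.97
```

Under that assumption, the quoted median corresponds to roughly 121 times real time, or about 5 seconds for a 10-minute recording.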

Usage

Code Samples/SDK:

Tutorials: Speech-to-text Multimodal Experience in NodeJS
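
As a starting point, a pre-recorded transcription request might look like the sketch below. It assumes direct access to Deepgram's hosted REST endpoint (`https://api.deepgram.com/v1/listen`) with `Token` authentication and a JSON body containing the audio URL. If you reach Nova-2 through a different provider, the base URL, authentication header, and response shape may differ, so check that provider's documentation. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Deepgram's own hosted endpoint for pre-recorded audio.
DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_request(api_key: str, audio_url: str, model: str = "nova-2") -> urllib.request.Request:
    """Build a POST request asking the service to transcribe a remote audio file."""
    query = urllib.parse.urlencode({"model": model})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{DEEPGRAM_LISTEN_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Pull the top transcript for the first channel out of the JSON response."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe_url(api_key: str, audio_url: str, model: str = "nova-2") -> str:
    """Send the request and return the transcript text (performs a network call)."""
    request = build_request(api_key, audio_url, model)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return extract_transcript(json.load(resp))
```

Separating request construction from the network call keeps the request easy to inspect and test, and a domain-specific variant such as `nova-2-meeting` can be selected through the `model` argument.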

File Size

The maximum file size is limited to 2 GB.
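
Checking the size locally before uploading avoids a wasted transfer. The sketch below assumes "2 GB" means 2 × 1024³ bytes. The page does not say whether the limit is binary or decimal, so adjust `MAX_UPLOAD_BYTES` if the service counts in units of 10⁹ bytes. The function names are hypothetical.

```python
import os

# Assumption: the 2 GB limit is binary (2 * 1024**3 bytes).
MAX_UPLOAD_BYTES = 2 * 1024**3


def fits_size_limit(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """True if a non-empty payload of this size is within the upload limit."""
    return 0 < size_bytes <= limit


def check_audio_file(path: str) -> int:
    """Return the file size in bytes, or raise if it is empty or too large."""
    size = os.path.getsize(path)
    if not fits_size_limit(size):
        raise ValueError(f"{path} is {size} bytes; allowed range is 1 to {MAX_UPLOAD_BYTES}")
    return size
```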


Rate Limits

The rate limit is 100 concurrent requests.
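
A client that submits many files at once can stay under this ceiling by capping its own parallelism. The sketch below uses a thread pool whose worker count equals the limit. `run_bounded` is a hypothetical helper, and `task` stands for whatever function sends a single transcription request. It only bounds requests made by one process, so several processes sharing an API key would need to split the budget between them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

MAX_CONCURRENT_REQUESTS = 100  # limit stated above


def run_bounded(
    task: Callable[[T], R],
    items: Iterable[T],
    max_workers: int = MAX_CONCURRENT_REQUESTS,
) -> List[R]:
    """Run task over items with at most max_workers calls in flight.

    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```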

Ethical Considerations

  • Ethical Guidelines: Adherence to ethical AI development practices, with a focus on reducing bias and ensuring privacy.
  • Bias Mitigation: Continuous efforts to improve the model's fairness and accuracy across diverse speech patterns and accents.