Name: Deepgram Nova-2 API
Brand: Deepgram

Deepgram Nova-2

Nova-2: Advanced, versatile ASR model for diverse transcription needs.

Deepgram Nova-2 Description

Basic Information

Model Name: Nova-2

Developer/Creator: Deepgram

Versions:

nova-2 or nova-2-general: General model applicable across various domains and scenarios.

nova-2-meeting: Optimized for transcribing meetings.

nova-2-phonecall: Designed specifically for transcribing phone calls.

nova-2-finance: Tailored for finance-related contexts.

nova-2-conversationalai: Ideal for conversational AI applications.

nova-2-voicemail: Suited for transcribing voicemail messages.

nova-2-video: Optimized for video content transcription.

nova-2-medical: Customized for medical transcription needs.

nova-2-drivethru: Developed for use in drive-thru communication systems.

nova-2-automotive: Designed for automotive environments.
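
As a minimal sketch of how one of the variants above might be selected, the snippet below validates a model name against the list and builds a request URL. The endpoint `https://api.deepgram.com/v1/listen` and the `model` query parameter are assumptions based on Deepgram's public API, not details given on this page; the helper name `build_listen_url` is hypothetical, and the provider you call through may use a different URL.

```python
from urllib.parse import urlencode

# Model identifiers copied from the version list above.
NOVA2_MODELS = {
    "nova-2", "nova-2-general", "nova-2-meeting", "nova-2-phonecall",
    "nova-2-finance", "nova-2-conversationalai", "nova-2-voicemail",
    "nova-2-video", "nova-2-medical", "nova-2-drivethru", "nova-2-automotive",
}

# Assumption: Deepgram's transcription endpoint; it is not stated in this document.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str = "nova-2", **options: str) -> str:
    """Return a request URL that selects the given Nova-2 variant.

    `options` are extra query parameters passed through unchanged
    (hypothetical here; check the API reference for supported names).
    """
    if model not in NOVA2_MODELS:
        raise ValueError(f"unknown Nova-2 variant: {model!r}")
    query = {"model": model, **options}
    return f"{LISTEN_URL}?{urlencode(query)}"


print(build_listen_url("nova-2-meeting"))
```

Validating the name locally catches typos such as `nova-2-meetings` before any audio is uploaded.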

Model Type: Automatic Speech Recognition (ASR)

Overview:

Deepgram bills Nova-2 as the world's most powerful speech-to-text model, built for both pre-recorded and streaming audio in English. According to Deepgram, Nova-2 is 18% more accurate than the previous Nova model and offers a 36% relative improvement in word error rate (WER) over OpenAI Whisper (large).

Key Features:

Multilingual capabilities.
High accuracy and reduced word error rate.
Fast inference times.
Competitive pricing.

Intended Use:

Nova-2 is intended for a wide range of voice applications, including real-time transcription services, media transcription, and automated services that require speech-to-text functionality.

The model is also well suited to medical settings, with a reported 16% improvement in medical term accuracy while handling speech at 120-180 words per minute.

Technical Details

Architecture:

Nova-2 uses a Transformer-based architecture that improves substantially on its predecessor, cutting word error rate (WER) by 18.4% compared with Nova-1. The architectural changes also make Nova-2 considerably more accurate at transcribing entities (such as proper nouns and alphanumerics), punctuation, and capitalization, for both pre-recorded and live-streamed audio.

Training Data:

Trained on Deepgram's largest and most varied dataset to date, Nova-2 was developed using nearly 6 million resources and 47 billion tokens, enriched with a comprehensive collection of high-quality human transcriptions.

Performance Metrics:

Nova-2 shows significant improvements in word error rate (WER) over previous models and competitors. Detailed benchmarking results are available.
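
For readers unfamiliar with the metric, here is a minimal sketch of how WER and a relative WER improvement are commonly computed. It uses the standard word-level edit distance; the lowercase-and-split normalization and the sample numbers are assumptions for illustration, not Deepgram's benchmarking procedure or results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which the new WER is lower than the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))

# Hypothetical WERs chosen only to show what an 18.4% relative reduction means.
print(round(relative_wer_reduction(0.10, 0.0816), 3))
```

A relative figure compares two error rates, so an 18.4% relative reduction from a 10% baseline lands at 8.16% WER, not at a WER that is 18.4 points lower.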

Speed is crucial for many applications:

Nova-2 outperformed all competitors, registering a median inference time of just 29.8 seconds per hour of diarized audio. That is 5 to 40 times faster than other vendors that offer diarization.
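
To put the 29.8-second figure in perspective, the arithmetic below converts it into a speed-up over real time and estimates processing time for other audio lengths. It assumes processing time scales linearly with audio duration, which this page does not state; the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600
NOVA2_SECONDS_PER_AUDIO_HOUR = 29.8  # median figure quoted above


def speedup_over_real_time(processing_seconds: float,
                           audio_seconds: float = SECONDS_PER_HOUR) -> float:
    """How many times faster than playback speed the audio is processed."""
    return audio_seconds / processing_seconds


def estimated_processing_seconds(
        audio_seconds: float,
        seconds_per_audio_hour: float = NOVA2_SECONDS_PER_AUDIO_HOUR) -> float:
    """Linear estimate of processing time (an assumption, see the note above)."""
    return audio_seconds * seconds_per_audio_hour / SECONDS_PER_HOUR


print(round(speedup_over_real_time(NOVA2_SECONDS_PER_AUDIO_HOUR), 1))  # 120.8

# A 30-minute recording under the linear assumption.
print(round(estimated_processing_seconds(30 * 60), 1))

# Time per audio hour implied for vendors that are 5x and 40x slower.
print(round(NOVA2_SECONDS_PER_AUDIO_HOUR * 5, 1),
      round(NOVA2_SECONDS_PER_AUDIO_HOUR * 40, 1))
```

In other words, the quoted median corresponds to roughly 120 times faster than real time, while the "5 to 40 times" comparison implies about 2.5 to 20 minutes per audio hour for the other vendors.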

Usage

Code Samples/SDK:

Tutorials: Speech-to-text Multimodal Experience in NodeJS

File Size

The maximum file size is limited to 2 GB.
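
A client can check the limit before uploading rather than waiting for the request to be rejected. The sketch below is one way to do that; it reads "2 GB" as 2 × 10^9 bytes, the stricter interpretation, which is an assumption because this page does not say whether decimal or binary gigabytes are meant. The helper names are hypothetical.

```python
import os

# Assumption: "2 GB" taken as decimal gigabytes (the stricter reading).
MAX_FILE_BYTES = 2 * 1000**3


def fits_size_limit(size_bytes: int, limit: int = MAX_FILE_BYTES) -> bool:
    """True if a file of this size is within the upload limit."""
    return 0 <= size_bytes <= limit


def check_audio_file(path: str) -> int:
    """Return the file size in bytes, or raise if it exceeds the limit."""
    size = os.path.getsize(path)
    if not fits_size_limit(size):
        raise ValueError(
            f"{path} is {size} bytes; the limit is {MAX_FILE_BYTES} bytes"
        )
    return size
```

Files over the limit would need to be split or re-encoded at a lower bitrate before submission.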

Rate Limits

The rate limit is 100 concurrent requests.
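
One way to stay under the concurrency limit on the client side is to gate requests with a semaphore. This is a minimal sketch using `asyncio`; `fake_transcribe` is a hypothetical stand-in for a real API call, and the server's behavior when the limit is exceeded is not described on this page.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 100  # limit quoted above


async def run_limited(jobs, limit: int = MAX_CONCURRENT_REQUESTS):
    """Run zero-argument coroutine functions with at most `limit` in flight.

    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot, run the job, then release the slot.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_transcribe(index: int) -> str:
    # Hypothetical placeholder: a real client would send audio here.
    await asyncio.sleep(0.01)
    return f"transcript-{index}"


if __name__ == "__main__":
    jobs = [lambda i=i: fake_transcribe(i) for i in range(250)]
    results = asyncio.run(run_limited(jobs))
    print(len(results))
```

Passing coroutine functions rather than already-created coroutines keeps work from starting before a slot is available.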

Ethical Considerations

Ethical Guidelines: Adherence to ethical AI development practices, with a focus on reducing bias and ensuring privacy.
Bias Mitigation: Continuous efforts to improve the model's fairness and accuracy across diverse speech patterns and accents.
