Nova-2: Advanced, versatile ASR model for diverse transcription needs.
Model Name: Nova-2
Developer/Creator: Deepgram
Versions:
nova-2 or nova-2-general: General model applicable across various domains and scenarios.
nova-2-meeting: Optimized for transcribing meetings.
nova-2-phonecall: Designed specifically for transcribing phone calls.
nova-2-finance: Tailored for finance-related contexts.
nova-2-conversationalai: Ideal for conversational AI applications.
nova-2-voicemail: Suited for transcribing voicemail messages.
nova-2-video: Optimized for video content transcription.
nova-2-medical: Customized for medical transcription needs.
nova-2-drivethru: Developed for use in drive-thru communication systems.
nova-2-automotive: Designed for automotive environments.
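As a rough illustration of how these variants might be selected in practice, the sketch below maps a use case to one of the model names listed above and builds a request URL. The `https://api.deepgram.com/v1/listen` base URL and the `model` query parameter are assumptions that do not come from this card, and `USE_CASE_TO_MODEL`, `pick_model`, and `build_listen_url` are hypothetical helper names. Check the current API reference before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint; it is not stated in this card.
ASSUMED_LISTEN_URL = "https://api.deepgram.com/v1/listen"

# Use-case keys are hypothetical labels; values are the model names listed above.
USE_CASE_TO_MODEL = {
    "general": "nova-2-general",
    "meeting": "nova-2-meeting",
    "phonecall": "nova-2-phonecall",
    "finance": "nova-2-finance",
    "conversationalai": "nova-2-conversationalai",
    "voicemail": "nova-2-voicemail",
    "video": "nova-2-video",
    "medical": "nova-2-medical",
    "drivethru": "nova-2-drivethru",
    "automotive": "nova-2-automotive",
}


def pick_model(use_case: str) -> str:
    """Return the variant for a use case, falling back to the general model."""
    return USE_CASE_TO_MODEL.get(use_case.lower(), "nova-2-general")


def build_listen_url(use_case: str, base_url: str = ASSUMED_LISTEN_URL) -> str:
    """Build a request URL that carries the chosen variant as `model`."""
    return f"{base_url}?{urlencode({'model': pick_model(use_case)})}"
```

Falling back to `nova-2-general` for an unknown use case is a design choice of this sketch, based on the card describing that variant as applicable across domains.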
Model Type: Automatic Speech Recognition (ASR)
Deepgram describes Nova-2 as the world's most powerful speech-to-text model, designed for both pre-recorded and streaming audio in English. According to Deepgram, Nova-2 is 18% more accurate than the previous Nova model and offers a 36% relative word error rate (WER) improvement over OpenAI Whisper (large).
Nova-2 is intended for a wide range of voice applications, including real-time transcription services, media transcription, and automated services that require speech-to-text functionality.
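For the pre-recorded case, a request could be assembled along the following lines. This is a minimal sketch, not the documented integration: the endpoint URL, the `Token` authorization scheme, and the raw-audio request body are assumptions that do not come from this card, and `build_transcription_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; it is not stated in this card.
ASSUMED_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_transcription_request(
    audio_bytes: bytes,
    api_key: str,
    model: str = "nova-2",
    content_type: str = "audio/wav",
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    url = f"{ASSUMED_LISTEN_URL}?{urlencode({'model': model})}"
    return urllib.request.Request(
        url,
        data=audio_bytes,
        headers={
            # Assumed auth scheme: "Token <key>".
            "Authorization": f"Token {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The shape of the response is not covered by this card, so parsing is left out.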
The model also performs well in medical settings, with 16% better medical term accuracy, and handles speech at 120-180 words per minute.
Nova-2 uses a Transformer-based architecture that delivers substantial improvements over its predecessor, including an 18.4% reduction in word error rate (WER) compared with Nova-1. The architectural changes have also improved accuracy on entities (such as proper nouns and alphanumerics), punctuation, and capitalization for both pre-recorded and live streaming content.
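To make the WER figures concrete, the sketch below computes WER in its standard form (word-level edit distance divided by the number of reference words) and the relative reduction between two WER values. Reading the 18.4% figure as a relative reduction is an assumption, and the function names are hypothetical. This is not Deepgram's benchmarking code, and it applies only lowercase-and-split normalization.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer
```

As a purely hypothetical illustration, a baseline WER of 10.0% falling to 8.16% gives `relative_wer_reduction(0.10, 0.0816)` of about 0.184, that is, an 18.4% relative reduction. These absolute values are made up and are not Deepgram's measurements.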
Trained on Deepgram's largest and most varied dataset to date, Nova-2 was developed using nearly 6 million resources and 47 billion tokens, enriched with a comprehensive collection of high-quality human transcriptions.
Nova-2 shows significant improvements in word error rate (WER) over previous models and competitors; detailed benchmarking results are available.
In speed testing, Nova-2 outperformed all competitors, with a median inference time of just 29.8 seconds per hour of diarized audio. That makes it 5 to 40 times faster than other vendors that offer diarization.
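The arithmetic behind these speed figures can be sketched as follows. Only the 29.8 seconds per hour and the 5x to 40x range come from the text above. The function names are hypothetical, and the implied competitor times are a back-of-the-envelope derivation, not published measurements.

```python
from typing import Tuple

SECONDS_PER_AUDIO_HOUR = 3600.0
MEDIAN_INFERENCE_SECONDS_PER_HOUR = 29.8  # figure quoted above


def real_time_factor(
    inference_seconds_per_hour: float = MEDIAN_INFERENCE_SECONDS_PER_HOUR,
) -> float:
    """How many seconds of audio are processed per second of compute."""
    return SECONDS_PER_AUDIO_HOUR / inference_seconds_per_hour


def estimated_processing_seconds(
    audio_hours: float,
    inference_seconds_per_hour: float = MEDIAN_INFERENCE_SECONDS_PER_HOUR,
) -> float:
    """Estimated median processing time for a given amount of audio."""
    return audio_hours * inference_seconds_per_hour


def implied_competitor_range(
    inference_seconds_per_hour: float = MEDIAN_INFERENCE_SECONDS_PER_HOUR,
    min_speedup: float = 5.0,
    max_speedup: float = 40.0,
) -> Tuple[float, float]:
    """Seconds per audio hour implied for vendors that are 5x to 40x slower."""
    return (
        inference_seconds_per_hour * min_speedup,
        inference_seconds_per_hour * max_speedup,
    )
```

Under these figures, Nova-2 runs at roughly 121 times real time, and the slower vendors would need about 149 to 1,192 seconds per hour of audio.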
Code Samples/SDK:
Tutorials: Speech-to-text Multimodal Experience in NodeJS
File Size
The maximum file size is 2 GB.
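A client can check this limit before uploading. The sketch below assumes that "2 GB" means 2 × 10^9 bytes, the more conservative reading, since the card does not say whether the limit is decimal or binary. `fits_upload_limit` is a hypothetical helper name.

```python
import os

# Assumption: decimal gigabytes (2 * 10**9 bytes). If the service uses
# binary units (2 * 1024**3 bytes), this check is slightly stricter than needed.
MAX_FILE_SIZE_BYTES = 2 * 1000**3


def fits_upload_limit(path: str, limit_bytes: int = MAX_FILE_SIZE_BYTES) -> bool:
    """Return True if the file at `path` is within the upload size limit."""
    return os.path.getsize(path) <= limit_bytes
```

Files that fail the check would need to be split or re-encoded at a lower bitrate before submission.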
Rate Limits
The rate limit is 100 concurrent requests.
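One way to stay under this limit on the client side is to cap the number of in-flight requests with a semaphore. The sketch below is a generic pattern, not part of any Deepgram SDK. `run_bounded` is a hypothetical helper, and each job stands in for whatever coroutine actually sends a transcription request.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")

MAX_CONCURRENT_REQUESTS = 100  # limit quoted above


async def run_bounded(
    jobs: Iterable[Callable[[], Awaitable[T]]],
    limit: int = MAX_CONCURRENT_REQUESTS,
) -> List[T]:
    """Run all jobs, never allowing more than `limit` to execute at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot, then release it when the job finishes.
        async with semaphore:
            return await job()

    # gather returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each job is passed as a zero-argument callable rather than an already-created coroutine, so no request starts until a slot is free. A caller would run it with `asyncio.run(run_bounded(jobs))`.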