Voice
Active

Nova-3

Supports streaming transcription, speaker diarization, smart formatting, and intent/topic detection for monolingual English workflows.

Nova-3 is a high-accuracy speech-to-text model by Deepgram for real-time and batch transcription, with low latency and advanced audio intelligence.

Deepgram Nova-3 is an advanced speech-to-text model built on Deepgram's end-to-end deep learning architecture. It delivers industry-leading word error rates for English, real-time streaming transcription via WebSocket, and a rich set of audio intelligence features, which makes it suitable for contact centers, meeting transcription, voice analytics, and developer applications.

Technical Specifications

Performance Benchmarks
- Industry-leading word error rate (WER) for English.
- Sub-300 ms latency in streaming mode.
- Handles noisy environments, accents, and spontaneous speech.
- Supports audio files up to 2 GB or 5 hours in batch mode.
- Per-word confidence scores available in output.
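
The batch limits and per-word confidence scores listed above lend themselves to simple client-side checks. The following is a minimal sketch, not Deepgram code: it assumes the size and duration limits both apply, reads "2GB" as 2 GiB, and assumes each word arrives as a dictionary with hypothetical `word` and `confidence` keys. The 0.8 threshold is an arbitrary illustration.

```python
# Limits taken from the list above. How the two limits combine is an assumption.
MAX_BYTES = 2 * 1024**3      # "2GB", read here as 2 GiB (assumption)
MAX_SECONDS = 5 * 60 * 60    # 5 hours


def fits_batch_limits(size_bytes: int, duration_seconds: float) -> bool:
    """Return True when a file is within both stated batch limits."""
    return size_bytes <= MAX_BYTES and duration_seconds <= MAX_SECONDS


def low_confidence_words(words, threshold=0.8):
    """Return the words whose confidence score falls below the threshold.

    `words` is assumed to be a list of dicts with "word" and "confidence" keys.
    """
    return [w["word"] for w in words if w["confidence"] < threshold]
```

A check like this can run before upload to reject oversized files early, and the confidence filter can flag words for human review.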

Architecture Breakdown
Nova-3 uses Deepgram's proprietary end-to-end deep learning pipeline that processes raw audio directly, eliminating multi-step feature extraction. This reduces error accumulation and enables faster, more accurate inference across diverse recording conditions.

Pricing
- $0.01001 / min
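
To turn the per-minute rate into a budget figure, a rough estimate can be computed from audio duration. This sketch assumes billing is prorated by the second and rounds to four decimal places; actual billing granularity may differ, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("0.01001")  # USD per minute, from the listed price


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the transcription cost in USD for a given audio duration.

    Assumes per-second proration, which is an assumption, not a stated policy.
    """
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    cost = minutes * RATE_PER_MINUTE
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```

Under these assumptions, one hour of audio comes to about $0.60, and a file at the 5-hour batch limit to about $3.00.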

Core Features & Capabilities
- Streaming Transcription: Real-time interim and final results via WebSocket.
- Speaker Diarization: Automatically labels each speaker starting at index 0.
- Smart Formatting: Locale-aware formatting for numbers, dates, and punctuation.
- Intent & Topic Detection: Detects custom or model-identified intents and topics.
- Entity Detection: Extracts key entities from audio content.
- Custom Vocabulary (keyterm): Boosts recognition accuracy for domain-specific terms.
- Filler Word Detection: Identifies and optionally removes filler words.
- Utterance Segmentation: Splits transcript into labeled speech segments.
- Multichannel Support: Transcribes each audio channel independently.
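
Features such as these are typically switched on per request. The sketch below shows one way to build a request URL and to regroup diarized words into speaker turns. The endpoint and the parameter names (`model`, `diarize`, `smart_format`, `keyterm`) are assumptions to verify against Deepgram's API reference, and the word shape with `word` and `speaker` keys is hypothetical. `build_listen_url` and `group_by_speaker` are illustrative helpers, not part of any SDK.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the official API reference.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(keyterms=(), **features) -> str:
    """Build a request URL with boolean feature flags and repeated keyterms."""
    params = [("model", "nova-3")]
    for name, enabled in features.items():
        params.append((name, "true" if enabled else "false"))
    for term in keyterms:
        # One query parameter per custom vocabulary term (assumed convention).
        params.append(("keyterm", term))
    return f"{BASE_URL}?{urlencode(params)}"


def group_by_speaker(words):
    """Merge consecutive words with the same speaker index into turns.

    `words` is assumed to be a list of dicts with "word" and "speaker" keys,
    where speaker indices start at 0 as described above.
    """
    turns = []
    for w in words:
        if turns and turns[-1][0] == w["speaker"]:
            turns[-1][1].append(w["word"])
        else:
            turns.append((w["speaker"], [w["word"]]))
    return [(speaker, " ".join(tokens)) for speaker, tokens in turns]
```

For example, `build_listen_url(keyterms=["Techflow"], diarize=True, smart_format=True)` yields a URL that requests diarization and smart formatting and boosts one domain term.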

Comparison with Other Models
vs. Deepgram Nova-3 General: Nova-3 is optimized for English-only workflows with maximum accuracy, while Nova-3 General adds multilingual support at the same price point.
vs. Deepgram Nova-3 Medical: Nova-3 Medical is fine-tuned for clinical terminology and healthcare audio; Nova-3 is the general-purpose English model at a higher per-minute rate.
