Updated June 12, 2026

The most accurate speech models available — pick the right one

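Deepgram's Nova-3 family covers different speech-to-text (STT) needs: Nova-3 (also addressable as Nova-3 General) handles broad-purpose transcription, and Nova-3 Medical targets clinical audio. This guide explains their strengths and ideal use cases.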

Nova-3 at a glance

Nova-3 General and Nova-3 Medical belong to the same generation of Deepgram's neural architecture, but they're tuned for fundamentally different jobs. The table also lists Nova-2 Medical, the previous-generation medical model, for reference. Here's the shortest version of the comparison; the detailed breakdown follows below.

Model	Model string	Primary purpose	Language scope	Best for
Nova-3 General	`nova-3` or `nova-3-general`	Broad-purpose ASR (batch and streaming)	50+ languages incl. multilingual	Meetings, events, multi-speaker, far-field, multilingual audio
Nova-3 Medical (specialist)	`nova-3-medical`	Healthcare-focused speech recognition	English (8 regional variants)	Clinical dictation, medical records, healthcare apps
Nova-2 Medical (previous gen)	`nova-2-medical`	Prior medical model (Nova-2 generation)	English (en-US)	Fallback for legacy workflows; filler word detection

Nova-3: Deepgram's general-purpose flagship

When people say "Nova-3," they usually mean the foundation model behind the entire family. It's the most capable ASR engine Deepgram has shipped for general-purpose use, and it's built to handle the messy, complex audio scenarios that older models struggled with.

`Foundation Model`

Nova-3

Nova-3 is the result of substantial retraining on a more diverse and representative dataset than its predecessor, Nova-2. The most visible result is a 54.2% reduction in word error rate for streaming audio and 47.4% for batch processing when compared against major competitors, numbers that translate directly into fewer missed words and cleaner transcripts in production.
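To make the relative figures concrete, here is a minimal Python sketch of how word error rate (WER) and a relative WER reduction are commonly computed. The sample sentences and the 10% baseline are made-up illustrations, not Deepgram benchmark data, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One dropped word out of five reference words gives a WER of 0.2.
print(word_error_rate("the patient has a shunt", "the patient has shunt"))

# Illustrative only: a hypothetical baseline at 10% WER versus 4.58% WER
# corresponds to a 54.2% relative reduction.
print(f"{relative_wer_reduction(0.10, 0.0458):.1%}")
```

Read this way, a relative reduction says how much of the baseline's error disappears, so the absolute gain depends on how high the baseline WER was to begin with.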

Beyond raw accuracy, Nova-3 introduces several capabilities that matter for real applications: it can handle conversations where speakers freely switch between languages mid-sentence (multilingual codeswitching), it understands domain-specific vocabulary without requiring custom model training, and it supports optional on-the-fly redaction of personal information like phone numbers and names.

The self-serve customization feature is worth highlighting separately: Nova-3 is the first Deepgram model that allows vocabulary adaptation (keyterm prompting) without submitting a model retraining request. If your application regularly hears uncommon product names, internal jargon, or specialized terminology, you can provide keyterms at request time and the model will give them elevated recognition priority immediately.

Industry-leading word error rate
Real-time multilingual codeswitching
50+ supported languages
Enhanced domain-specific terminology
Keyterm prompting (no retraining)
PII redaction support
Speaker diarization
Batch and streaming support
Smart formatting for numbers/dates
Punctuation and paragraph detection
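Several of the capabilities listed above are typically switched on per request through query parameters. The sketch below only builds the query string and sends nothing. The parameter names `diarize`, `smart_format`, `punctuate`, and `redact` (and the value `pii`) are assumptions based on Deepgram's public API rather than anything stated in this guide, so check them against your provider's documentation; `build_listen_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_listen_query(model: str = "nova-3", **features) -> str:
    """Build a query string that enables optional transcription features."""
    params = {"model": model}
    for name, value in features.items():
        # Booleans are sent as lowercase strings in a query string.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = value
    return urlencode(params)


# Assumed parameter names; verify them before relying on this in production.
query = build_listen_query(
    diarize=True,        # speaker diarization
    smart_format=True,   # formatting for numbers and dates
    punctuate=True,      # punctuation
    redact="pii",        # redaction of personal information
)
print(query)
```

Keeping the options in one helper makes it easy to compare transcripts with and without a given feature during evaluation.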

Recommended use cases

Meetings & conference calls

Multi-speaker audio, with accurate speaker diarization and robust handling of noisy conditions.

Live event captioning

Real-time accuracy for broadcasts, webinars, and live presentations.

Multilingual applications

Products where users switch between languages naturally in the same conversation.

Far-field & noisy audio

Smart home, in-car, and ambient environments with unpredictable sound quality.

Nova-3 Medical: purpose-built for clinical environments

Healthcare speech recognition is its own discipline. The vocabulary is enormous, pronunciation is inconsistent across specialties and accents, and the cost of a transcription error is categorically higher than in a general context. Nova-3 Medical was built specifically for this problem.

`Medical Specialist`

Nova-3 Medical

Nova-3 Medical carries all the architectural improvements of Nova-3 but is additionally fine-tuned on a deep corpus of medical speech — clinical dictation recordings, physician notes, discharge summaries, and the full range of specialty-specific language that a general model would frequently miss or mangle.

The practical difference shows up in terminology like drug names, anatomical structures, diagnostic codes, procedural terms, and the specific phrasing patterns that clinicians use when dictating under time pressure. A model that hasn't been trained on this domain will struggle with words like "methicillin-resistant Staphylococcus aureus," "ventriculoperitoneal shunt," or specialty abbreviations that don't appear in everyday speech. Nova-3 Medical handles these reliably.

It currently supports English across eight regional variants, which covers the major clinical markets where English is the primary working language of documentation.

Medical vocabulary fine-tuning
Pharmacological term accuracy
Anatomical & procedural terminology
Clinical dictation patterns
8 English regional variants
Specialty-specific language
High-accuracy medical abbreviations

Nova-3 Medical provides transcription assistance and should always be used with appropriate clinical oversight. It is not a clinical decision support tool and does not replace qualified medical documentation review. Healthcare organizations should ensure regulatory compliance, including HIPAA requirements, in their integration. Always validate transcriptions before they enter official medical records.

Where it fits in healthcare workflows

Physician dictation

SOAP notes, discharge summaries, referral letters, and operative reports dictated at natural speech pace.

EHR integration

Real-time transcription piped directly into electronic health record fields, reducing manual entry time.

Pharmacy documentation

Drug names, dosages, routes, and clinical instructions accurately transcribed without generic misrecognition.

Medical research

Interview transcription for qualitative research, focus groups, and clinical study documentation.

Nova-3 General: when versatility is the requirement

The `nova-3-general` model string (an alias of `nova-3`) is Deepgram's recommended default for any application that doesn't have a specific domain requirement. It's the broadest, most capable model for general-purpose transcription work.

`General Purpose`

nova-3-general

If your use case doesn't align with a particular industry or niche, Nova-3 General provides a versatile solution. It performs strongly on customer service calls, podcast transcription, accessibility captioning, content localization, and any application where the audio environment and vocabulary are unpredictable.

The model's multilingual capability is a genuine differentiator here. In independent testing, Nova-3 showed up to 8:1 user preference ratios over competing models for certain language pairs — not just English. For teams building global applications, this means you can rely on a single model instead of managing separate pipelines for different languages.

Industries where it performs well

Contact centers

Agent call transcription, quality assurance, post-call summarization, and compliance monitoring.

Media & content

Podcast transcripts, video subtitles, interview archives, and content accessibility workflows.

Enterprise productivity

Meeting notes, internal knowledge capture, voice search, and document generation from spoken content.

EdTech & eLearning

Lecture transcription, accessibility compliance, language learning feedback, and interactive voice exercises.

Performance characteristics

Nova-3 represents a major step forward over Nova-2 and over competing general-purpose speech models. Here's a simplified view of where the models stand relative to each other on key dimensions.

Category	Model	Performance
General transcription accuracy	Nova-3 (streaming)	~54% lower WER than competitors
General transcription accuracy	Nova-3 (batch)	~47% lower WER than competitors
General transcription accuracy	Nova-2 (baseline)	Strong previous-gen baseline
Medical domain accuracy	Nova-3 Medical	Highest medical accuracy
Medical domain accuracy	Nova-3 General	Good, not specialist
Medical domain accuracy	Nova-2 Medical	Previous generation

Which Nova-3 model should you use?

The right model depends on your domain, language requirements, and what kind of audio you're processing. Here's a practical decision framework.

Use `nova-3` or `nova-3-general` when:

Your users speak multiple languages or code-switch
Audio comes from noisy or far-field environments
You need the broadest language coverage
Your use case spans multiple industries
You want the flexibility of keyterm prompting
You're building multilingual consumer or enterprise apps

Use `nova-3-medical` when:

You're building for clinical or healthcare workflows
Your audio contains drug names, anatomical terms, or procedures
Accuracy on specialist vocabulary is critical
You're integrating with EHR or clinical documentation systems
Your users are primarily English-speaking clinicians
Regulatory and compliance context demands domain accuracy

Heads up on Nova-2 Medical: if your current integration uses `nova-2-medical`, consider testing `nova-3-medical`. The generation upgrade brings meaningfully better accuracy for most medical audio. Nova-2 remains available for backward compatibility and for use cases requiring filler word detection, which Nova-3 doesn't support yet.
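The decision framework above can be sketched as a small routing function. This is a simplified illustration, not an official selection rule: `choose_model` and its arguments are hypothetical, it only returns model strings named in this guide, and the English variant list is the one given in the FAQ.

```python
# English variants that this guide lists for Nova-3 Medical.
MEDICAL_ENGLISH_VARIANTS = frozenset(
    {"en", "en-US", "en-AU", "en-CA", "en-GB", "en-IE", "en-IN", "en-NZ"}
)


def choose_model(domain: str, language: str, needs_filler_words: bool = False) -> str:
    """Pick a model string from the domain and language of the audio."""
    if domain == "medical" and language in MEDICAL_ENGLISH_VARIANTS:
        # Nova-3 doesn't support filler word detection yet, and this guide
        # lists nova-2-medical for en-US only, so fall back just in that case.
        if needs_filler_words and language == "en-US":
            return "nova-2-medical"
        return "nova-3-medical"
    # Everything else, including medical audio in other languages, goes to the
    # general model (optionally combined with keyterm prompting).
    return "nova-3-general"


print(choose_model("medical", "en-GB"))   # nova-3-medical
print(choose_model("medical", "es"))      # nova-3-general
print(choose_model("meetings", "multi"))  # nova-3-general
```

Centralizing the choice in one function keeps model strings out of call sites, which makes a later migration (for example away from `nova-2-medical`) a one-line change.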

Start building with Nova-3 today

Access Deepgram's Nova-3, Nova-3 General, and Nova-3 Medical through AI/ML API, alongside hundreds of other models, in one place.

Frequently asked questions

What's the difference between nova-3 and nova-3-general?

They're the same model. Deepgram uses both strings as aliases that resolve to identical underlying model weights. You can use either in your API calls and get exactly the same results. The nova-3-general string makes the intent more explicit when working with a team or reading code later.

Does Nova-3 Medical work for non-English languages?

Not currently. Nova-3 Medical supports English only, across eight regional variants (en, en-US, en-AU, en-CA, en-GB, en-IE, en-IN, en-NZ). If you need medical transcription in other languages, you would need to use nova-3-general with appropriate keyterm prompting, though the specialist tuning won't be present.

What is keyterm prompting and does it work with all Nova-3 models?

Keyterm prompting lets you provide a list of words or phrases that the model should treat as high-priority vocabulary. This is useful for proper nouns, brand names, internal terminology, or rare words. You pass these at request time using the `keyterm` parameter. It works with Nova-3 / Nova-3 General and Nova-3 Medical alike, with no model retraining required.
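As a rough illustration, the snippet below prepares (but does not send) a request that passes two keyterms. It assumes Deepgram's hosted endpoint `https://api.deepgram.com/v1/listen`, a `Token` authorization header, and one repeated `keyterm` query parameter per term; none of these details come from this guide, and a gateway such as AI/ML API may expose a different interface. `build_keyterm_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Deepgram's hosted pre-recorded endpoint; other gateways may differ.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_keyterm_request(audio: bytes, api_key: str, keyterms: list[str],
                          model: str = "nova-3") -> Request:
    """Prepare a transcription request that boosts the given keyterms."""
    # doseq=True emits one keyterm=... pair per list entry.
    query = urlencode({"model": model, "keyterm": keyterms}, doseq=True)
    return Request(
        f"{LISTEN_URL}?{query}",
        data=audio,
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "audio/wav",
        },
        method="POST",
    )


request = build_keyterm_request(
    audio=b"",  # placeholder; pass real audio bytes here
    api_key="YOUR_API_KEY",
    keyterms=["ventriculoperitoneal shunt", "MRSA"],
    model="nova-3-medical",
)
print(request.full_url)
```

With real audio bytes and a valid key, the prepared object could be sent with `urllib.request.urlopen(request)`.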

How does multilingual codeswitching work in Nova-3?

When you set `language=multi`, Nova-3 actively detects which language is being spoken and transcribes it accordingly, even within a single utterance. The supported languages for `multi` mode are English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch. This is different from simple language detection: the model handles transitions in real time rather than labeling the dominant language of a whole segment.
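Before enabling multi mode, it can help to check that every language you expect in the audio is on the supported list. The sketch below does that check and returns the request parameters; the helper names are hypothetical, and the language set is copied from the answer above.

```python
# Languages this guide lists as supported in multi mode.
MULTI_LANGUAGES = frozenset({
    "English", "Spanish", "French", "German", "Hindi",
    "Russian", "Portuguese", "Japanese", "Italian", "Dutch",
})


def unsupported_for_multi(spoken: list[str]) -> list[str]:
    """Return the expected languages that multi mode does not cover."""
    return sorted(set(spoken) - MULTI_LANGUAGES)


def multilingual_params(spoken: list[str]) -> dict[str, str]:
    """Return request parameters for codeswitching audio, or raise if unsupported."""
    missing = unsupported_for_multi(spoken)
    if missing:
        raise ValueError(f"not covered by multi mode: {', '.join(missing)}")
    return {"model": "nova-3", "language": "multi"}


print(multilingual_params(["English", "Spanish"]))
print(unsupported_for_multi(["English", "Korean"]))
```

Failing early on an unsupported language is usually cheaper than discovering degraded transcripts after the fact.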
