Name: Gemini 3.5 Flash API
Brand: Google

Gemini 3.5 Flash

Gemini 3.5 Flash model narrows the gap between Flash-tier speed and Pro-level reasoning.

What Is Gemini 3.5 Flash API?

With Gemini 3.5 Flash API, Google introduces a new generation of multimodal AI designed for real-world production workloads, high-speed automation, advanced coding assistance, and scalable agentic workflows. Whether you're building AI-powered SaaS tools, enterprise copilots, customer support systems, automation pipelines, or next-generation coding agents, Gemini 3.5 Flash delivers the performance modern applications demand.

Unlike traditional heavyweight models that often sacrifice latency for reasoning quality, Gemini 3.5 Flash focuses on delivering advanced reasoning, coding capability, and multimodal understanding at Flash-level speed.

What Gemini 3.5 Flash brings to the table

Everything developers are expecting based on current leaks, official Gemini 3 series documentation, and early tester reports.

Near-Pro reasoning

Expected to come significantly closer to Gemini 3.1 Pro's reasoning quality while remaining faster and cheaper. Early testers report that the long-standing "lazy model" problem has been largely addressed.

1 million token context

Carries forward the full 1M token input context window from the Gemini 3 generation — large enough to fit entire codebases, lengthy documents, and complex multi-turn agentic conversations in a single call.

Configurable thinking levels

Like Gemini 3 Flash before it, 3.5 Flash is expected to support four thinking levels — minimal, low, medium, and high — letting developers trade off reasoning depth against latency and cost per request.

Full multimodal support

Accepts text, images, audio, video, and PDFs as input. Carries forward the improved audio grounding from the Gemini 3 series, including better handling of accented speech and background noise.

Stronger grounding

Leaked details point to improved search grounding reliability — a consistent pain point in earlier Gemini Flash models. If accurate, this could make 3.5 Flash significantly more useful for real-world research and factual tasks.

Gemini 3.5 Flash API Pricing

сached input: $0.065
audio input: $1.30

< 200K tokens

1M input tokens: $0.65
1M output tokens: $3.90

> 200K tokens

1M input tokens: $3.25
1M output tokens: $23.40

Model specs at a glance

Specification	Details
Model ID	gemini-3.5-flash-preview
Developer	Google DeepMind
Model series	Gemini 3.5 (Flash tier)
Context window	1,000,000 input tokens
Max output	~64,000–65,000 tokens (consistent with Gemini 3 Flash)
Knowledge cutoff	January 2026 (per leaked pricing page)
Input modalities	Text · Images · Audio · Video · PDFs
Output modalities	Text
Thinking levels	Minimal · Low · Medium · High
Tools supported	Google Search · URL Context · Code Execution · Function Calling · Maps Grounding
Context caching	Yes (reduces cost on repeated prompts)

Four thinking levels, one model

The thinking_level parameter lets you dial reasoning depth per request — critical for high-volume workloads where you don't want to pay Pro prices for simple tasks.

Thinking Level	Description	Best Use Cases
Minimal	Fastest, lowest cost. Similar latency to a non-reasoning model.	Classification, routing, and simple transformations where extended thought adds no value.
Low	Light reasoning with balanced speed and quality.	Summarization, translation, and basic Q&A for cost-efficient content pipelines.
Medium	Balanced performance and reasoning depth. The practical default for most apps.	Code generation, document analysis, and multi-step instructions.
High	Full reasoning chain with near-Pro output quality for advanced tasks.	Complex logic, math, research synthesis, and multi-hop reasoning.

Gemini 3.5 Flash vs. the current Gemini 3 lineup

Feature	Gemini 3.1 Flash-Lite	Gemini 3 Flash	Gemini 3.5 Flash	Gemini 3.1 Pro
Tier	Flash-Lite	Flash	Flash (premium)	Pro
Context window	1M tokens	1M tokens	1M tokens	2M tokens
Thinking levels	✓ 4 levels	✓ 4 levels	✓ 4 levels	✓ 4 levels
Reasoning quality	Good	Very good	Near-Pro	Best-in-class
Latency	Fastest	Fast	Fast	Slower
Grounding improvements	Moderate	Good	Improved	Best

What Gemini 3.5 Flash is built for

Based on everything we know so far, here's where 3.5 Flash will shine over the existing lineup.

Complex agentic workflows that need speed

Multi-step agent loops have historically been a weak point for Flash models — they'd drop context, hallucinate tool parameters, or give superficial answers mid-chain. 3.5 Flash is expected to handle these with the reliability previously reserved for Pro, at a fraction of the cost and latency.

Coding assistants and IDE integrations

Near-Pro reasoning means 3.5 Flash should handle real-world debugging, code review, and refactoring tasks that currently require reaching for Pro. The January 2026 knowledge cutoff also means better awareness of recent frameworks, package versions, and breaking API changes.

Real-time search-grounded applications

If the grounding improvements hold up at GA, 3.5 Flash should be significantly more reliable for applications where hallucination is costly — customer-facing research tools, fact-checking pipelines, and RAG systems that need accurate snippet ranking alongside low latency.

Interactive chat with complex documents

The 1M token context and expected reasoning improvements make it a strong fit for document Q&A products, legal contract analysis, and financial report parsing — tasks where Flash's speed matters but where previous Flash models sometimes gave shallow answers on dense material.

Multimodal pipelines (video, audio, images)

Gemini 3.5 Flash inherits the full multimodal stack, including improved audio input quality over the 2.5 generation. For applications ingesting meeting recordings, product photos, or mixed-media documents, it provides a single model rather than separate specialized endpoints.

‍

Example H2

Try it now

What Is Gemini 3.5 Flash API?

What Gemini 3.5 Flash brings to the table

Everything developers are expecting based on current leaks, official Gemini 3 series documentation, and early tester reports.

Near-Pro reasoning

1 million token context

Configurable thinking levels

Full multimodal support

Accepts text, images, audio, video, and PDFs as input. Carries forward the improved audio grounding from the Gemini 3 series, including better handling of accented speech and background noise.

Stronger grounding

Gemini 3.5 Flash API Pricing

сached input: $0.065
audio input: $1.30

< 200K tokens

1M input tokens: $0.65
1M output tokens: $3.90

> 200K tokens

1M input tokens: $3.25
1M output tokens: $23.40

Model specs at a glance

Specification	Details
Model ID	gemini-3.5-flash-preview
Developer	Google DeepMind
Model series	Gemini 3.5 (Flash tier)
Context window	1,000,000 input tokens
Max output	~64,000–65,000 tokens (consistent with Gemini 3 Flash)
Knowledge cutoff	January 2026 (per leaked pricing page)
Input modalities	Text · Images · Audio · Video · PDFs
Output modalities	Text
Thinking levels	Minimal · Low · Medium · High
Tools supported	Google Search · URL Context · Code Execution · Function Calling · Maps Grounding
Context caching	Yes (reduces cost on repeated prompts)

Four thinking levels, one model

The thinking_level parameter lets you dial reasoning depth per request — critical for high-volume workloads where you don't want to pay Pro prices for simple tasks.

Thinking Level	Description	Best Use Cases
Minimal	Fastest, lowest cost. Similar latency to a non-reasoning model.	Classification, routing, and simple transformations where extended thought adds no value.
Low	Light reasoning with balanced speed and quality.	Summarization, translation, and basic Q&A for cost-efficient content pipelines.
Medium	Balanced performance and reasoning depth. The practical default for most apps.	Code generation, document analysis, and multi-step instructions.
High	Full reasoning chain with near-Pro output quality for advanced tasks.	Complex logic, math, research synthesis, and multi-hop reasoning.

Gemini 3.5 Flash vs. the current Gemini 3 lineup

Feature	Gemini 3.1 Flash-Lite	Gemini 3 Flash	Gemini 3.5 Flash	Gemini 3.1 Pro
Tier	Flash-Lite	Flash	Flash (premium)	Pro
Context window	1M tokens	1M tokens	1M tokens	2M tokens
Thinking levels	✓ 4 levels	✓ 4 levels	✓ 4 levels	✓ 4 levels
Reasoning quality	Good	Very good	Near-Pro	Best-in-class
Latency	Fastest	Fast	Fast	Slower
Grounding improvements	Moderate	Good	Improved	Best

What Gemini 3.5 Flash is built for

Based on everything we know so far, here's where 3.5 Flash will shine over the existing lineup.

Gemini 3.5 Flash

Gemini 3.5 Flash

What Is Gemini 3.5 Flash API?

What Gemini 3.5 Flash brings to the table

Near-Pro reasoning

1 million token context

Configurable thinking levels

Full multimodal support

Stronger grounding

Gemini 3.5 Flash API Pricing

Model specs at a glance

Four thinking levels, one model

Gemini 3.5 Flash vs. the current Gemini 3 lineup

What Gemini 3.5 Flash is built for

Complex agentic workflows that need speed

Coding assistants and IDE integrations

Real-time search-grounded applications

Interactive chat with complex documents

Multimodal pipelines (video, audio, images)

What Is Gemini 3.5 Flash API?

What Gemini 3.5 Flash brings to the table

Near-Pro reasoning

1 million token context

Configurable thinking levels

Full multimodal support

Stronger grounding

Gemini 3.5 Flash API Pricing

Model specs at a glance

Four thinking levels, one model

Gemini 3.5 Flash vs. the current Gemini 3 lineup

What Gemini 3.5 Flash is built for

Complex agentic workflows that need speed

Coding assistants and IDE integrations

Real-time search-grounded applications

Interactive chat with complex documents

Multimodal pipelines (video, audio, images)

400+ AI Models

The Best Growth Choice
for Enterprise

Our Clients' Voices

Gemini 3.5 Flash

Gemini 3.5 Flash

What Is Gemini 3.5 Flash API?

What Gemini 3.5 Flash brings to the table

Near-Pro reasoning

1 million token context

Configurable thinking levels

Full multimodal support

Stronger grounding

Gemini 3.5 Flash API Pricing

Model specs at a glance

Four thinking levels, one model

Gemini 3.5 Flash vs. the current Gemini 3 lineup

What Gemini 3.5 Flash is built for

Complex agentic workflows that need speed

Coding assistants and IDE integrations

Real-time search-grounded applications

Interactive chat with complex documents

Multimodal pipelines (video, audio, images)

What Is Gemini 3.5 Flash API?

What Gemini 3.5 Flash brings to the table

Near-Pro reasoning

1 million token context

Configurable thinking levels

Full multimodal support

Stronger grounding

Gemini 3.5 Flash API Pricing

Model specs at a glance

Four thinking levels, one model

Gemini 3.5 Flash vs. the current Gemini 3 lineup

What Gemini 3.5 Flash is built for

Complex agentic workflows that need speed

Coding assistants and IDE integrations

Real-time search-grounded applications

Interactive chat with complex documents

Multimodal pipelines (video, audio, images)

400+ AI Models

The Best Growth Choice for Enterprise

Our Clients' Voices

The Best Growth Choice
for Enterprise