Fast, cost-effective multimodal AI with 1M-token context, excelling in coding and reasoning.
Gemini 2.5 Flash Description
Gemini 2.5 Flash, developed by Google DeepMind, is a fast, cost-effective multimodal AI model designed for reasoning and coding tasks. With a 1-million-token context window, it excels in web development, math, and scientific analysis. Available via Google AI Studio and Vertex AI (preview), it balances quality, cost, and speed for developers and enterprises.
Technical Specifications
Performance Benchmarks
Gemini 2.5 Flash is a hybrid reasoning model with a Transformer-based architecture, allowing developers to adjust "thinking" depth for optimized performance. It supports text, image, video, and audio inputs, with enhanced post-training for superior reasoning.
Context Window: 1 million tokens, expanding to 2 million soon.
Output Capacity: Up to 32,768 tokens per response.
Performance: 180 tokens/second output speed, 0.8s latency (TTFT without thinking).
API Pricing:
Input tokens: $0.1575 per million tokens
Output tokens: $0.63 per million tokens
Example cost for 1,000 input tokens plus 1,000 output tokens: $0.0001575 (input) + $0.00063 (output) = $0.0007875 total
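The per-million-token rates above can be wrapped in a small cost estimator. This is an illustrative sketch (the function name is ours); the rates are copied from the pricing list.

```python
# Per-million-token rates (USD), taken from the pricing list above.
INPUT_RATE = 0.1575
OUTPUT_RATE = 0.63

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# 1,000 input + 1,000 output tokens, matching the worked example above.
print(f"${estimate_cost(1000, 1000):.7f}")  # → $0.0007875
```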
Performance Metrics
Figure: Gemini 2.5 Flash benchmark metrics compared with other leading models.
Key Capabilities
Gemini 2.5 Flash is a reasoning-focused model, methodically analyzing tasks to produce accurate, nuanced outputs. Its multimodal capabilities enable seamless integration of text, images, video, and audio, making it versatile for complex workflows.
Advanced Coding: Tops WebDev Arena, generating functional web apps with aesthetic UI (e.g., video players, dictation apps). Supports 40+ languages and agentic coding with minimal supervision.
Reasoning and Problem-Solving: Excels in math (AIME 2025: 86.7%) and science (GPQA: 84%), with built-in thinking for logical conclusions.
Multimodal Processing: Scores 84.8% on VideoMME, enabling video-to-code workflows (e.g., learning apps from YouTube videos).
Tool Utilization: Supports function calling, JSON structuring, and external tool integration for multi-step tasks and API interactions.
Web Development: Generates responsive, visually appealing web apps with features like wavelength animations and hover effects.
Interactive Simulations: Creates executable code for games (e.g., endless runner) and visualizations (e.g., Mandelbrot fractals, boid animations).
API Features: Offers streaming, function calling, and multilingual support for real-time, scalable applications.
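Function calling works by declaring callable tools to the model as a JSON schema. The sketch below builds a declaration in the Gemini API's `functionDeclarations` style (an OpenAPI-like schema subset); the `get_weather` tool and its fields are hypothetical, so check the schema details against the official API reference.

```python
import json

def make_tool_config() -> dict:
    """Build a tools payload declaring one hypothetical function,
    in the Gemini API's functionDeclarations format."""
    return {
        "tools": [{
            "functionDeclarations": [{
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"}
                    },
                    "required": ["city"],
                },
            }]
        }]
    }

print(json.dumps(make_tool_config(), indent=2))
```

The model responds with a structured function-call request instead of free text when it decides the tool is needed; your code executes the function and returns the result in a follow-up turn.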
Optimal Use Cases
Web Development: Creating interactive apps with responsive design.
Code Generation: Autonomous coding for simulations and large codebases.
Scientific Research: Data analysis in math and science.
Multimodal Applications: Learning apps from videos and visualizations.
Business Automation: Streamlining tasks with API integration.
Comparison with Other Models
vs. OpenAI o3-mini: Faster (180 vs. ~100 tokens/second) and cheaper with thinking disabled ($0.15 vs. $0.30 per million output tokens).
vs. Claude 3.7 Sonnet: Lower SWE-Bench score (58.2% vs. ~65%), but faster and cheaper.
vs. DeepSeek R1: Lower AIME score (78.3% vs. 93.3%), but better in multimodality.
vs. Qwen3-235B-A22B: Higher output speed (180 vs. 40.1 tokens/second) and lower cost.
Code Samples
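A minimal text-generation call, using only the Python standard library against Google's public `generativelanguage` REST endpoint. This is a sketch under stated assumptions: the model identifier `gemini-2.5-flash` may differ in preview (e.g., a dated `-preview-` suffix), and `GEMINI_API_KEY` is assumed to be set in the environment.

```python
import json
import os
import urllib.request

# Public Gemini REST endpoint; the model id may carry a preview suffix.
API_URL = ("https://generativelanguage.googleapis.com/v1beta/"
           "models/gemini-2.5-flash:generateContent")

def build_request(prompt: str) -> bytes:
    """Build the JSON body for a generateContent call."""
    return json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()

def generate(prompt: str) -> str:
    """Send one prompt and return the first candidate's text."""
    req = urllib.request.Request(
        API_URL,
        data=build_request(prompt),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": os.environ["GEMINI_API_KEY"],
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["candidates"][0]["content"]["parts"][0]["text"]

# Inspect the request body without making a network call.
print(build_request("Write a haiku about fast inference.").decode())
```

Calling `generate(...)` performs the live request; the response shape (`candidates[0].content.parts[0].text`) follows the v1beta REST API.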
Limitations
Time to first token is 0.8s even without thinking; enabling thinking adds further latency, which can be limiting for real-time use.
Experimental status may affect stability.
No fine-tuning support.
Thinking mode increases costs.
API Integration
Accessible via the AI/ML API with streaming, function calling, and multimodal support; see the provider's documentation for integration details.
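AI/ML API exposes an OpenAI-compatible chat-completions interface, so a request can be sketched with the standard library alone. The base URL, model identifier, and `AIML_API_KEY` variable below are assumptions to verify against the provider's documentation; `"stream": True` requests incremental chunks, matching the streaming support mentioned above.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint and model id for AI/ML API;
# confirm both against the provider's documentation.
BASE_URL = "https://api.aimlapi.com/v1/chat/completions"
MODEL = "google/gemini-2.5-flash"

def build_payload(prompt: str, stream: bool = True) -> dict:
    """OpenAI-style chat payload; stream=True asks for incremental chunks."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def send(prompt: str) -> bytes:
    """Send one chat request (live call; requires AIML_API_KEY)."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['AIML_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Inspect the payload without making a network call.
print(json.dumps(build_payload("Summarize this repo."), indent=2))
```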