Name: Gemini Omni API
Brand: Google

Gemini Omni

Gemini Omni Flash Preview is Google's multimodal video generation and editing model, supporting text-to-video, image-to-video, reference-to-video, and edit workflows.

What is the Gemini Omni Flash Preview API?

Gemini Omni Flash Preview is Google's multimodal video generation model. It produces video from text prompts, reference images, and existing video clips, and it supports four task modes: text-to-video, image-to-video, reference-to-video, and video editing.

Output videos are delivered in 16:9 or 9:16 aspect ratios at durations from 3 to 10 seconds. Multi-turn editing is available via the Gemini Interactions API, enabling conversational refinement of generated content.
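The output constraints above (four task modes, two aspect ratios, a 3 to 10 second duration range) are simple enough to check on the client before sending a request. The sketch below is a minimal illustration under stated assumptions: the `VideoRequest` class, its field names, and the mode strings are hypothetical and are not the API's actual request schema. It checks only the limits stated on this page; for example, it does not verify that an editing request includes a source clip.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the four task modes described above;
# the real API may name or structure these differently.
TASK_MODES = {"text-to-video", "image-to-video", "reference-to-video", "video-editing"}
ASPECT_RATIOS = {"16:9", "9:16"}
MIN_SECONDS, MAX_SECONDS = 3, 10


@dataclass
class VideoRequest:
    """Hypothetical client-side request model, not the official request schema."""
    mode: str
    prompt: str
    aspect_ratio: str
    duration_seconds: float


def validate(req: VideoRequest) -> list[str]:
    """Return a list of problems found; an empty list means the request passes."""
    errors = []
    if req.mode not in TASK_MODES:
        errors.append(f"unknown mode {req.mode!r}")
    if req.aspect_ratio not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio {req.aspect_ratio!r}")
    if not MIN_SECONDS <= req.duration_seconds <= MAX_SECONDS:
        errors.append(
            f"duration {req.duration_seconds}s outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if not req.prompt.strip():
        errors.append("prompt is empty")
    return errors


print(validate(VideoRequest("text-to-video", "A paper boat on a rainy street", "9:16", 8)))
# prints []
print(validate(VideoRequest("text-to-video", "A paper boat on a rainy street", "4:3", 12)))
# prints ["unsupported aspect ratio '4:3'", 'duration 12s outside 3-10s']
```

Returning a list of errors rather than raising on the first one lets a UI report every problem with a request at once.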

API Pricing

Input (any modality): $1.95 per 1M tokens
Output video: $22.75 per 1M tokens (≈$0.1318 per second of 720p video)
Output text: $11.70 per 1M tokens
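The rates above can be combined into a rough per-request estimate. The sketch below is an illustration only: it uses the approximate per-second figure quoted for 720p video rather than an actual output token count. It also assumes the caller already knows the input token count. How images and video clips are tokenized as input is not stated here, so that number is a guess supplied by the caller. The helper names are hypothetical.

```python
# Rates from the pricing list above, in USD per 1M tokens.
INPUT_PER_M = 1.95
OUTPUT_VIDEO_PER_M = 22.75
OUTPUT_TEXT_PER_M = 11.70
# Approximate per-second rate quoted above for 720p output video, in USD.
VIDEO_720P_PER_SECOND = 0.1318


def implied_video_tokens_per_second() -> float:
    """Back out how many output tokens one second of 720p video corresponds to."""
    return VIDEO_720P_PER_SECOND / OUTPUT_VIDEO_PER_M * 1_000_000


def estimate_cost(input_tokens: int, video_seconds: float, output_text_tokens: int = 0) -> float:
    """Rough USD estimate for one request producing 720p video (and optional text)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PER_M
    video_cost = video_seconds * VIDEO_720P_PER_SECOND
    text_cost = output_text_tokens / 1_000_000 * OUTPUT_TEXT_PER_M
    return input_cost + video_cost + text_cost


print(f"{implied_video_tokens_per_second():.0f} tokens/s")  # prints 5793 tokens/s
# Hypothetical request: a 500-token text prompt and an 8-second clip.
print(f"${estimate_cost(input_tokens=500, video_seconds=8):.2f}")  # prints $1.06
```

For short text prompts, output video dominates the cost. Most of the $1.06 in the example comes from the 8 seconds of output at roughly $0.13 per second.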

Where to Use Gemini Omni

Text-to-video content creation
Describe a scene in text and receive a short video clip. Suited to social content, marketing materials, and product demos where a written brief needs to become a motion asset.

Image-to-video animation
Animate a still image into a video sequence. Useful for product photography, character animation, and giving motion to static visuals.

Reference-guided generation
Bind reference images to specific roles in the prompt using tags like <IMAGE_REF_0> for precise control over character appearance, environment, or style.

Video editing workflows
Pass in an existing video clip with a prompt describing the change, such as a new background, adjusted motion, or a style transformation.
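The tag binding described under "Reference-guided generation" can be illustrated with a small prompt-building helper. This is a sketch under an explicit assumption: it numbers `<IMAGE_REF_n>` tags from 0 in the order the reference images are attached to the request, which this page does not confirm. The `Reference` class, `build_prompt`, and the role names are hypothetical. The helper only builds the prompt text and the ordered image list. It does not call any API.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    role: str  # hypothetical role label, e.g. "character", "environment", "style"
    path: str  # local path or URI of the reference image


def tag_for(index: int) -> str:
    # Assumption: tags are numbered from 0 in the order the images are
    # attached to the request, following the <IMAGE_REF_0> pattern.
    return f"<IMAGE_REF_{index}>"


def build_prompt(template: str, refs: list[Reference]) -> tuple[str, list[str]]:
    """Replace {role} placeholders with image tags; return the prompt and image paths in tag order."""
    tags: dict[str, str] = {}
    for i, ref in enumerate(refs):
        if ref.role in tags:
            raise ValueError(f"duplicate role {ref.role!r}")
        tags[ref.role] = tag_for(i)
    return template.format(**tags), [ref.path for ref in refs]


prompt, images = build_prompt(
    "{character} walks through {environment}, painted in the style of {style}.",
    [
        Reference("character", "hero.png"),
        Reference("environment", "alley.jpg"),
        Reference("style", "watercolor.png"),
    ],
)
print(prompt)
# prints <IMAGE_REF_0> walks through <IMAGE_REF_1>, painted in the style of <IMAGE_REF_2>.
print(images)
# prints ['hero.png', 'alley.jpg', 'watercolor.png']
```

Keeping the tag numbering and the image list in one function means the two cannot drift out of order. A placeholder in the template with no matching reference raises `KeyError` from `str.format`, which catches typos in role names early.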

Gemini Omni vs. the Alternatives

Gemini Omni Flash Preview: Versatile multi-task video model. Best for workflows that need multiple generation modes in one API.
Seedance 2.0: ByteDance video model with up to 4K resolution. Choose Seedance for ultra-high-resolution output requirements.
Kling Video: Kuaishou's video generation model, with its own motion characteristics. Choose Kling when its visual style better suits the project.
