Gemini Omni

The Gemini Omni Flash Preview API creates video from text, images, and existing video clips, with durations of 3 to 10 seconds and conversational multi-turn editing.

Gemini Omni Flash Preview is Google's multimodal video generation and editing model, supporting text-to-video, image-to-video, reference-to-video, and edit workflows.

What is the Gemini Omni Flash Preview API?

Gemini Omni Flash Preview is Google's multimodal video generation model. It produces video from text prompts, reference images, and existing video clips — and supports four task modes: text-to-video, image-to-video, reference-to-video, and video editing.

Output videos are delivered in 16:9 or 9:16 aspect ratios at durations from 3 to 10 seconds. Multi-turn editing is available via the Gemini Interactions API, enabling conversational refinement of generated content.
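
The constraints above (four task modes, two aspect ratios, 3 to 10 seconds) can be checked before a request is sent. The following is a minimal sketch, not the official client: the `VideoRequest` class, its field names, and the task-mode strings are hypothetical and only mirror the limits stated on this page.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits taken from the description above. The identifiers and the exact
# task-mode strings are assumptions made for illustration.
TASK_MODES = {"text-to-video", "image-to-video", "reference-to-video", "video-editing"}
ASPECT_RATIOS = {"16:9", "9:16"}
MIN_SECONDS, MAX_SECONDS = 3, 10


@dataclass
class VideoRequest:
    """Hypothetical container for the parameters of one generation request."""
    task: str
    prompt: str
    aspect_ratio: str = "16:9"
    duration_seconds: int = 5

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request looks valid."""
        problems = []
        if self.task not in TASK_MODES:
            problems.append(f"unknown task mode: {self.task!r}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            problems.append(f"unsupported aspect ratio: {self.aspect_ratio!r}")
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            problems.append(
                f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds, "
                f"got {self.duration_seconds}"
            )
        if not self.prompt.strip():
            problems.append("prompt is empty")
        return problems


if __name__ == "__main__":
    request = VideoRequest("text-to-video", "A paper boat drifting down a rainy street",
                           aspect_ratio="9:16", duration_seconds=12)
    for problem in request.validate():
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request in a single pass.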

API Pricing

  • Input (any modality): $1.95 per 1M tokens
  • Output video: $22.75 per 1M tokens (≈$0.1318 per second of 720p video)
  • Output text: $11.70 per 1M tokens
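
To turn the rates above into a per-clip budget, a rough estimate might look like the sketch below. It uses only the figures listed here and assumes 720p output billed at the approximate per-second rate; the function names are illustrative, and actual billing may differ.

```python
# Rates copied from the pricing list above (USD).
INPUT_RATE_PER_MILLION_TOKENS = 1.95
VIDEO_RATE_PER_MILLION_TOKENS = 22.75
VIDEO_RATE_PER_SECOND_720P = 0.1318  # approximate per-second figure quoted above


def estimate_clip_cost(duration_seconds: float, input_tokens: int = 0) -> float:
    """Estimate the cost of one 720p clip plus its input tokens, in USD."""
    video_cost = duration_seconds * VIDEO_RATE_PER_SECOND_720P
    input_cost = input_tokens / 1_000_000 * INPUT_RATE_PER_MILLION_TOKENS
    return round(video_cost + input_cost, 4)


def implied_tokens_per_second() -> float:
    """Output tokens per second of 720p video implied by the two video rates."""
    return VIDEO_RATE_PER_SECOND_720P / VIDEO_RATE_PER_MILLION_TOKENS * 1_000_000


if __name__ == "__main__":
    print(estimate_clip_cost(10))              # longest supported clip, about $1.32
    print(round(implied_tokens_per_second()))  # roughly 5,800 tokens per second
```

At these rates, output video dominates the bill: a 10-second clip costs about $1.32, while a thousand input tokens add less than a cent.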

Where to Use Gemini Omni

Text-to-video content creation
Describe a scene and receive a short video clip — for social content, marketing materials, or product demos where written briefs need to become motion assets.

Image-to-video animation
Animate a still image into a video sequence. Useful for product photography, character animation, and giving motion to static visuals.

Reference-guided generation
Bind reference images to specific roles in the prompt using tags like <IMAGE_REF_0> for precise control over character appearance, environment, or style.
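
One way to keep role-to-tag bindings consistent is to generate the tags from an ordered list of roles. This sketch assumes that tag indexes start at 0 and follow the order in which reference images are supplied; the page shows only `<IMAGE_REF_0>`, so the numbering of later tags and all helper names here are assumptions.

```python
from __future__ import annotations


def tag_for(index: int) -> str:
    """Return the reference tag for the image at the given position."""
    return f"<IMAGE_REF_{index}>"


def bind_references(roles: list[str]) -> dict[str, str]:
    """Map each role name to a tag, in the order the images would be supplied."""
    return {role: tag_for(i) for i, role in enumerate(roles)}


def render_prompt(template: str, roles: list[str]) -> str:
    """Fill {role} placeholders in the template with their reference tags."""
    return template.format(**bind_references(roles))


if __name__ == "__main__":
    prompt = render_prompt(
        "The character {hero} walks through {setting} at dusk.",
        roles=["hero", "setting"],
    )
    print(prompt)
    # The character <IMAGE_REF_0> walks through <IMAGE_REF_1> at dusk.
```

Writing prompts against role names instead of raw tags means the images can be reordered without editing the prompt text.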

Video editing workflows
Pass an existing video clip and prompt the model to modify it, for example by changing the background, adjusting motion, or transforming the style.

Gemini Omni vs. the Alternatives

  • Gemini Omni Flash Preview: Versatile multi-task video model. Best for workflows that need multiple generation modes in one API.
  • Seedance 2.0: ByteDance video model with up to 4K resolution. Choose Seedance for ultra-high-resolution output requirements.
  • Kling Video: Alternative video generation with distinct motion characteristics for different aesthetic needs.
