Google Gemini Omni Flash Turns Any Input Into Video

Gemini Omni Flash is live. Announced at Google I/O 2026, the model is a multimodal AI that builds video from any input type: text, images, audio, or existing footage. The company positions it as the video equivalent of Nano Banana, its image model that has produced over 50 billion images since launching last year.

The model is available now in the Gemini app, Google Flow, and YouTube Shorts. Anyone on those platforms can start experimenting today.

What Gemini Omni Flash Actually Does

Omni Flash generates clips up to 10 seconds long with synchronised audio. That limit is temporary. Google says it is actively working to extend the duration.

Prompts layer on top of each other in conversation, so you iterate rather than restart. Visual consistency holds across sequential edits, which is where most generative video tools break down. Google DeepMind CTO Koray Kavukcuoglu says Omni Flash has meaningfully more world knowledge than Veo because of Gemini’s training data, though he did not specify metrics.

That distinction matters. Veo only takes text prompts. Omni Flash takes video as input and uses it as a foundation for new content. Hand it an existing clip, and it can transform the footage entirely: swap the visual style, change on-screen physics, or sync lighting to audio. The model also applies knowledge of biology, history, physics, and narrative logic to keep generated clips coherent.

Google has posted demos on its website. One transforms a plain clip of a man at a mirror: the model adds liquid physics to the glass and then rebuilds the whole scene as voxel art from the same source footage, resulting in three entirely different outputs. Another syncs apartment window lights to a techno track. A third produces a claymation-style explainer on protein folding, which is either impressive or unsettling, depending on your threshold.

The Deepfake Question Google Needs to Answer

Nicole Brichtova, who leads the Omni product team, says Google has already watched users insert their likenesses into images with Nano Banana and expects the same with video.

That capability lands in a media environment already struggling to distinguish real footage from synthetic. Google ran safety reviews internally, brought in outside specialists, and conducted ethics assessments before release, the standard playbook for any major model launch. Every piece of content created or edited with Omni Flash carries an invisible SynthID watermark, designed to help platforms and fact-checkers identify AI-generated video.

Whether SynthID is enough is a real question. Watermarks can be stripped, degraded, or re-encoded out of existence. Google’s safety claims do not eliminate misuse; they just create a paper trail.

Where Gemini Omni Goes From Here

Google is positioning Omni as a platform, not a single model. The Omni name signals the company’s end goal: a model that can create anything from any input type. Omni Flash is the first model in that family. Others will follow.

OpenAI’s Sora and Meta’s video generation tools are both active, and the race for multimodal video generation is accelerating. Google’s advantage is distribution. Gemini, YouTube, and Google Flow give it more surfaces to deploy Omni than most competitors can match.

Whether Omni Flash holds up outside demo conditions is what the next few weeks will settle.