Gemini Omni Leaks Before I/O - Inside Google's Video Plans

Google's next video model has been sitting inside the Gemini interface for over a week. The announcement comes tomorrow.

On May 11, Reddit users spotted a new model card inside the Gemini video generation tab. The text read: "Create with Gemini Omni: meet our new video model, remix your videos, edit directly in chat, try templates, and more." The card appeared directly alongside "Toucan" - the internal codename for Google's current Veo 3.1-powered video tool. That placement matters: two models in the same tab points to a product coexistence strategy, not a straight replacement.

Google I/O 2026 starts May 19 at Shoreline Amphitheatre. What the leak shows is enough to sketch out Google's video strategy before the keynote makes it official - and the bet Google is making is more specific than "make a better video model."

TL;DR

Gemini Omni surfaced in the Gemini UI on May 11; Google I/O keynote is May 19
Built around in-chat editing: watermark removal, object swap, scene rewrites by text
Expected in Flash and Pro tiers; credit-based metering with steep consumption
Raw generation quality lags ByteDance's Seedance 2, but editing capability is competitive
OpenAI dropped Sora in April 2026, leaving a consumer video editing gap that Omni targets

What the Leak Shows vs Competitors

The gap between Omni and the current competition is sharper on editing than on generation quality. For in-chat editing, Omni has no functional competitor operating at scale - that's the actual opportunity.

Capability	Gemini Omni	Veo 3.1	Seedance 2	Runway Gen-4
In-chat video editing	Yes	No	No	No
Watermark removal	Yes	No	No	Limited
Object replacement	Yes	No	No	Limited
Scene rewrite via text	Yes	No	No	No
Raw generation quality	Moderate	High	Highest	High
Available via API	Yes	Yes	Yes	Yes
Max clip length (early access)	~10 sec	8 sec	60+ sec	10 sec

How Omni Got Into the UI Early

The Model Card Text

Model cards in Google's Gemini interface aren't placeholders. They're product descriptions that go through localization and user-research review cycles. Finding one visible in a production tab - not buried in a config file - means the release was already staged when the leak surfaced.

The specific language is worth reading closely. "Remix your videos, edit directly in chat, try templates" describes three distinct product interactions. Remixing implies transformation of existing clips - not generation from scratch. Templates suggest a structured content-creation workflow. "Edit directly in chat" is the most significant: it means the editing command and the output stay in the same Gemini conversation window, rather than requiring an export to a standalone editor.

Where Toucan Fits

Toucan isn't going away. Both names appear in the current Gemini video tab, which suggests Google is treating them as parallel products: Veo 3.1 for high-quality generation where a user wants a clean output from a prompt, Omni for iterative editing workflows where the user is working from existing footage.

That split mirrors how professional video tools work. Premiere Pro and After Effects coexist for the same reason: generation and editing have different optimal workflows, and forcing them into a single model degrades both.

Figure: The Gemini Omni model card that surfaced in the Gemini video generation interface on May 11, 2026. Source: testingcatalog.com

Early Test Results

What Worked

Reviewers who accessed early builds noted that a math-focused educational video rendered accurately, with reliable formula display and natural-looking animation. Template-based generation - building from one of Omni's preset structures - produced consistent results across multiple attempts.

The editing functions were the stronger showing. Watermark removal worked across several test cases. Object swaps within existing clips - replacing one item with another by text instruction - handled different lighting conditions better than reviewers expected for a first-generation capability. Scene rewrites via text, where a user describes a change and the model applies it to the existing clip, worked well on shorter sequences.

Two video generations consumed most of a daily AI Pro quota, which puts Omni's compute cost in the same range as current Imagen 3 Pro image generation - significant but not unusual for a video model.

What Didn't

Raw generation quality is where Omni falls short of the current best. A dinner scene in early tests had a clear artifact problem: objects appeared mid-clip without warning, a failure mode common in video diffusion models where the temporal consistency mechanism loses track of scene elements across frames.
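To make that failure mode concrete, here is a minimal sketch of how a reviewer might flag "pop-in" objects in a generated clip. It assumes an object detector has already produced one set of labels per frame. The detector, the function name `find_pop_in_objects`, and the sample labels are all hypothetical and have nothing to do with Omni's internals.

```python
from typing import Dict, List, Set


def find_pop_in_objects(frames: List[Set[str]], min_absent: int = 1) -> Dict[str, int]:
    """Map each label to the frame index where it first appears,
    keeping only labels absent from the first `min_absent` frames.

    `frames` holds hypothetical per-frame detector output (one set of labels per frame).
    """
    first_seen: Dict[str, int] = {}
    for index, labels in enumerate(frames):
        for label in labels:
            # Record only the first frame in which each label shows up.
            first_seen.setdefault(label, index)
    # An object that was not in the opening frames but shows up later is a candidate pop-in.
    return {label: idx for label, idx in first_seen.items() if idx >= min_absent}


if __name__ == "__main__":
    # Illustrative labels only; not taken from any real test clip.
    dinner_scene = [
        {"table", "plate"},
        {"table", "plate"},
        {"table", "plate", "wine glass"},
    ]
    print(find_pop_in_objects(dinner_scene))
```

A check like this also flags objects that legitimately enter the shot, so it only narrows down which frames a human should review. It cannot judge whether an appearance is an artifact.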

ByteDance's Seedance 2 still leads on generation quality benchmarks. The 10-second clip limit in early access also puts Omni below Seedance 2 for users who need longer sequences.

Figure: AI video models are being compared on editing capabilities just as much as generation quality in 2026. Source: unsplash.com

The Competitive Context

Sora Is Gone

OpenAI discontinued Sora in April 2026. That removed the most recognizable consumer video AI brand from the market and left a real gap specifically in the editing-capable segment - Sora had offered some transformation and variation features that competitors haven't fully matched.

Omni's in-chat editing design targets exactly that gap. If it works as the model card suggests, it's not primarily competing with Seedance 2 on generation benchmarks. It's competing on workflow: how fast can a creator go from "I have a clip" to "I have the clip I want."

The Modality Unification Angle

Google runs its image model, Nano Banana 2, separately from Veo 3.1. Both are accessed through different tabs inside Gemini. Omni's positioning as a native Gemini video model - described as being built into the chat interface rather than accessed as a separate tool - suggests Google is working toward a single conversation that handles text, image, and video in sequence, without the user switching context.

If Omni expands to handle still images and text in the same session, it closes a genuine gap: no current top-tier provider offers generation across all three modalities from a single conversational window.

Google's bet isn't to out-generate Seedance 2. It's to make editing fast enough that generation quality becomes a secondary concern for most creators.

What It Does Not Tell You

The leak provides a product description, not a product specification. Several things remain unknown:

Pricing per generation. The credit consumption rate observed in early access is steep, but the credit-to-dollar conversion for Omni hasn't been published. Gemini Pro pricing exists for text and image use, but video generation costs are a separate line item that Google hasn't announced.

API rate limits. The model card mentions API availability. Developers integrating video generation into production pipelines need quota and pricing information to plan architecture. None has surfaced.

Training data. No information has leaked about Omni's training dataset. Given ongoing copyright disputes involving AI video training - including the Seedance 2 suspension over Hollywood footage claims - this is a question Google will likely need to address directly at I/O.

Long-term product roadmap. Veo and Omni coexisting in the same tab is a launch-phase posture. Whether Veo 3.1 gets merged into Omni, deprecated, or continues as a separate product line isn't answerable from a model card.

Broader Gemini updates. Reports from May 17 indicate that whatever Gemini update Google announces at I/O lands roughly level with OpenAI's GPT-5.5 and below Anthropic's Claude Mythos on benchmark coverage. Omni's generation quality fits that expectation: competitive but not state-of-the-art.

The leak tells you what Google is building, but not how well the finished product executes. If the in-chat editing works as described, Omni fills a gap that no competitor has addressed cleanly. If the generation artifacts visible in early tests persist in the release build, the editing-first framing becomes a way to sidestep a benchmark comparison Google would lose. The keynote starts in roughly 32 hours.
