
Gemini Omni Flash
Google DeepMind's multimodal video generation model that creates 10-second clips with native audio from text, images, or video inputs - and lets you refine results through conversation.

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Google DeepMind's upcoming flagship model with a 2M-token context window and Deep Think reasoning, announced at Google I/O 2026 and expected in July.

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model - with real caveats on coding and latency.

Kuaishou's Kling 3.0 is the first commercially available AI video model to ship native 4K at 60fps, with multilingual audio, multi-shot storyboarding, and a $0.075/s API.

xAI's Grok Imagine Video 1.5 is the #1-ranked image-to-video model on Artificial Analysis, generating 720p clips with native audio at $0.14/s - 86% cheaper than Sora 2 Pro.

ByteDance's top-ranked AI video generation model with native joint audio-video synthesis, multi-shot support, and multimodal reference inputs across up to 12 files per generation.

Alibaba's open-source video generation model with MoE architecture, native audio, first-and-last-frame control, and 1080p output up to 15 seconds.

HappyHorse-1.0 is Alibaba's 15-billion-parameter video generation model that ranked #1 on Artificial Analysis, producing 720p-1080p clips with joint audio-video synthesis in a single forward pass.

SkyReels V4 is Skywork AI's unified multi-modal video model that jointly generates 1080p/32FPS video and synchronized audio from a single dual-stream diffusion transformer.

OpenAI's Sora 2 generates physics-accurate video with synchronized audio from text or images, available API-only until its September 24, 2026 sunset.

Runway's Gen-4.5 is a video generation model built on an Autoregressive-to-Diffusion architecture that held the top Artificial Analysis Elo position at launch with 1,247 points before Seedance 2.0 and Kling 3.0 surpassed it in early 2026.