
NVIDIA SANA-WM
NVIDIA's SANA-WM is a 2.6B-parameter hybrid linear diffusion transformer that generates 60-second 720p video with 6-DoF camera control on a single H100, built for embodied AI and robotics simulation.

NVIDIA NVLabs open-sourced SANA-WM, a 2.6B-parameter world model that generates 60-second 720p camera-controlled video on a single GPU, outperforming 14B+ competitors that need 8 GPUs.

A direct comparison of Midjourney V8.1 and FLUX.2 across image quality, pricing, API access, and licensing - with real benchmark numbers.

Stability AI releases Stable Audio 3.0 as a four-model family with a new SAME autoencoder, open weights for three of four variants, and tracks up to 6 minutes 20 seconds - while Suno and Udio face ongoing copyright lawsuits over their training data.

HiDream-O1-Image is an 8B open-source text-to-image model with a pixel-space diffusion architecture that outperforms 32B FLUX.2 [dev] across five major benchmarks.

Ideogram 3.0 is Ideogram AI's most capable text-to-image model, leading the field in typography accuracy at ~90-95% and offering production-ready API access at $0.03-$0.09 per image.

Google DeepMind's Veo 3.1 generates 4K video with native audio and is now free for every Google account at 10 clips per month via Google Vids.

NVIDIA's Spatial Intelligence Lab released Lyra 2.0, a 14B model that turns a single photograph into a navigable 3D environment - but the weights carry a research-only license.

GPT Image 1.5 leads Artificial Analysis at 1278 Elo while Nano Banana 2 tops Arena.ai - two leaderboards, two answers, and five new models that reshaped the rankings since March.

Microsoft's production-focused image generation model - 41% cheaper and 22% faster than MAI-Image-2, optimized for high-volume enterprise workflows.

LTX-2.3 is a 22-billion-parameter open-source video generation model from Lightricks that produces native 4K video with synchronized audio in a single diffusion pass.

Helios is a 14B open-source autoregressive diffusion model that generates minute-long videos at 19.5 FPS on a single H100, matching the speed of 1.3B distilled models while keeping full 14B quality.