Articles Tagged "Diffusion Models"

SkyReels V4

SkyReels V4 is Skywork AI's unified multi-modal video model that jointly generates 1080p/32FPS video and synchronized audio from a single dual-stream diffusion transformer.

Runway Gen-4.5

Runway's Gen-4.5 is a video generation model built on an Autoregressive-to-Diffusion architecture. It held the top Artificial Analysis Elo position at launch with 1,247 points, before Seedance 2.0 and Kling 3.0 surpassed it in early 2026.

DiffusionGemma 26B Review: 4x Faster, Real Tradeoffs

Google DeepMind's DiffusionGemma generates 1,000+ tokens per second through parallel diffusion, giving up 5-19 benchmark points relative to Gemma 4 in exchange for speed and unique bidirectional generation capabilities.

DiffusionGemma 26B

DiffusionGemma 26B is Google DeepMind's open-weight discrete diffusion language model that generates 256 tokens in parallel, reaching 1,100+ tokens/sec on H100 - roughly 4x faster than autoregressive models of the same size.

NVIDIA SANA-WM

NVIDIA's SANA-WM is a 2.6B-parameter hybrid linear diffusion transformer that generates 60-second 720p video with 6-DoF camera control on a single H100, built for embodied AI and robotics simulation.

NVIDIA SANA-WM - Minute-Scale Video on One GPU

NVIDIA's NVLabs open-sourced SANA-WM, a 2.6B-parameter world model that generates 60-second 720p camera-controlled video on a single GPU, outperforming 14B+ competitors that need 8 GPUs.

Midjourney vs FLUX 2026: Which AI Image Generator Wins

A direct comparison of Midjourney V8.1 and FLUX.2 across image quality, pricing, API access, and licensing - with real benchmark numbers.

Stable Audio 3.0 Ships Open Weights, 6-Min Songs

Stability AI releases Stable Audio 3.0 as a four-model family with a new SAME autoencoder, open weights for three of four variants, and tracks up to 6 minutes 20 seconds - while Suno and Udio face ongoing copyright lawsuits over their training data.

HiDream-O1-Image

HiDream-O1-Image is an 8B open-source text-to-image model with a pixel-space diffusion architecture that outperforms 32B FLUX.2 [dev] across five major benchmarks.

Ideogram 3.0

Ideogram 3.0 is Ideogram AI's most capable text-to-image model, leading the field in typography accuracy (~90-95%) and offering production-ready API access at $0.03-$0.09 per image.

Veo 3.1

Google DeepMind's Veo 3.1 generates 4K video with native audio and is now free for every Google account at 10 clips per month via Google Vids.

NVIDIA Lyra 2.0 - Explorable 3D Worlds from One Photo

NVIDIA's Spatial Intelligence Lab released Lyra 2.0, a 14B model that turns a single photograph into a navigable 3D environment - but the weights carry a research-only license.