Articles Tagged "Multimodal"

Gemini Omni Flash

Google DeepMind's multimodal video generation model that creates 10-second clips with native audio from text, image, or video inputs, and lets you refine results through conversation.

Holo3-35B-A3B

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Gemini 3.5 Pro

Google DeepMind's upcoming flagship model with a 2M-token context window and Deep Think reasoning, announced at Google I/O 2026 and expected in July.

Grok 4.3 Review: xAI Bets on Price Over Prestige

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model, with real caveats on coding and latency.

Kling 3.0

Kuaishou's Kling 3.0 is the first commercially available AI video model to ship native 4K at 60fps, with multilingual audio, multi-shot storyboarding, and a $0.075/s API.

Grok Imagine Video 1.5

xAI's Grok Imagine Video 1.5 is the #1-ranked image-to-video model on Artificial Analysis, generating 720p clips with native audio at $0.14/s, 86% cheaper than Sora 2 Pro.

Dreamina Seedance 2.0

ByteDance's top-ranked AI video generation model with native joint audio-video synthesis, multi-shot support, and multimodal reference inputs of up to 12 files per generation.

Wan 2.7

Alibaba's open-source video generation model with MoE architecture, native audio, first-and-last-frame control, and 1080p output up to 15 seconds.

HappyHorse-1.0

HappyHorse-1.0 is Alibaba's 15-billion-parameter video generation model that ranked #1 on Artificial Analysis, producing 720p-1080p clips with joint audio-video synthesis in a single forward pass.

SkyReels V4

SkyReels V4 is Skywork AI's unified multimodal video model that jointly generates 1080p, 32 fps video and synchronized audio from a single dual-stream diffusion transformer.

Sora 2

OpenAI's Sora 2 generates physics-accurate video with synchronized audio from text or images, available API-only until its September 24, 2026 sunset.

Runway Gen-4.5

Runway's Gen-4.5 is a video generation model built on an Autoregressive-to-Diffusion architecture. It held the top Artificial Analysis Elo position at launch, with 1,247 points, before Seedance 2.0 and Kling 3.0 surpassed it in early 2026.