Articles Tagged "Multimodal"

Gemini Omni Flash

Google DeepMind's multimodal video generation model that creates 10-second clips with native audio from text, images, or video inputs - and lets you refine results through conversation.

Holo3-35B-A3B

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

Gemini 3.5 Pro

Google DeepMind's upcoming flagship model with a 2M-token context window and Deep Think reasoning, announced at Google I/O 2026 and expected in July.

Grok 4.3 Review: xAI Bets on Price Over Prestige

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model - with real caveats on coding and latency.

Kling 3.0

Kuaishou's Kling 3.0 is the first commercially available AI video model to ship native 4K at 60fps, with multilingual audio, multi-shot storyboarding, and a $0.075/s API.

Grok Imagine Video 1.5

xAI's Grok Imagine Video 1.5 is the #1-ranked image-to-video model on Artificial Analysis, generating 720p clips with native audio at $0.14/s - 86% cheaper than Sora 2 Pro.

Dreamina Seedance 2.0

ByteDance's top-ranked AI video generation model with native joint audio-video synthesis, multi-shot support, and multimodal reference inputs of up to 12 files per generation.

Wan 2.7

Alibaba's open-source video generation model with MoE architecture, native audio, first-and-last-frame control, and 1080p output up to 15 seconds.

HappyHorse-1.0

HappyHorse-1.0 is Alibaba's 15-billion-parameter video generation model that ranked #1 on Artificial Analysis, producing 720p-1080p clips with joint audio-video synthesis in a single forward pass.

SkyReels V4

SkyReels V4 is Skywork AI's unified multimodal video model that jointly generates 1080p video at 32fps and synchronized audio from a single dual-stream diffusion transformer.

Sora 2

OpenAI's Sora 2 generates physics-accurate video with synchronized audio from text or images, available API-only until its September 24, 2026 sunset.

Runway Gen-4.5

Runway's Gen-4.5 is a video generation model built on an Autoregressive-to-Diffusion architecture. It held the top Artificial Analysis Elo position at launch with 1,247 points, before Seedance 2.0 and Kling 3.0 surpassed it in early 2026.