
Google Launches Gemini Embedding 2 for Multimodal AI
Google's first natively multimodal embedding model maps text, images, video, audio, and PDFs into a single vector space - now in public preview via Gemini API and Vertex AI.
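
For readers who want to try the preview, here is a minimal sketch of fetching an embedding through the Gemini API's Python SDK (google-genai). The model string "gemini-embedding-2" is an assumption for illustration - check the preview docs for the real identifier - and the exact request shape for image, video, audio, and PDF inputs isn't confirmed here.

```python
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# NOTE: model name assumed for illustration; the preview may use a
# different identifier. Text shown here; per the announcement, the
# same model also embeds images, video, audio, and PDFs.
response = client.models.embed_content(
    model="gemini-embedding-2",
    contents="A short passage to map into the shared vector space.",
)

vector = response.embeddings[0].values  # list[float]
print(len(vector))  # embedding dimensionality
```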

Grok 4 is xAI's frontier reasoning model - the first to break 50% on Humanity's Last Exam - with a 256K context window, $3/M input pricing, and a Heavy multi-agent variant, all trained on a 200,000-GPU cluster.
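
As a quick sanity check on those numbers, even a maximally packed prompt costs well under a dollar on the input side - a back-of-envelope sketch using only the figures quoted above:

```python
# Worked cost estimate from the announced figures; input side only,
# since output pricing isn't quoted in the blurb.
INPUT_PRICE_USD_PER_MTOK = 3.00   # $3 per 1M input tokens
CONTEXT_WINDOW_TOKENS = 256_000   # advertised 256K context window

full_context_cost = CONTEXT_WINDOW_TOKENS / 1_000_000 * INPUT_PRICE_USD_PER_MTOK
print(f"${full_context_cost:.2f} per maximally packed prompt")  # -> $0.77
```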

Microsoft releases Phi-4-reasoning-vision-15B - a 15B open-weight multimodal model trained on 240 GPUs in 4 days that competes with 100B+ parameter models on math, science, and GUI understanding.
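
Since the weights are open, a plausible loading sketch with Hugging Face transformers follows. The repo id is an assumption inferred from the announcement (verify the actual path on the Hub), and the call pattern mirrors earlier Phi vision releases.

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Hypothetical repo id based on the announcement; check the Hub.
MODEL_ID = "microsoft/Phi-4-reasoning-vision-15B"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",       # shard across available GPUs
    torch_dtype="auto",      # keep the checkpoint's native precision
    trust_remote_code=True,  # earlier Phi vision models ship custom code
)

# Text-only probe; image inputs would go through the processor as well.
prompt = "Reason step by step: how many primes are there below 30?"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```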

New research reveals that models can fake poor performance under adversarial prompts, a smarter critic lifts SWE-bench scores by 15 points, and Microsoft shows compact vision models can punch above their weight.

Alibaba completes the Qwen3.5 lineup with four small models - 0.8B, 2B, 4B, and 9B - all natively multimodal, 262K context, Apache 2.0. The 9B outperforms last-gen Qwen3-30B and beats GPT-5-Nano on vision benchmarks.

Qwen3.5-0.8B is the smallest natively multimodal model in the Qwen3.5 family - 0.8B parameters handling text, images, and video with 262K context. MathVista 62.2, OCRBench 74.5. Apache 2.0.
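
A sketch of image-plus-text inference with transformers, assuming Qwen3.5 follows the chat-template conventions of earlier Qwen VL releases; the repo id and image URL below are placeholders, not confirmed by the announcement.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Placeholder repo id; the blurb names the model, not its Hub path.
MODEL_ID = "Qwen/Qwen3.5-0.8B"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
new_tokens = output[0][inputs["input_ids"].shape[-1]:]  # strip the prompt
print(processor.decode(new_tokens, skip_special_tokens=True))
```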