
Google Launches Gemini Embedding 2 for Multimodal AI
Google's first natively multimodal embedding model maps text, images, video, audio, and PDFs into a single vector space - now in public preview via Gemini API and Vertex AI.
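
For readers who want to try the preview, here is a minimal sketch of fetching an embedding through the Gemini API's Python SDK (google-genai). The model string "gemini-embedding-2" is an assumption for illustration - check the preview docs for the real identifier - and the exact request shape for image, video, audio, and PDF inputs isn't confirmed here.

```python
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# NOTE: model name assumed for illustration; the preview may use a
# different identifier. Text shown here; per the announcement, the
# same model also embeds images, video, audio, and PDFs.
response = client.models.embed_content(
    model="gemini-embedding-2",
    contents="A short passage to map into the shared vector space.",
)

vector = response.embeddings[0].values  # list[float]
print(len(vector))  # embedding dimensionality
```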

Grok 4 is xAI's frontier reasoning model - the first to break 50% on Humanity's Last Exam - with a 256K context window, $3/M input pricing, and a Heavy multi-agent variant, all trained on a 200,000-GPU cluster.
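
As a quick sanity check on those numbers, even a maximally packed prompt costs well under a dollar on the input side - a back-of-envelope sketch using only the figures quoted above:

```python
# Worked cost estimate from the announced figures; input side only,
# since output pricing isn't quoted in the blurb.
INPUT_PRICE_USD_PER_MTOK = 3.00   # $3 per 1M input tokens
CONTEXT_WINDOW_TOKENS = 256_000   # advertised 256K context window

full_context_cost = CONTEXT_WINDOW_TOKENS / 1_000_000 * INPUT_PRICE_USD_PER_MTOK
print(f"${full_context_cost:.2f} per maximally packed prompt")  # -> $0.77
```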

Microsoft releases Phi-4-reasoning-vision-15B - a 15B open-weight multimodal model trained on 240 GPUs in 4 days that competes with 100B+ parameter models on math, science, and GUI understanding.
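
Since the weights are open, a plausible loading sketch with Hugging Face transformers follows. The repo id is an assumption inferred from the announcement (verify the actual path on the Hub), and the call pattern mirrors earlier Phi vision releases.

```python
from transformers import AutoModelForCausalLM, AutoProcessor

# Hypothetical repo id based on the announcement; check the Hub.
MODEL_ID = "microsoft/Phi-4-reasoning-vision-15B"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",       # shard across available GPUs
    torch_dtype="auto",      # keep the checkpoint's native precision
    trust_remote_code=True,  # earlier Phi vision models ship custom code
)

# Text-only probe; image inputs would go through the processor as well.
prompt = "Reason step by step: how many primes are there below 30?"
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```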

New research reveals that models can fake poor performance under adversarial prompts, a smarter critic lifts SWE-bench scores by 15 points, and Microsoft shows compact vision models can punch above their weight.

Alibaba completes the Qwen3.5 lineup with four small models - 0.8B, 2B, 4B, and 9B - all natively multimodal, 262K context, Apache 2.0. The 9B outperforms last-gen Qwen3-30B and beats GPT-5-Nano on vision benchmarks.

Qwen3.5-0.8B is the smallest natively multimodal model in the Qwen3.5 family - 0.8B parameters handling text, images, and video with 262K context. MathVista 62.2, OCRBench 74.5. Apache 2.0.
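
A sketch of image-plus-text inference with transformers, assuming Qwen3.5 follows the chat-template conventions of earlier Qwen VL releases; the repo id and image URL below are placeholders, not confirmed by the announcement.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Placeholder repo id; the blurb names the model, not its Hub path.
MODEL_ID = "Qwen/Qwen3.5-0.8B"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
new_tokens = output[0][inputs["input_ids"].shape[-1]:]  # strip the prompt
print(processor.decode(new_tokens, skip_special_tokens=True))
```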