Articles Tagged "Multimodal"

Gemini 3.5 Flash Review: When Flash Surpasses Pro

Gemini 3.5 Flash leads on agentic benchmarks, runs 4x faster than Claude and GPT-5.5, and undercuts both on price - but a hidden long-context weakness and a 3x price hike over its predecessor deserve scrutiny.

Gemini 3.5 Flash

Google DeepMind's fastest frontier model, hitting 76.2% on Terminal-Bench 2.1 and 289 tok/s, now powering AI Mode in Search for over 1 billion monthly users.

Google Search Gains AI Agents, Hits 1 Billion Users

Google's AI Mode reached 1 billion monthly users at I/O 2026, as the company announced information agents, agentic booking, and generative UI set to transform Search this summer.

Android XR Glasses Land with Samsung and Warby Parker

Google unveiled three Android XR smart glasses form factors at I/O 2026, backed by Samsung, XREAL, Warby Parker, and Gentle Monster as launch partners.

Gemini Omni Leaks Before I/O - Inside Google's Video Plans

Google's next video model surfaced in the Gemini UI a week before I/O 2026, showing editing-first features including in-chat watermark removal and object swapping.

Thinking Machines Builds AI That Listens While Talking

Mira Murati's startup unveils TML-Interaction-Small, a 276B MoE model that hits 0.40-second response latency by listening and generating speech at the same time.

Agent Overload, Blind Attention, Unsafe Traces

Three new papers show that adding more agent components can backfire, that reasoning models hide unsafe thinking, and that vision-language models waste most of their attention.

GPT-Realtime-2

OpenAI's second-generation real-time audio model with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling - now generally available in the Realtime API.

xAI Opens Grok 4.3 API: 83% Price Cut, Video Input

xAI opened Grok 4.3 to all API developers on May 6 with an 83% output price cut, 1M-token context, native video input, and document generation - plus five legacy models retiring May 15.

How to Use AI for Photo Editing - A Beginner's Guide

A practical beginner's guide to AI photo editing - background removal, object erasure, and generative fill - using free tools anyone can start with today.

Nemotron 3 Nano Omni Unifies Vision, Audio, Language

NVIDIA's new open omni model activates 3B of 30B parameters, processes video, audio, and documents in one pass, and delivers up to 9.2x higher throughput than other open omni models.

Claude Enters the Creative Studio - 9 MCP Connectors

Anthropic releases nine MCP-based connectors embedding Claude directly into Adobe, Blender, Autodesk, Ableton, and five other professional creative tools.
