Articles Tagged "Multimodal"

Claude Opus 4.7 Review: Coding Giant, Mixed Signals

Claude Opus 4.7 leads SWE-bench and agent benchmarks but regresses on web research, inflates token costs by up to 35%, and trades prose quality for literal instruction-following.

Machine Translation Benchmarks Leaderboard 2026

Rankings of LLMs and dedicated MT systems across FLORES-200, WMT24/25, TICO-19, and MT-GenEval benchmarks with BLEU, COMET, and human evaluation scores.

Audio Understanding Benchmarks Leaderboard 2026

Rankings of the best audio language models on MMAU, MMAU-Pro, and other benchmarks covering speech reasoning, music understanding, and environmental sound identification.

Anthropic Launches Claude Design, Knocks Figma 7%

Anthropic's new Claude Design tool turns text prompts into prototypes and slide decks - and wiped 7% off Figma's stock price the moment it launched.

Physical Intelligence Launches π0.7 for Untrained Tasks

Physical Intelligence's π0.7 robot model can generalize to tasks it was never explicitly trained on, matching fine-tuned specialist models through compositional skill recombination.

Video Generation Benchmarks Leaderboard 2026

Rankings of AI video generation models across VBench, VBench-2.0, and the Artificial Analysis Video Arena Elo system, covering text-to-video and image-to-video performance.

Vision-Language Benchmarks: Image Reasoning Ranked

Rankings of AI models on the key visual reasoning benchmarks - MMMU, MathVista, ChartQA, DocVQA, OCRBench, AI2D, CharXiv, and more - focused on image and document understanding.

Qwen 3.6-35B-A3B

Alibaba's 35B sparse MoE with 3B active parameters scores 73.4% on SWE-bench Verified and ships under Apache 2.0 with multimodal vision and video, a 256K context window, and a DeltaNet hybrid architecture.

Qwen 3.6 Ships a 35B MoE That Codes Like Models 10x Its Size

Alibaba's Qwen 3.6-35B-A3B activates only 3B of its 35B parameters per token, scores 73.4% on SWE-bench Verified, handles video and images, and ships under Apache 2.0.

Anthropic's latest flagship model ships with 3x higher resolution vision, a new xhigh effort level, task budgets for cost control, cyber safeguards, and 13% better coding performance at the same $5/$25 pricing.

Claude Opus 4.7 Is Here - Less Supervision, Better Vision

Anthropic releases Claude Opus 4.7 with 3x higher resolution vision, a new xhigh effort level, task budgets for cost control, /ultrareview in Claude Code, and cyber safeguards that automatically block high-risk requests.

Google Ships a Native Gemini App for Mac Built in 100 Days

Google launched a free native Gemini app for Mac with screen sharing, window context, image and video generation, and a global Option+Space shortcut - built in pure Swift with 100+ features in under 100 days.
