Articles Tagged "Multimodal"

Qwen3.6-27B

Qwen3.6-27B is a 27B dense open-weight multimodal model from Alibaba, released under Apache 2.0, that scores 77.2% on SWE-bench Verified, beating Alibaba's own 397B MoE.

GPT Image 2

GPT Image 2 (ChatGPT Images 2.0) brings 99%+ text accuracy, 2K resolution, web-search grounding, and a Thinking mode for character-consistent storyboards.

ERNIE 5.0

Baidu's ERNIE 5.0 combines 2.4 trillion parameters with a native omni-modal design, landing in LMArena's global top 10 and outpacing GPT-5 High on chart and document benchmarks.

EXAONE 4.5

EXAONE 4.5, LG AI Research's first open-weight vision-language model, packs 33B parameters, a 262K context window, and STEM scores above GPT-5-mini, but ships under a non-commercial license.

Qwen3.5-Omni

Alibaba's Qwen3.5-Omni takes text, images, audio, and video as input and streams both text and speech output in a single end-to-end model with a 256K context window.

Gemini 3.1 Flash TTS

Google's Gemini 3.1 Flash TTS ships in preview with 30 voices, 70-plus languages, 200-plus inline audio tags, and an Elo of 1,211 on the Artificial Analysis TTS Arena.

Veo 3.1

Google DeepMind's Veo 3.1 generates 4K video with native audio and is now free for every Google account at 10 clips per month via Google Vids.

Kimi K2.6

Moonshot AI's Kimi K2.6 is a 1T-parameter MoE with 32B parameters active per token, a 256K context window, a 300-agent swarm running 4,000 coordinated steps, and the top SWE-Bench Pro score among open-weight models at 58.6%.