Qwen3.5-Omni Does 10-Hour Audio and 4M Video Frames

Alibaba's Qwen3.5-Omni handles audio, video, images, and text in a single model pass and generates speech in real time. The Plus variant posts state-of-the-art results on 215 benchmarks and edges out Gemini 3.1 Pro on audio tasks.