Alibaba's Qwen 3.5 Claims to Beat GPT-5.2 and Claude Opus 4.5 - and It's Open Source
Alibaba releases Qwen 3.5, a 397-billion-parameter open-weight model that claims to outperform US frontier models at a fraction of the cost.

Alibaba dropped Qwen 3.5 on Sunday, the eve of Chinese New Year, and the numbers tell a story that should concern every closed-model lab in Silicon Valley. The 397-billion-parameter model claims to outperform both GPT-5.2 and Claude Opus 4.5 on the majority of benchmarks Alibaba tested - and the weights are free to download under an open license.
This is not philanthropy. It is market strategy, and the economics behind it deserve scrutiny.
What Qwen 3.5 Actually Is
Qwen 3.5 is a sparse mixture-of-experts (MoE) model. It packs 397 billion total parameters but activates only 17 billion per forward pass, which means it can run on hardware that would choke on a dense model of comparable scale. Alibaba says it delivers 60% lower costs and 8x higher throughput than its predecessor, Qwen 3.
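The economics of sparse activation come down to one ratio. A rough back-of-the-envelope check, using only the parameter counts stated above:

```python
# Sparse MoE economics in one line: per token, only the routed experts run,
# so inference cost tracks active parameters, not total parameters.
total_params = 397e9   # 397B parameters stored on disk / in memory
active_params = 17e9   # 17B parameters actually used per forward pass

print(f"{active_params / total_params:.1%}")  # ~4.3% of weights touched per token
```

So each token exercises roughly one twenty-third of the network, which is why a 397B-parameter model can run on hardware that a dense 397B model could not.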
The model is natively multimodal - text, images, audio, and video processed in a single system - and supports 201 languages and dialects, up from 82 in the previous generation. Its context window stretches to 262,144 tokens, expandable to roughly one million with customization.
Three architectural innovations stand out. A hybrid attention mechanism blends standard quadratic attention with linear attention heads to cut memory usage on long contexts. A gated delta network governs how the linear heads' running memory of the sequence is updated and decayed. And selective expert activation routes each token through only 10 of the model's expert subnetworks, keeping compute requirements manageable.
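The selective-activation step above is standard top-k gating: a small learned router scores every expert for each token, and only the k highest-scoring experts do any work. A minimal sketch follows; the expert count and model width are illustrative choices, not Alibaba's actual configuration (only the "10 active experts" figure comes from the article).

```python
# Top-k sparse expert routing, the mechanism behind MoE models like Qwen 3.5.
import numpy as np

def route_top_k(token: np.ndarray, gate_weights: np.ndarray, k: int = 10) -> np.ndarray:
    """Return a routing distribution that is nonzero for only k experts."""
    logits = gate_weights @ token        # one score per expert
    top_k = np.argsort(logits)[-k:]      # indices of the k best-scoring experts
    probs = np.zeros_like(logits)
    exp = np.exp(logits[top_k] - logits[top_k].max())
    probs[top_k] = exp / exp.sum()       # softmax over the selected experts only
    return probs

rng = np.random.default_rng(0)
n_experts, d_model = 128, 64             # illustrative sizes
gate = rng.standard_normal((n_experts, d_model))
token = rng.standard_normal(d_model)

probs = route_top_k(token, gate, k=10)
active = int((probs > 0).sum())
print(active)  # 10 - the other 118 experts do no compute for this token
```

The experts' outputs are then combined using the nonzero probabilities as weights, which is how the model keeps a huge parameter count while paying only for the experts it routes to.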
Alibaba also released a closed-source variant, Qwen 3.5-Plus, with a one-million-token context window aimed at enterprise customers who want hosted inference.
The Benchmark Claims
Alibaba tested Qwen 3.5 against GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro across more than 30 benchmarks. The headline claim: Qwen 3.5 outperforms the US models on roughly 80% of evaluated categories.
Specific numbers tell a more nuanced story. On LiveCodeBench v6, Qwen 3.5 scores 83.6. On AIME26 (math reasoning), it hits 91.3. On GPQA Diamond, 88.4. And in document recognition (OmniDocBench v1.5), it scores 90.8, beating GPT-5.2 (85.7), Claude Opus 4.5 (87.7), and Gemini 3 Pro (88.5).
But Qwen 3.5 is not a universal champion. It dominates in agentic tasks, instruction following, and multimodal document understanding while trailing in pure reasoning and competitive coding. On BrowseComp (agentic search), it scores 78.6 - strong, but still behind Claude Opus 4.5 at 84.0.
Independent testing is still thin. As we have noted in our guide to understanding AI benchmarks, company-published numbers warrant healthy skepticism until the community can reproduce them.
The Business Logic
Here is where it gets interesting. Alibaba Cloud holds roughly 4% of the global cloud market, a distant third to AWS and Azure. Open-sourcing a frontier-class model is not an act of generosity - it is a customer acquisition strategy.
The playbook is straightforward. Developers download Qwen 3.5 for free, build products on top of it, and when they need to scale, Alibaba Cloud is right there with hosted inference, fine-tuning infrastructure, and enterprise support through Qwen 3.5-Plus. The open weights are the top of the funnel.
This mirrors what Meta did with Llama, but Alibaba's execution has been more aggressive. Hugging Face download data shows that in December 2025, Qwen models were downloaded more times than all other major open-model families combined. That is a staggering adoption figure, and it gives Alibaba enormous leverage in the developer ecosystem.
The timing of the release - landing amid the Qwen 3 generation that was already competitive - suggests Alibaba is prioritizing speed over polish. Move fast, capture developers, monetize later.
China's Open-Source AI Offensive
Qwen 3.5 did not arrive in isolation. The same week saw ByteDance release Doubao 2.0, and DeepSeek is expected to follow shortly with its own upgrade. Zhipu AI also pushed an updated model. China's major AI players are all converging on the same strategy: open-weight releases that undercut US pricing while matching or exceeding US performance on key tasks.
The contrast with Silicon Valley is stark. OpenAI, Anthropic, and Google DeepMind have kept their frontier models behind API paywalls. OpenAI's recent GPT-oss open-weight releases were a notable exception, but those models are smaller and less capable than the company's flagship systems.
Chinese open models surpassed their US counterparts in global adoption last year, a shift that has not gone unnoticed in Washington. The Trump administration has reportedly prioritized promoting US open models globally as a counterweight to Chinese influence in the developer ecosystem.
The Agentic Angle
Perhaps the most strategically significant aspect of Qwen 3.5 is its focus on agentic capabilities. The model can independently take actions across mobile and desktop applications - booking appointments, executing workflows, navigating spreadsheets - without constant human input.
This is the direction the entire industry is heading. But while US labs like Anthropic and OpenAI are building agentic features into their closed platforms, Alibaba is making agentic capabilities freely available for anyone to build on. That is a meaningful competitive difference for developers choosing their AI stack.
What This Means
The era when a handful of US companies could maintain comfortable leads in model capability is eroding. Qwen 3.5 may not be the best model at everything, but it is good enough at most things - and free is a powerful price point.
For enterprise buyers, the calculus is shifting. Why pay per-token API rates to a US lab when an open-weight alternative delivers comparable performance on your own infrastructure? The answer used to be clear: because the US models were meaningfully better. That gap is narrowing fast.
Alibaba's bet is that in a world where model capability is increasingly commoditized, the winner is whoever controls the infrastructure layer. Open-source the model, own the cloud. It is a strategy that has worked before in software. Whether it works in AI depends on whether the benchmark claims hold up under independent scrutiny - and whether 700 million Hugging Face downloads translate into paying cloud customers.
The next few months of community testing will tell us a lot. The open-source LLM leaderboard should provide independent validation soon. Until then, the numbers are impressive, but they are Alibaba's numbers. And in this market, trust but verify is the only sensible policy.
Sources:
- Alibaba unveils Qwen3.5 as China's chatbot race shifts to AI agents - CNBC
- Alibaba releases multimodal Qwen3.5 mixture of experts model - SiliconANGLE
- Alibaba unveils Qwen-3.5, sharpening global race to spread AI models - South China Morning Post
- Alibaba's Qwen 3.5: Riding the Open-Source AI S-Curve - AInvest
- Qwen 3.5 vs Claude Opus 4.5 vs Gemini 3 Pro: Benchmarks Compared - Geeky Gadgets
- Qwen 3.5: 397B MoE Benchmarks, Pricing & Complete Guide - DigitalApplied