Articles Tagged "MoE"

LongCat-2.0 Review: China's Stealth Coder

Meituan's 1.6T open-source coding model secretly topped OpenRouter for two months before revealing itself - and the price-to-performance math is hard to argue with.

LongCat-2.0

Meituan's 1.6T-parameter open-source MoE coding model, trained end-to-end on 50,000 domestic Chinese ASICs, with native 1M token context and a 59.5 SWE-bench Pro score.

Meituan's LongCat-2.0 Was Topping OpenRouter in Disguise

Meituan open-sources LongCat-2.0, a 1.6T MoE model trained on 50,000 Chinese ASICs that secretly topped OpenRouter under the alias Owl Alpha.

Holo3-35B-A3B

H Company's open-weight sparse MoE vision-language model purpose-built for desktop computer use, scoring 82.6% on OSWorld-Verified with only 3B active parameters.

North Mini Code

Cohere's first developer-focused model - 30B sparse MoE with 3B active parameters, free Apache 2.0 license, 256K context window, and 33.4 on the AA Coding Index.

Wan 2.7

Alibaba's open-source video generation model with an MoE architecture, native audio, first-and-last-frame control, and 1080p output up to 15 seconds.

ERNIE 5.1

Baidu's ERNIE 5.1 is a text-focused, 800B-parameter MoE model that claims the top Chinese model slot on LMArena and was built at 6% of the training cost of comparable models.

GLM-5.2 Ships MIT-Licensed, 1M Context, Zero Benchmarks

Zhipu AI's GLM-5.2 ships with a 1M token context, 744B MoE parameters, and an MIT license the day after Fable 5 goes offline - but no benchmark numbers at launch.

GLM-5.2

Z.ai's GLM-5.2 is a 744B open-weight MoE model with a 1M token context window, MIT license, and first-day support for eight coding agents at roughly 1/10th the cost of US frontier models.

Kimi K2.7-Code

Moonshot AI's Kimi K2.7-Code is a 1T-parameter open-weight MoE coding model with mandatory thinking mode, 256K context, and 30% fewer reasoning tokens than K2.6.

Kimi K2.7-Code - Moonshot's Open-Weight Coding Leap

Moonshot AI ships Kimi K2.7-Code with 30% fewer reasoning tokens and a 21.8% gain on its own coding benchmarks, but the model still trails Claude Opus 4.8 on most tests in the same table.

MAI-Thinking-1

Microsoft's first in-house reasoning model, a 35B-active sparse MoE with 256K context, 97% on AIME 2025, and no distillation from third-party labs.