Kimi K2.5 vs GLM-4.7-Flash: China's Open-Source Heavyweight vs the Free API Underdog
Comparing two Chinese AI models with MIT-family licenses - Moonshot AI's trillion-parameter Kimi K2.5 against Zhipu AI's ultra-efficient GLM-4.7-Flash that punches well above its weight on coding and agentic tasks.

China's open-source AI ecosystem produced two models that share an MIT-family license and a commitment to accessibility but approach the problem from opposite ends of the scale spectrum. Kimi K2.5 is Moonshot AI's trillion-parameter flagship - 384 experts, 32 billion active parameters per token, 256K context, and benchmark scores that rival the best proprietary models. GLM-4.7-Flash is Zhipu AI's efficiency play - 30 billion total parameters with roughly 3 billion active, a free API, and SWE-bench results that have no business being as strong as they are for a model this small.
The 33x parameter gap between them tells an obvious story on raw benchmarks: K2.5 wins everywhere by large margins. But GLM-4.7-Flash's SWE-bench Verified score of 59.2% and tau2-Bench score of 79.5% reveal a model that was engineered specifically for agentic coding tasks, and at its price point - which is free - it is doing something remarkable. This is a comparison between China's best open model and China's best free model, and the distance between them is smaller than you might expect on the tasks that matter most for practical AI development.
TL;DR
- Choose Kimi K2.5 if you need top-tier performance on math, science, vision, coding, and complex multi-agent workflows. You are willing to pay for API access or invest in cluster infrastructure for the best open-weight model available.
- Choose GLM-4.7-Flash if you want a free, fast, MIT-licensed model that delivers surprisingly strong coding and agentic performance on consumer hardware. Budget-constrained teams and individual developers get real value here.
Quick Comparison
| Feature | Kimi K2.5 | GLM-4.7-Flash |
|---|---|---|
| Developer | Moonshot AI | Z.AI (Zhipu AI) |
| Architecture | MoE (384 experts, 8 active) | MoE (~30B total, ~3B active) |
| Total Parameters | 1T | ~30B |
| Active Parameters | 32B | ~3B |
| License | Modified MIT | MIT |
| Context Window | 256K | 128K |
| API Pricing (Input) | $0.60/1M tokens | Free (Z.AI) |
| API Pricing (Output) | $3.00/1M tokens | Free (Z.AI) |
| SWE-bench Verified | 76.8% | 59.2% |
| GPQA Diamond | 87.6 | Not published |
| MMLU-Pro | 87.1 | Not published |
| tau2-Bench | Not published | 79.5% |
| Self-host Feasibility | Very Low | Very High (single RTX 4090) |
Kimi K2.5: The Open-Weight Ceiling
Kimi K2.5 sets the ceiling for what open-weight models can achieve in early 2026. The architecture is built for coverage - 384 specialized experts across 61 layers, with 8 experts activating per token, trained using PARL (a reinforcement learning method for learning expert routing policies). This is not just scale for the sake of scale. The expert diversity means K2.5 has dedicated parameter subsets for mathematical reasoning, code generation, scientific analysis, vision processing, and agentic planning.
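The top-k routing this architecture relies on can be sketched in a few lines. This is an illustrative sketch of generic MoE softmax gating, not Moonshot's PARL-trained router, whose implementation has not been published:

```python
import math
import random

def route_token(logits, k=8):
    """Pick the top-k experts for one token via a softmax gate.

    logits: per-expert router scores for this token (len = num_experts)
    Returns the k chosen expert indices and their normalized mixing weights.
    """
    top_k = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top_k)
    exps = [math.exp(logits[i] - m) for i in top_k]   # numerically stable softmax
    total = sum(exps)
    return top_k, [e / total for e in exps]           # weights sum to 1

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(384)]     # one token's 384 router scores
experts, weights = route_token(logits, k=8)
print(len(experts), round(sum(weights), 6))           # 8 1.0
```

The token's output is then the weighted sum of the 8 selected experts' outputs, which is how a 1T-parameter model keeps per-token compute at the 32B-active level.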
The math and science benchmarks confirm this breadth. AIME 2025 at 96.1 is near-perfect on competition mathematics. HMMT at 95.4 shows similar strength on harder olympiad-style problems. GPQA Diamond at 87.6 tests graduate-level science across physics, chemistry, and biology. These scores are competitive with GPT-5 and Claude Opus 4, which cost significantly more per token. For context on how K2.5 ranks against these proprietary models, see our math olympiad leaderboard.
On coding, K2.5 scores 76.8% on SWE-bench Verified - resolving more than three-quarters of real GitHub issues from actual open-source repositories. LiveCodeBench v6 at 85.0 confirms strong competitive programming performance. Terminal Bench 2.0 at 50.8 shows it can operate in terminal environments for system administration and DevOps tasks. These are not isolated benchmark wins - they represent a consistent pattern of capability across the full software development lifecycle.
The Agent Swarm feature adds another dimension. K2.5 can orchestrate up to 100 sub-agents simultaneously, each specializing in different aspects of a complex task. BrowseComp at 78.4% in swarm mode versus 60.6% in single-model mode shows how much this multi-agent architecture contributes to real-world capability. See our guide to building AI agents for more on how these orchestration patterns work.
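Moonshot has not published the Agent Swarm internals, but the fan-out/fan-in pattern a 100-sub-agent swarm implies looks roughly like the sketch below; `sub_agent` is a hypothetical stand-in for a real K2.5 API call:

```python
import asyncio

async def sub_agent(task: str) -> str:
    # Placeholder: a real swarm member would call the K2.5 API here
    # with its own specialized prompt for this subtask.
    await asyncio.sleep(0)          # simulate I/O-bound inference
    return f"result for {task}"

async def orchestrate(tasks: list[str], max_agents: int = 100) -> list[str]:
    # Cap concurrency the way a 100-sub-agent swarm would.
    sem = asyncio.Semaphore(max_agents)

    async def bounded(task: str) -> str:
        async with sem:
            return await sub_agent(task)

    # Fan out all subtasks, then fan results back in, preserving order.
    return await asyncio.gather(*(bounded(t) for t in tasks))

results = asyncio.run(orchestrate([f"subtask-{i}" for i in range(5)]))
print(results[0])  # result for subtask-0
```

The hard part in practice is not the concurrency but the decomposition and merge steps, which is where a strong orchestrating model earns its benchmark lead.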
GLM-4.7-Flash: Free and Surprisingly Capable
GLM-4.7-Flash is Zhipu AI's statement that useful AI does not need to be expensive. The free API through Z.AI removes the primary barrier to AI adoption for individual developers, students, researchers, and startups in markets where even $0.60 per million tokens represents a meaningful cost. The MIT license means you can also download the weights and self-host on a single RTX 4090: quantized to 4-bit, the roughly 30 billion total parameters occupy about 15 GB, fitting comfortably in 24 GB of VRAM, while the ~3 billion active parameters per token keep inference fast.
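One caveat worth making explicit: weight memory for an MoE model scales with total parameters, since every expert must be resident in VRAM, so quantization is what makes a single 24 GB card workable. A back-of-envelope sketch, ignoring KV cache and activation memory:

```python
def weight_vram_gb(total_params_b: float, bits_per_param: int) -> float:
    """Approximate GB needed just to hold the model weights."""
    return total_params_b * 1e9 * bits_per_param / 8 / 1e9

for bits, name in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    print(f"{name}: {weight_vram_gb(30, bits):.0f} GB")
# FP16: 60 GB  -> needs multiple GPUs
# INT8: 30 GB  -> still over a 4090's 24 GB
# INT4: 15 GB  -> fits a single RTX 4090 with room for KV cache
```

The active-parameter count governs per-token compute, not weight storage, which is why a 30B-total model is a single-GPU proposition only after quantization.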
The standout metric is SWE-bench Verified at 59.2%. To put this in context, that score means GLM-4.7-Flash resolves nearly 60% of real-world GitHub issues - real bugs, real feature requests, real code in production repositories. This is achieved with approximately 3 billion active parameters. K2.5 with 32 billion active parameters scores 76.8%. The delta is 17.6 points, but GLM is accomplishing this with roughly 10.7x fewer active parameters per token. Per-parameter efficiency on coding tasks is genuinely impressive.
The tau2-Bench score of 79.5% adds another data point. This benchmark measures agentic task completion - the ability to follow multi-step instructions, use tools, and complete complex workflows. A 79.5% score from a 3B-active model suggests that Zhipu AI specifically optimized for agentic capability during training, likely through targeted instruction tuning and tool-use data.
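The tool-use loop such benchmarks exercise reduces to a simple dispatch cycle: the model emits a structured call, the harness executes it, and the result is fed back as an observation. A minimal sketch with hypothetical tools (not the actual tau2-Bench harness):

```python
import json

# Hypothetical tool registry; tau2-Bench-style tasks chain calls like these.
TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
    "send_refund": lambda order_id, amount: {"order_id": order_id, "refunded": amount},
}

def run_agent_step(model_reply: str) -> str:
    """Dispatch one tool call the model emitted as JSON."""
    call = json.loads(model_reply)
    result = TOOLS[call["tool"]](**call["args"])
    return json.dumps(result)   # fed back to the model as an observation

obs = run_agent_step('{"tool": "lookup_order", "args": {"order_id": "A1"}}')
print(obs)  # {"order_id": "A1", "status": "shipped"}
```

A high tau2-Bench score means the model reliably produces valid calls like these across many steps without losing track of the overall goal.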
The 128K context window is half of K2.5's 256K but sufficient for most practical applications. You can process long codebases, multi-file pull requests, and extended conversations without hitting limits. For most coding-focused workflows, 128K is more than enough. Our how to choose an LLM guide provides more guidance on matching context requirements to models.
Benchmark Comparison
| Benchmark | Kimi K2.5 | GLM-4.7-Flash | Delta |
|---|---|---|---|
| SWE-bench Verified | 76.8% | 59.2% | K2.5 +17.6 |
| GPQA Diamond | 87.6 | Not published | K2.5 by default |
| MMLU-Pro | 87.1 | Not published | K2.5 by default |
| AIME 2025 | 96.1 | Not published | K2.5 by default |
| LiveCodeBench v6 | 85.0 | Not published | K2.5 by default |
| tau2-Bench | Not published | 79.5% | GLM by default |
| BrowseComp | 78.4% (swarm) | Not published | K2.5 by default |
| Terminal Bench 2.0 | 50.8 | Not published | K2.5 by default |
| Context Window | 256K | 128K | K2.5 (2x longer) |
| Active Params | 32B | ~3B | K2.5 (10.7x more) |
| Total Params | 1T | ~30B | K2.5 (33x more) |
The benchmark table has a lot of "not published" entries for GLM-4.7-Flash, which itself tells a story. Zhipu AI focused their evaluation on the specific capabilities they optimized for - coding and agentic tasks - rather than publishing across the full benchmark suite. This is common with smaller, specialized models and reflects an honest assessment of where the model excels versus where it would not be competitive.
The SWE-bench comparison is the most meaningful direct comparison available. K2.5's 17.6-point lead is significant in absolute terms - roughly one in six additional issues resolved. But GLM achieves its 59.2% with 10.7x fewer active parameters. If you normalize by compute, GLM is extracting far more SWE-bench performance per FLOP. For teams that need good-enough coding assistance rather than the absolute best, GLM's efficiency-adjusted performance is remarkable. For a comprehensive view of how coding models rank, see our coding benchmarks leaderboard.
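That normalization is easy to make concrete, using active parameters as a rough proxy for per-token compute:

```python
def score_per_active_b(score: float, active_params_b: float) -> float:
    """SWE-bench points per billion active parameters (rough efficiency proxy)."""
    return score / active_params_b

k25 = score_per_active_b(76.8, 32)   # ~2.4 points per active-B
glm = score_per_active_b(59.2, 3)    # ~19.7 points per active-B
print(f"K2.5: {k25:.1f}, GLM: {glm:.1f}, ratio: {glm / k25:.1f}x")
# K2.5: 2.4, GLM: 19.7, ratio: 8.2x
```

This is a crude proxy - it ignores training compute, data quality, and the nonlinear difficulty of the last benchmark points - but an 8x efficiency gap is hard to explain away.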
Pricing Analysis
| Cost Factor | Kimi K2.5 | GLM-4.7-Flash |
|---|---|---|
| API Input (per 1M tokens) | $0.60 | Free |
| API Output (per 1M tokens) | $3.00 | Free |
| Self-host VRAM | Multi-node cluster | ~15 GB (INT4 quantized) |
| Self-host Hardware | Enterprise GPU cluster | Single RTX 4090 or equivalent |
| License | Modified MIT | MIT |
| Marginal Inference Cost | $0.60-$3.00/1M tokens | Zero |
This is the starkest pricing contrast in any comparison I have written. GLM-4.7-Flash's free API from Z.AI means your inference budget is literally zero. Process a billion tokens and you have paid nothing. Self-host it and your only cost is a single GPU and electricity. K2.5's API runs $0.60 input and $3.00 output per million tokens - reasonable for frontier quality but infinitely more expensive than free.
For a team processing 10 million input and 10 million output tokens daily, K2.5 costs $36 per day ($6 for input, $30 for output), or about $1,080 per month. GLM costs nothing. Over a year, that is more than $13,000 saved - enough to buy the RTX 4090 you would use to self-host GLM several times over. The question is whether the 17.6-point SWE-bench gap and the broader capability advantages of K2.5 justify that spending. For many coding-focused applications, they might not. For a broader comparison of model economics, see our cost efficiency leaderboard.
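A small calculator makes the daily arithmetic reproducible; the traffic split (10M input, 10M output per day) is an assumption, and real workloads skew heavily toward input:

```python
def daily_cost(input_m: float, output_m: float,
               in_price: float, out_price: float) -> float:
    """USD cost for one day's traffic; prices are per 1M tokens."""
    return input_m * in_price + output_m * out_price

k25 = daily_cost(10, 10, in_price=0.60, out_price=3.00)
glm = daily_cost(10, 10, in_price=0.0, out_price=0.0)
print(f"K2.5: ${k25:.2f}/day, ${k25 * 365:,.0f}/year; GLM: ${glm:.2f}")
# K2.5: $36.00/day, $13,140/year; GLM: $0.00
```

Because output tokens cost 5x input tokens on K2.5, output-heavy workloads (code generation, long explanations) widen the gap further.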
Kimi K2.5: Pros and Cons
Pros:
- SWE-bench Verified 76.8% - among the best in any weight class
- AIME 2025 96.1 and GPQA Diamond 87.6 - frontier reasoning
- MoonViT-3D vision with native-resolution image and video input
- Agent Swarm with up to 100 sub-agents for complex orchestration
- 256K context window handles long documents and codebases
- Terminal Bench 2.0 shows system administration capability
- Modified MIT license for commercial use
Cons:
- $3.00/1M output tokens versus free for GLM
- 1T parameters requires enterprise cluster infrastructure
- Not self-hostable for individuals or small teams
- API limited to Moonshot's infrastructure
- Overkill for straightforward coding assistance tasks
GLM-4.7-Flash: Pros and Cons
Pros:
- Completely free API through Z.AI - zero cost at any volume
- SWE-bench Verified 59.2% from only ~3B active parameters
- tau2-Bench 79.5% demonstrates strong agentic task completion
- Runs on a single RTX 4090 for self-hosting
- MIT license - fully permissive, no restrictions
- 128K context window covers most practical use cases
- Ideal for budget-constrained developers and startups
Cons:
- No published scores on GPQA, MMLU-Pro, AIME, or other broad benchmarks
- No vision or multimodal capability
- 33x smaller total parameter count limits knowledge breadth
- No multi-agent orchestration features
- Smaller international community and ecosystem
- Free API may have rate limits or availability constraints
Verdict
Choose Kimi K2.5 if you are building production systems where AI quality directly impacts outcomes and you can justify the cost. Complex multi-step coding workflows, scientific reasoning, multimodal document processing, and agentic orchestration all benefit from K2.5's breadth and depth. The 17.6-point SWE-bench lead matters when you need the highest possible resolution rate on real codebases. See our open-source LLM leaderboard for the full ranking.
Choose GLM-4.7-Flash if you need a capable coding and agentic model at zero cost. Individual developers, students, open-source projects, startups pre-revenue, and teams in cost-sensitive markets all benefit from a model that delivers 59.2% SWE-bench performance for free. The tau2-Bench score of 79.5% means it handles multi-step agentic tasks well enough for most development workflows. For practical guidance on setting up a free AI coding environment, see our free AI coding setup guide.
The bottom line: These two models represent opposite strategies for making AI accessible. K2.5 does it through raw capability - a model so capable it competes with proprietary offerings at a fraction of their price. GLM-4.7-Flash does it through extreme efficiency and zero cost - a model that gives every developer access to real coding assistance without a credit card. Both approaches are valid, and the best choice depends on whether your constraint is quality or budget.
