North Mini Code
Cohere's first developer-focused model - 30B sparse MoE with 3B active parameters, free Apache 2.0 license, 256K context window, and 33.4 on the AA Coding Index.

North Mini Code is Cohere's first model built for developers rather than enterprise API customers. Released June 9, 2026, it's a 30-billion-parameter sparse mixture-of-experts architecture with just 3B parameters active per token - designed specifically for agentic software engineering tasks. It fits on a single H100 at FP8 precision and ships under an Apache 2.0 license with free API access.
TL;DR
- Coding-specialist MoE that beats 120B+ dense models on the AA Coding Index (33.4) while activating only 3B parameters per inference step
- 256K context, 64K max output, free via Cohere API and OpenRouter, Apache 2.0 weights on HuggingFace
- Trades evenly with Devstral Small 2 on SWE-Bench (67.6% vs ~68% pass@1) at 2.8x higher throughput
The model is the first entry in Cohere's new "North" family - a line aimed at developers who need to self-host or run coding agents on their own infrastructure. That's a different audience than Cohere Command A+, which targets enterprise API customers with retrieval and citation workflows. North Mini Code is built for terminal agents, code reviews, and multi-step software engineering workflows.
The efficiency story is the main talking point. At 3B active parameters, it matches or beats models activating 10-20x as many parameters on coding benchmarks. That ratio matters if you're running many agent instances in parallel or optimizing for cost per completed task.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Cohere |
| Model Family | North |
| Total Parameters | 30B |
| Active Parameters | 3B (per inference step) |
| Expert Count | 128 experts, 8 activated per token |
| Context Window | 256K tokens |
| Max Output | 64K tokens |
| Input Price | Free |
| Output Price | Free |
| Release Date | June 9, 2026 |
| License | Apache 2.0 |
| Quantizations | BF16, FP8, W4A16 |
| Min Hardware | 1x H100 80GB (FP8) or 2x A100 40GB (BF16) |
The architecture is a decoder-only Transformer with a sigmoid-routed MoE feed-forward. Attention layers use an interleaved 3:1 ratio of sliding-window local attention (with RoPE) to full global attention - a design that keeps long-context costs reasonable without sacrificing quality on longer sequences.
The model targets terminal-based agentic tasks and multi-file software engineering workflows.
Source: unsplash.com
Benchmark Performance
Cohere tested North Mini Code against open-weight models of similar and larger size. The clearest performance story is on the Artificial Analysis Coding Index, where the model reaches 33.4 - above Qwen3.5 35B-A3B, Gemma 4 26B-A4B, Devstral Small 2 (24B dense), and larger models like Nemotron 3 Super (120B-A12B). Qwen 3.6 35B-A3B edges it at 35.2.
| Benchmark | North Mini Code | Devstral Small 2 (24B) | Devstral 2 (123B) | Qwen 3.6 35B-A3B |
|---|---|---|---|---|
| AA Coding Index | 33.4 | - | - | 35.2 |
| SWE-Bench Verified (pass@1) | 67.6% | ~68% | 72.2% | - |
| SWE-Bench Verified (pass@10) | 80.2% | - | - | - |
| Mini-SWE-Agent (pass@1) | 61.0% | - | - | - |
| AA Intelligence Index | 21 | - | - | - |
| Output Speed (Cohere API) | ~210 t/s | ~75 t/s | - | - |
SWE-Bench Verified is where the real comparison sits. At 67.6% pass@1, North Mini Code is functionally level with Devstral Small 2 (~68%), but it hits that while running 2.8x faster in throughput on the Cohere API. That gap compounds if you're running dozens of parallel agent instances.
The weaker numbers appear on non-coding agentic benchmarks: GDPval-AA at 14%, τ²-Bench Telecom at 37%, and an overall Agentic Index of 21.7. The model was clearly optimized for software engineering tasks, and it doesn't generalize well to other agentic domains. If you need a general-purpose agent, this isn't it. Check our SWE-bench coding agent leaderboard for a broader view of where this sits against proprietary models.
Cohere also flags that the model is verbose - it uses more output tokens to complete evaluations than comparable models. In practice that means higher context consumption per task, which is a relevant tradeoff when running multi-step agents.
North Mini Code uses 128 experts with 8 activated per token - only a fraction of parameters fire on any given inference.
Source: unsplash.com
Key Capabilities
Agentic Software Engineering
North Mini Code was trained specifically on agentic workflows. The two-phase training pipeline used 70K+ verifiable tasks across roughly 5K unique repositories, with Stage 1 data weighted toward code (70%), agentic tool-use (43%), and competitive/scientific programming (27%). Stage 2 added 4.5 billion tokens of agentic and reasoning-driven samples.
That training mix shows up in the model's strongest use cases: sub-agent orchestration, multi-file code reviews, architecture mapping, and terminal-based task execution. It handles structured tool-call outputs via JSON schema and supports interleaved reasoning with tool use.
Hardware Flexibility
The three quantization formats are a meaningful practical feature. FP8 on a single H100 is the target setup for most cloud inference providers. The W4A16 format (4-bit weights, 16-bit activations) cuts memory further for edge or constrained deployments, with Cohere reporting negligible quality loss over BF16. Running the full BF16 weights requires two A100 40GB cards.
For anyone building on top of North Mini Code locally, vLLM has native support for the 128-expert routing, so there's no need for custom inference code.
Tool-Use and Multi-Turn Reasoning
The model supports structured tool-call outputs and multi-step reasoning chains. In the training data, agentic tool-use accounted for a large share of Stage 1, which means the model has been explicitly trained to reason across tool invocations rather than treating each call as isolated. That makes it better suited to orchestration roles where an agent needs to plan across several steps before calling any individual tool.
Pricing and Availability
North Mini Code is free. Cohere has published no per-token pricing - access via the Cohere API is free at launch, and the model is also available on OpenRouter under the slug cohere/north-mini-code:free. That won't last forever, but it makes the model easy to evaluate without budget risk.
Weights are on HuggingFace at CohereLabs/North-Mini-Code-1.0 in three formats:
- BF16: Full precision, 2x A100 40GB minimum
- FP8: Halved memory, 1x H100 80GB, recommended for most deployments
- W4A16: Further compressed, lowest hardware requirement
The Apache 2.0 license permits commercial use without restrictions. For comparison, Devstral 2's 123B version uses a modified MIT license; North Mini Code's clean Apache 2.0 is cleaner for enterprise legal reviews.
"North Mini Code was built to run where you need it - on your infrastructure, under your control." - Cohere, launch post
See our open-source LLM leaderboard and coding benchmarks leaderboard for updated rankings across the open-weight coding model space.
Strengths and Weaknesses
Strengths
- Beats models 4-5x its active parameter size on the AA Coding Index
- 2.8x faster throughput than Devstral Small 2 under identical hardware
- Clean Apache 2.0 license, free API access at launch
- Runs on a single H100 at FP8 - accessible hardware for self-hosting
- Trained on 70K+ verifiable coding tasks for genuine agentic capability
Weaknesses
- Non-coding agentic benchmarks are weak (Agentic Index: 21.7, GDPval-AA: 14%)
- Text-only - no vision input
- Verbose: uses more output tokens per task than comparable models
- Qwen 3.6 35B-A3B still edges it on the AA Coding Index (35.2 vs 33.4)
- No disclosed pricing post-launch; free tier may not persist
Related Coverage
- Devstral 2 model profile - Mistral's competing agentic coding model
- Cohere Command A+ model profile - Cohere's general-purpose MoE flagship
- SWE-bench Coding Agent Leaderboard
- Coding Benchmarks Leaderboard
- Open-Source LLM Leaderboard
Sources
- North Mini Code: Agentic Coding Model for Developers - Cohere
- Introducing North Mini Code: Cohere's First Model For Developers - HuggingFace Blog
- CohereLabs/North-Mini-Code-1.0 - HuggingFace
- North Mini Code: Performance and Provider Benchmarks - Artificial Analysis
- North Mini Code: API Provider Analysis - Artificial Analysis
- Cohere North Mini Code: A Small Coding-Focused MoE Model - Artificial Analysis
- Meet North Mini Code: Cohere's 30B Open-Weight MoE - MarkTechPost
- North Mini Code and Agentic Coding Benchmarks - Sebastian Raschka
- North Mini Code (free) - OpenRouter
- VentureBeat - Cohere open-sources a coding agent
✓ Last verified June 25, 2026
