North Mini Code

Cohere's first developer-focused model - 30B sparse MoE with 3B active parameters, free Apache 2.0 license, 256K context window, and 33.4 on the AA Coding Index.

North Mini Code

North Mini Code is Cohere's first model built for developers rather than enterprise API customers. Released June 9, 2026, it's a 30-billion-parameter sparse mixture-of-experts architecture with just 3B parameters active per token - designed specifically for agentic software engineering tasks. It fits on a single H100 at FP8 precision and ships under an Apache 2.0 license with free API access.

TL;DR

  • Coding-specialist MoE that beats 120B+ dense models on the AA Coding Index (33.4) while activating only 3B parameters per inference step
  • 256K context, 64K max output, free via Cohere API and OpenRouter, Apache 2.0 weights on HuggingFace
  • Trades evenly with Devstral Small 2 on SWE-Bench (67.6% vs ~68% pass@1) at 2.8x higher throughput

The model is the first entry in Cohere's new "North" family - a line aimed at developers who need to self-host or run coding agents on their own infrastructure. That's a different audience than Cohere Command A+, which targets enterprise API customers with retrieval and citation workflows. North Mini Code is built for terminal agents, code reviews, and multi-step software engineering workflows.

The efficiency story is the main talking point. At 3B active parameters, it matches or beats models activating 10-20x as many parameters on coding benchmarks. That ratio matters if you're running many agent instances in parallel or optimizing for cost per completed task.

Key Specifications

SpecificationDetails
ProviderCohere
Model FamilyNorth
Total Parameters30B
Active Parameters3B (per inference step)
Expert Count128 experts, 8 activated per token
Context Window256K tokens
Max Output64K tokens
Input PriceFree
Output PriceFree
Release DateJune 9, 2026
LicenseApache 2.0
QuantizationsBF16, FP8, W4A16
Min Hardware1x H100 80GB (FP8) or 2x A100 40GB (BF16)

The architecture is a decoder-only Transformer with a sigmoid-routed MoE feed-forward. Attention layers use an interleaved 3:1 ratio of sliding-window local attention (with RoPE) to full global attention - a design that keeps long-context costs reasonable without sacrificing quality on longer sequences.

Code on a developer monitor with syntax highlighting The model targets terminal-based agentic tasks and multi-file software engineering workflows. Source: unsplash.com

Benchmark Performance

Cohere tested North Mini Code against open-weight models of similar and larger size. The clearest performance story is on the Artificial Analysis Coding Index, where the model reaches 33.4 - above Qwen3.5 35B-A3B, Gemma 4 26B-A4B, Devstral Small 2 (24B dense), and larger models like Nemotron 3 Super (120B-A12B). Qwen 3.6 35B-A3B edges it at 35.2.

BenchmarkNorth Mini CodeDevstral Small 2 (24B)Devstral 2 (123B)Qwen 3.6 35B-A3B
AA Coding Index33.4--35.2
SWE-Bench Verified (pass@1)67.6%~68%72.2%-
SWE-Bench Verified (pass@10)80.2%---
Mini-SWE-Agent (pass@1)61.0%---
AA Intelligence Index21---
Output Speed (Cohere API)~210 t/s~75 t/s--

SWE-Bench Verified is where the real comparison sits. At 67.6% pass@1, North Mini Code is functionally level with Devstral Small 2 (~68%), but it hits that while running 2.8x faster in throughput on the Cohere API. That gap compounds if you're running dozens of parallel agent instances.

The weaker numbers appear on non-coding agentic benchmarks: GDPval-AA at 14%, τ²-Bench Telecom at 37%, and an overall Agentic Index of 21.7. The model was clearly optimized for software engineering tasks, and it doesn't generalize well to other agentic domains. If you need a general-purpose agent, this isn't it. Check our SWE-bench coding agent leaderboard for a broader view of where this sits against proprietary models.

Cohere also flags that the model is verbose - it uses more output tokens to complete evaluations than comparable models. In practice that means higher context consumption per task, which is a relevant tradeoff when running multi-step agents.

Circuit board close-up illustrating the multi-expert routing in MoE models North Mini Code uses 128 experts with 8 activated per token - only a fraction of parameters fire on any given inference. Source: unsplash.com

Key Capabilities

Agentic Software Engineering

North Mini Code was trained specifically on agentic workflows. The two-phase training pipeline used 70K+ verifiable tasks across roughly 5K unique repositories, with Stage 1 data weighted toward code (70%), agentic tool-use (43%), and competitive/scientific programming (27%). Stage 2 added 4.5 billion tokens of agentic and reasoning-driven samples.

That training mix shows up in the model's strongest use cases: sub-agent orchestration, multi-file code reviews, architecture mapping, and terminal-based task execution. It handles structured tool-call outputs via JSON schema and supports interleaved reasoning with tool use.

Hardware Flexibility

The three quantization formats are a meaningful practical feature. FP8 on a single H100 is the target setup for most cloud inference providers. The W4A16 format (4-bit weights, 16-bit activations) cuts memory further for edge or constrained deployments, with Cohere reporting negligible quality loss over BF16. Running the full BF16 weights requires two A100 40GB cards.

For anyone building on top of North Mini Code locally, vLLM has native support for the 128-expert routing, so there's no need for custom inference code.

Tool-Use and Multi-Turn Reasoning

The model supports structured tool-call outputs and multi-step reasoning chains. In the training data, agentic tool-use accounted for a large share of Stage 1, which means the model has been explicitly trained to reason across tool invocations rather than treating each call as isolated. That makes it better suited to orchestration roles where an agent needs to plan across several steps before calling any individual tool.

Pricing and Availability

North Mini Code is free. Cohere has published no per-token pricing - access via the Cohere API is free at launch, and the model is also available on OpenRouter under the slug cohere/north-mini-code:free. That won't last forever, but it makes the model easy to evaluate without budget risk.

Weights are on HuggingFace at CohereLabs/North-Mini-Code-1.0 in three formats:

  • BF16: Full precision, 2x A100 40GB minimum
  • FP8: Halved memory, 1x H100 80GB, recommended for most deployments
  • W4A16: Further compressed, lowest hardware requirement

The Apache 2.0 license permits commercial use without restrictions. For comparison, Devstral 2's 123B version uses a modified MIT license; North Mini Code's clean Apache 2.0 is cleaner for enterprise legal reviews.

"North Mini Code was built to run where you need it - on your infrastructure, under your control." - Cohere, launch post

See our open-source LLM leaderboard and coding benchmarks leaderboard for updated rankings across the open-weight coding model space.

Strengths and Weaknesses

Strengths

  • Beats models 4-5x its active parameter size on the AA Coding Index
  • 2.8x faster throughput than Devstral Small 2 under identical hardware
  • Clean Apache 2.0 license, free API access at launch
  • Runs on a single H100 at FP8 - accessible hardware for self-hosting
  • Trained on 70K+ verifiable coding tasks for genuine agentic capability

Weaknesses

  • Non-coding agentic benchmarks are weak (Agentic Index: 21.7, GDPval-AA: 14%)
  • Text-only - no vision input
  • Verbose: uses more output tokens per task than comparable models
  • Qwen 3.6 35B-A3B still edges it on the AA Coding Index (35.2 vs 33.4)
  • No disclosed pricing post-launch; free tier may not persist

Sources

✓ Last verified June 25, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.