Ministral 3B

Mistral AI's smallest open-weight model - 3B parameters, 256K context, Apache 2.0 license, built for edge and cost-sensitive deployments.

Ministral 3B

Ministral 3B is Mistral AI's entry-level open-weight language model, targeting applications where cloud round-trips are too slow, too expensive, or simply not allowed. It sits at the bottom of the Ministral family - sharing architecture decisions with the 8B and 14B variants but fitting comfortably on hardware with 4 GB of VRAM when quantized to 4-bit.

TL;DR

  • Strongest coding and reasoning scores in the sub-4B class, per Mistral's internal benchmarks, though independent tests show a tighter race with Llama 3.2 3B
  • 256K context window and native function calling at $0.04/M tokens input and output
  • Apache 2.0 license means no commercial use restrictions; runs on a single consumer GPU or quantized to a smartphone

Mistral first released the 3B model in October 2024 under a commercial license, targeting on-device translation, offline assistants, and agentic orchestration pipelines where the 3B sits as a cheap router or classifier in front of a larger model. A revised December 2025 release - versioned as Ministral 3 3B (v25.12) - added native image understanding, expanded the context window to 256K tokens, and re-licensed the weights under Apache 2.0. Pricing on the v25.12 API endpoint is $0.10/M tokens; the original ministral-3b-latest endpoint remains available at $0.04/M.

The model competes directly with Google's Gemma 3 2B and Meta's Llama 3.2 3B. At 3 billion parameters it punches above its weight on structured-output and function-calling tasks, which matters more than headline MMLU scores for production agentic workflows. Our edge and mobile LLM leaderboard tracks the sub-10B field in detail if you want a broader view of the competitive landscape.

Key Specifications

SpecificationDetails
ProviderMistral AI
Model FamilyMinistral
Parameters3B (3.4B language model + 0.4B vision encoder)
Context Window256K tokens
Input Price$0.04/M tokens (ministral-3b-latest); $0.10/M tokens (v25.12 API)
Output Price$0.04/M tokens (ministral-3b-latest); $0.10/M tokens (v25.12 API)
Release DateOctober 16, 2024 (original); December 2, 2025 (v25.12)
LicenseApache 2.0 (v25.12); Mistral Research/Commercial License (original)
ModalitiesText + Image (v25.12)
Supported Languages11+ (English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic)

Microchip processor close-up representing Ministral 3B's edge-focused design Ministral 3B targets hardware-constrained environments, fitting in 8 GB VRAM at FP8 and under 4 GB with 4-bit quantization. Source: pexels.com

Benchmark Performance

The numbers below come from Mistral's HuggingFace model card for the v25.12 Instruct variant. I'm including independent data from Artificial Analysis where it differs, because the two datasets don't fully agree.

BenchmarkMinistral 3BLlama 3.2 3BGemma 2 2B
MMLU (5-shot)70.7%63.4%51.3%
MMLU Redux (5-shot)73.5%--
Multilingual MMLU65.2%--
MATH (CoT 2-shot)60.1%--
GPQA Diamond53.4%--
LiveCodeBench54.8%--
Arena Hard30.5%--
WildBench56.8--
MM-MTBench7.83--

The MMLU numbers are strong for the weight class - 70.7% sits well above Llama 3.2 3B's 63.4% and Gemma 2 2B's 51.3%. Mistral's internal benchmarks in the October 2024 announcement made a broader claim: that Ministral 3B outperforms Mistral 7B (released a year earlier) on most benchmarks. Artificial Analysis data, which runs its own test suite, puts Ministral 3B at an intelligence index of 11 out of 29 assessed open-weight models in the sub-10B class, with output speeds of ~185 tokens/second - third-fastest in the category.

The GPQA Diamond score of 53.4% on the Reasoning variant is an outlier worth flagging. GPQA is a PhD-level science benchmark; random chance is roughly 25% and most frontier 70B+ models land in the 50-65% range. Getting 53.4% from a 3B model is a meaningful result, though the Reasoning variant uses chain-of-thought and isn't directly comparable to the base instruct model. For agentic coding work, see our coding benchmarks leaderboard for how these models stack up on real engineering tasks.

Key Capabilities

Ministral 3B's clearest strength is structured output - function calling, JSON mode, and multi-step agentic pipelines. The October 2024 release was purpose-built for agentic workflows where a small model acts as input parser, task router, or application caller in front of a larger model like Mistral Large 3. That design intent shows up in the benchmark data: LiveCodeBench at 54.8% is notably high for a 3B model.

The v25.12 update added a vision encoder (0.4B parameters, derived from the Mistral Small 3.1 architecture) that handles image captioning, OCR with bounding box extraction, and document Q&A. It's multimodal inference at a price point that was previously impossible at this scale. The 256K context window also matters: most sub-10B models cap at 8K or 32K tokens, making long-document tasks impossible without chunking.

Smartphone representing on-device edge AI inference capabilities Ministral 3B can run quantized on-device, enabling offline assistants and private inference without cloud dependencies. Source: pexels.com

Multilingual support across 11 languages positions the model for real-time translation workloads without internet access. Customers cited on-device translation, internet-less smart assistants, local analytics, and autonomous robotics as their primary pull requests for the 3B weight class when Mistral was scoping the release. The architecture uses Grouped Query Attention (GQA) with 32 query heads and 8 KV heads, RoPE with YaRN for the extended context window, SwiGLU activation, and RMSNorm - all choices that trade off raw parameter density for inference efficiency.

Deployment is straightforward. The model runs on vLLM (v0.12.0+) via mistral_common tokenizer mode. For local use, LM Studio and Ollama both carry GGUF versions. Hardware requirement is 8 GB VRAM at FP8; less with quantization.

Pricing and Availability

The pricing story here is a two-version situation that's worth understanding clearly. The original ministral-3b-latest endpoint charges $0.04/M tokens for both input and output - making it one of the cheapest hosted inference options available. The v25.12 API (ministral-3-3b-25-12) prices at $0.10/M tokens, which reflects the added multimodal capability.

For cost-sensitive batch workloads, Mistral's La Plateforme offers a free tier with rate-limited access for prototyping. Production use is pay-as-you-go with no monthly minimum. Third-party inference providers including OpenRouter, Fireworks, Amazon Bedrock (available December 2025), IBM WatsonX, and Modal all carry the model - see our cost efficiency leaderboard for current per-provider pricing comparisons.

The Apache 2.0 license on v25.12 weights is notable: no usage restrictions for commercial applications, no royalty requirements, and free redistribution. Weights are on HuggingFace under mistralai/Ministral-3-3B-Instruct-2512, mistralai/Ministral-3-3B-Base-2512, and mistralai/Ministral-3-3B-Reasoning-2512. GGUF quantized versions are available under mistralai/Ministral-3-3B-Instruct-2512-GGUF.

Strengths and Weaknesses

Strengths

  • Best-in-class MMLU for a sub-4B model (70.7% vs Llama 3.2 3B's 63.4%)
  • $0.04/M token pricing at the ministral-3b-latest endpoint is difficult to beat for batch inference
  • 256K context window is rare at this parameter count
  • Apache 2.0 license with no commercial restrictions
  • Strong function calling and structured output for agentic pipeline use
  • Multimodal (text + vision) in the v25.12 release
  • Deployable on 8 GB VRAM; under 4 GB VRAM with 4-bit quantization

Weaknesses

  • Independent benchmarks (Artificial Analysis) rank it as "expensive relative to its class" on the $0.10/M v25.12 endpoint
  • Arena Hard score of 30.5% suggests weaker conversational quality than raw reasoning scores imply
  • Original October 2024 weights carried a non-Apache commercial license - the free weights require v25.12
  • Verbosity: Artificial Analysis recorded 16M tokens generated during evaluation versus a median of 7M across comparable models, meaning real-world costs may run higher than pricing headlines suggest
  • Vision capability is limited relative to dedicated multimodal models

FAQ

Is Ministral 3B free to use commercially?

The v25.12 weights are Apache 2.0, which permits commercial use with no royalties or restrictions. The original October 2024 release required a Mistral commercial license.

What hardware does Ministral 3B need?

It fits in 8 GB VRAM at FP8 precision. With 4-bit quantization via GGUF, it runs on hardware with 4 GB VRAM, covering most consumer GPUs and newer mobile chips.

How does Ministral 3B compare to Llama 3.2 3B?

On MMLU, Ministral 3B scores 70.7% versus Llama 3.2 3B's 63.4%. Ministral also supports a 256K context window versus Llama 3.2 3B's 128K. Llama 3.2 3B carries a broader community of fine-tunes.

Does Ministral 3B support images?

Yes, in the v25.12 release. The model adds a 0.4B vision encoder supporting image captioning, OCR with bounding boxes, and document Q&A.

What is the context window?

256K tokens in the v25.12 release. The original October 2024 model supported 128K tokens.


Sources:

✓ Last verified May 25, 2026

James Kowalski
About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.