Name: Ministral 3 8B
Author: Mistral AI

Ministral 3 8B is Mistral AI's mid-range open-weight model in the Ministral family, released December 2, 2025 as part of the Mistral 3 suite with the Ministral 3B and a 14B variant. It targets the band between toy-sized models and production-tier flagships - strong enough for real tasks, cheap enough to run at volume, and small enough to deploy on edge hardware.

TL;DR

76.1% MMLU (5-shot) and 66.8% WildBench, competitive with models twice its size
256K context window, Apache 2.0 license, $0.15/M tokens on Mistral's API
Beats Qwen3-VL-8B-Instruct on WildBench (66.8 vs 66.3) and ties on GPQA Diamond at 8B scale

The December 2025 update (versioned ministral-8b-2512) reintroduced the model with a vision encoder, expanded the context window from 128K to 256K tokens, added support for 40+ languages, and re-licensed everything under Apache 2.0. The original October 2024 Ministral 8B was text-only and ran under Mistral's commercial/research license; the v25.12 release is a major architecture upgrade, not just a patch. Weights are published on HuggingFace under mistralai/Ministral-3-8B-Instruct-2512, mistralai/Ministral-3-8B-Base-2512, and mistralai/Ministral-3-8B-Reasoning-2512.

At 8 billion parameters, the model competes directly with Meta's Llama 3.1 8B and Google's Gemma 2 9B. Mistral's internal comparisons show Ministral 3 8B beating Gemma 12B on most evaluations, which - if consistent across independent test suites - puts this above its weight class.

Key Specifications

Specification	Details
Provider	Mistral AI
Model Family	Ministral
Parameters	9B total (8.4B language model + 0.4B vision encoder)
Context Window	256K tokens
Input Price	$0.15/M tokens
Output Price	$0.15/M tokens
Release Date	December 2, 2025 (v25.12)
License	Apache 2.0
Modalities	Text + Image
Languages	40+

Microprocessor chip close-up representing Ministral 3 8B's efficient inference design Ministral 3 8B requires 12GB VRAM in FP8 and under 12GB RAM when quantized, making it deployable on single mid-range consumer GPUs. Source: unsplash.com

Benchmark Performance

The numbers below come from Mistral's HuggingFace model cards for the v25.12 Instruct and Reasoning variants. Where multiple variants are listed on the same card, I'm citing the Reasoning variant for AIME/GPQA (since those scores reflect extended-thinking) and the Instruct variant for Arena Hard, WildBench, and MMLU (which are conversational/knowledge benchmarks).

Benchmark	Ministral 3 8B	Qwen3-VL-8B-Instruct	Qwen3 8B Base
MMLU 5-shot	76.1%	-	76.0%
MMLU Redux 5-shot	79.3%	-	-
Multilingual MMLU	70.6%	-	70.0%
MATH CoT 2-Shot	62.6%	-	57.6%
Arena Hard	50.9%	52.8%	-
WildBench	66.8	66.3	-
AIME25 (Reasoning)	78.7%	79.8%	-
GPQA Diamond (Reasoning)	66.8%	67.1%	-
LiveCodeBench (Reasoning)	61.6%	58.0%	-
MM-MT-Bench	8.08	8.00	-

The MMLU result of 76.1% is strong for a 8B model - it sits above the 70.7% from the 3B sibling and is competitive with models in the 12-14B range. The LiveCodeBench score of 61.6% from the Reasoning variant is where the model stands out most clearly: it beats the Qwen3-VL-8B-Thinking by 3.6 percentage points on coding, despite otherwise close competition on AIME and GPQA.

Arena Hard at 50.9% is the weakest result in the table. This benchmark rewards conversational fluency and instruction-following quality in head-to-head human preference comparisons, and 50.9% means the model wins roughly half of those matchups - respectable but not dominant. For coding benchmarks or agentic tasks it does better; for open-ended chat, larger models maintain a noticeable edge.

The GPQA Diamond score of 66.8% in extended-thinking mode is a solid result. GPQA tests PhD-level science reasoning; random chance is 25%, and frontier 70B+ models normally land in the 65-75% range. Matching that range with a 8B reasoning variant under extended thinking is notable, even if the score drops substantially without chain-of-thought.

Key Capabilities

Ministral 3 8B ships in three distinct variants - base, instruct, and reasoning - all with image understanding. The instruct variant handles function calling, structured JSON output, multi-turn conversation, and system prompt adherence. The reasoning variant enables extended thinking for math, coding, and STEM tasks where inference-time compute can substitute for raw parameter count.

The architecture uses interleaved sliding-window attention: one full-attention layer for every three sliding-window layers. That pattern keeps memory footprint low during long-context inference while preserving the ability to reason over distant tokens. The original October 2024 Ministral 8B was first to implement this pattern in the 8B weight class; the v25.12 release carries it forward with a 256K context window using RoPE with YaRN for the extended range. For production agentic stacks, this matters - see our cost efficiency leaderboard for how Ministral 8B performs on per-token spend across provider tiers.

Edge computing hardware representing on-device AI deployment The Ministral 3 8B fits in under 12GB RAM when quantized, enabling deployment on edge hardware without cloud round-trips. Source: unsplash.com

The 0.4B vision encoder (shared architecture with Mistral Small 3.2's multimodal stack) handles image captioning, OCR with bounding box extraction, and document Q&A. Multilingual support across 40+ languages makes it useful for on-device translation workloads where the previous generation only covered 11 languages.

Deployment runs on vLLM (v0.12.0+) with --tokenizer_mode mistral. Local users can run quantized GGUF versions via LM Studio or Ollama from the mistralai/Ministral-3-8B-Instruct-2512-GGUF repository.

Pricing and Availability

Pricing on Mistral's La Plateforme API is $0.15/M tokens for both input and output via the ministral-8b-latest endpoint - flat symmetric pricing with no separate input/output split. The original October 2024 endpoint (ministral-8b-2410) priced at $0.10/M; the v25.12 endpoint's $0.15/M reflects the added multimodal and reasoning capability.

The model is available on Mistral AI Studio, Amazon Bedrock, Azure Foundry, OpenRouter, Fireworks, IBM WatsonX, Modal, Together AI, and Unsloth AI. On OpenRouter, pricing is also $0.15/M tokens with a 262K context window. Third-party inference providers offer the original 2410 weights at lower rates (as low as $0.07/M on some providers), but those weights predate the vision encoder and 256K context.

La Plateforme free tier provides rate-limited access for prototyping. The Apache 2.0 license on v25.12 weights permits commercial deployment without royalties. Our small language model leaderboard tracks where Ministral 8B sits against the broader sub-10B field.

Strengths and Weaknesses

Strengths

Strong MMLU and coding scores for the 8B weight class (76.1% MMLU, 61.6% LiveCodeBench with reasoning)
256K context window - most sub-10B models cap at 32K or 128K
Three variants (base/instruct/reasoning) covering different inference tradeoffs
Apache 2.0 license with no commercial restrictions
40+ language support vs. 11 in the original release
Symmetric pricing ($0.15/M in, $0.15/M out) simplifies cost modeling
Fits in under 12GB RAM/VRAM when quantized

Weaknesses

Arena Hard at 50.9% shows weaker conversational quality relative to coding/reasoning scores
$0.15/M is higher than the 3B sibling at $0.15/M - same price, different capability tier (the 3B's legacy endpoint still holds at $0.04/M)
GPQA scores drop substantially without extended thinking; the reasoning variant adds inference latency
No Chatbot Arena Elo independent rating available at time of writing
Original 2410 weights (still in some provider catalogs) lack vision and carry a non-Apache license

Ministral 3B - the smaller sibling with $0.04/M legacy pricing and edge-optimized design
Mistral Large 3 - Mistral's 675B MoE flagship from the same December 2025 release
Mistral Small 3.2 - the 24B dense mid-tier option
Small Language Model Leaderboard - full sub-10B rankings
Cost Efficiency Leaderboard - per-provider pricing comparisons
Coding Benchmarks Leaderboard - LiveCodeBench and SWE-Bench rankings

FAQ

Is Ministral 3 8B free to use commercially?

Yes. The v25.12 weights are Apache 2.0, permitting commercial use without royalties or restrictions. The original October 2024 weights required a Mistral commercial license.

What VRAM does Ministral 3 8B need?

12GB in FP8 format on a single GPU. Under 12GB RAM or VRAM when running quantized GGUF versions via LM Studio or Ollama.

How does Ministral 3 8B differ from the original Ministral 8B?

The v25.12 release adds a 0.4B vision encoder, expands the context window from 128K to 256K tokens, extends language support from 11 to 40+, adds a reasoning variant for extended thinking, and re-licenses the weights under Apache 2.0.

What is the difference between the instruct and reasoning variants?

The instruct variant handles function calling, JSON output, and multi-turn chat at standard latency. The reasoning variant enables extended chain-of-thought for math and coding tasks, trading higher latency for improved accuracy on hard STEM benchmarks.

How does pricing compare to Llama 3.1 8B?

Ministral 3 8B costs $0.15/M tokens on Mistral's API. Llama 3.1 8B is available at lower rates from open-source inference providers. For production use, cost depends on provider, quantization level, and volume.

Sources: