Name: Ministral 3 14B
Author: Mistral AI

Ministral 3 14B is the largest model in Mistral AI's Ministral 3 family, released on December 2, 2025 alongside the 3B and 8B variants. It combines a 13.5B-parameter language model with a 0.4B vision encoder, roughly 14 billion parameters in total. The model is multimodal, Apache 2.0 licensed, and aimed at local deployments where memory footprint matters.

TL;DR

Best-in-class reasoning at 14B: 85.0% on AIME 2025 and 71.2% on GPQA Diamond using the reasoning variant
256K context window, native function calling, vision support, 40+ languages, Apache 2.0 license
Beats Qwen3-14B on reasoning (73.7% on AIME 2025) and both Qwen3-14B and Gemma 3 12B on the reported instruct benchmarks

Mistral released the Ministral 3 series as its answer to edge and private deployment demand. The 14B sits at the top of a three-model stack (3B, 8B, 14B), sharing the same dense architecture, base training curriculum, and Apache 2.0 license. Each size comes in base, instruct, and reasoning variants. Mistral's claim is that the 14B delivers "performance comparable to its larger Mistral Small 3.2 24B counterpart" - a comparison that holds on instruction-following benchmarks but breaks down on some coding tasks where the 24B holds an edge.

The model has roughly 42% fewer parameters than Mistral Small 3.2 (14B vs 24B) but pushes past it on MMLU Redux (82.0% for the base model), Arena Hard (55.1%), and AIME 2025 reasoning, while fitting in 24 GB of VRAM at FP8, within reach of a single consumer GPU such as the RTX 4090. That tradeoff is Ministral 3 14B's core pitch.

Key Specifications

Specification	Details
Provider	Mistral AI
Model Family	Ministral
Parameters	14B (13.5B language model + 0.4B vision encoder)
Architecture	Dense transformer, GQA (32Q / 8KV heads), SwiGLU, RMSNorm, RoPE + YaRN
Layers	40 transformer layers, hidden dim 5120, FFN dim 16384
Context Window	256K tokens (262,144 exact)
Input Price	$0.20/M tokens
Output Price	$0.20/M tokens
Release Date	December 2, 2025
License	Apache 2.0
Modalities	Text + image input
Languages	40+ (including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic)
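
The GQA figures in the table translate directly into KV-cache memory, which is what a long context costs at inference time. The sketch below is a back-of-envelope estimate, not a figure published by Mistral: the head dimension of 128 and the 2-byte (FP16/BF16) cache are assumptions that do not appear in the table.

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache added per generated or prompt token."""
    # One key vector and one value vector per KV head, per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


LAYERS = 40          # from the specification table
KV_HEADS = 8         # from the specification table (GQA)
QUERY_HEADS = 32     # from the specification table
CONTEXT = 262_144    # from the specification table
HEAD_DIM = 128       # ASSUMPTION: not stated in the table
BYTES_PER_VALUE = 2  # ASSUMPTION: FP16/BF16 cache

per_token = kv_cache_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
full_context_gib = per_token * CONTEXT / 2**30

# Same layout without GQA (one KV head per query head), for comparison.
mha_per_token = kv_cache_bytes_per_token(LAYERS, QUERY_HEADS, HEAD_DIM, BYTES_PER_VALUE)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at full context: {full_context_gib:.1f} GiB")
print(f"Saving from GQA vs full multi-head: {mha_per_token // per_token}x")
```

Under these assumptions the cache for a full 256K-token window is far larger than the cache for a typical chat session, so the usable context on a given GPU depends on how much memory is left after the weights are loaded.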

Benchmark Performance

All numbers below come from Mistral's official HuggingFace model card for Ministral-3-14B-Instruct-2512 and the arXiv technical report (2601.08584). Reasoning variant scores use the Ministral-3-14B-Reasoning-2512 checkpoint.

Instruct Variant

Benchmark	Ministral 3 14B	Qwen3 14B	Gemma 3 12B
Arena Hard	55.1%	42.7%	43.6%
WildBench	68.5	65.1	63.2
MATH Maj@1	90.4%	87.0%	85.4%
MM MTBench	8.49	N/A	6.70

Reasoning Variant

Benchmark	Ministral 3 14B (Reasoning)	Qwen3-14B (Thinking)
AIME 2025	85.0%	73.7%
AIME 2024	89.8%	83.7%
GPQA Diamond	71.2%	66.3%
LiveCodeBench v6	64.6%	59.3%

Base Model

Benchmark	Ministral 3 14B Base	Qwen3 14B Base
MMLU Redux	82.0%	83.7%
Multilingual MMLU	74.2%	75.4%
MATH CoT	67.6%	62.0%
ARC-Challenge	89.9%	N/A
TriviaQA	74.9%	70.3%
MBPP	71.6%	N/A
GPQA Diamond (base)	39.9%	N/A

The instruct model leads Qwen3-14B on every benchmark where both scores are disclosed. The base model comparison is closer - Qwen3-14B Base edges it on MMLU Redux and Multilingual MMLU but trails on math and trivia. Where Ministral 3 14B separates cleanly from the field is the reasoning variant: 85.0% on AIME 2025 is the headline number, beating Qwen3-14B Thinking's 73.7% by 11.3 points. That gap matters for users who need chain-of-thought depth rather than raw chat quality.
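
The per-benchmark gaps for the reasoning variant can be recomputed from the table above. This is a small sketch using only the scores listed there; the dictionary and function names are illustrative.

```python
# (Ministral 3 14B Reasoning, Qwen3-14B Thinking), scores in percent,
# copied from the Reasoning Variant table.
REASONING_SCORES = {
    "AIME 2025": (85.0, 73.7),
    "AIME 2024": (89.8, 83.7),
    "GPQA Diamond": (71.2, 66.3),
    "LiveCodeBench v6": (64.6, 59.3),
}


def point_gaps(scores):
    """Return the lead of the first model over the second, in percentage points."""
    return {name: round(a - b, 1) for name, (a, b) in scores.items()}


gaps = point_gaps(REASONING_SCORES)
for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
```

The lead is largest on AIME 2025 and narrower on the other three benchmarks, so the headline number is the best case rather than the typical margin.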

Artificial Analysis places the model at intelligence rank #15 of 73 assessed models in its class (score: 16, vs median 12 for non-reasoning open-weight models of similar size). Output speed at 82.8 tokens/second is below the class median of 97.8 t/s - not a concern for offline batch or agentic workflows, but worth noting for latency-sensitive chat applications. For a broader view of 14B-class reasoning models, see our reasoning benchmarks leaderboard.
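
To put the throughput figures in concrete terms, the sketch below converts tokens per second into wall-clock generation time. It assumes a steady decode rate and ignores time to first token, network latency, and batching, so treat the results as rough lower bounds; the 1,000-token response length is an arbitrary illustration.

```python
def generation_seconds(n_tokens, tokens_per_second):
    """Time to stream n_tokens at a constant decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


MINISTRAL_TPS = 82.8     # from the Artificial Analysis figure quoted above
CLASS_MEDIAN_TPS = 97.8  # from the Artificial Analysis figure quoted above
RESPONSE_TOKENS = 1_000  # ASSUMPTION: illustrative response length

ministral_s = generation_seconds(RESPONSE_TOKENS, MINISTRAL_TPS)
median_s = generation_seconds(RESPONSE_TOKENS, CLASS_MEDIAN_TPS)

print(f"Ministral 3 14B: {ministral_s:.1f} s")
print(f"Class median:    {median_s:.1f} s")
print(f"Extra wait:      {ministral_s - median_s:.1f} s")
```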

Key Capabilities

Vision and Multimodal

The model's 0.4B vision encoder handles image captioning, document OCR with bounding box extraction, chart analysis, and visual question answering. The architecture is the same vision stack used in Mistral Small 3.1 and shares components across the Ministral 3 family. Input resolution is optimized for roughly square aspect ratios (1:1), and the recommended approach for non-square inputs is to maintain aspect ratio rather than force a crop.
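
One way to follow the "maintain aspect ratio" guidance is to scale an image down so its longer side fits a limit, without cropping. The sketch below only computes the target dimensions; the max_side value is a hypothetical placeholder, since no resolution limit is given here, and the function name is illustrative.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved and nothing is cropped. Images that
    already fit are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


# ASSUMPTION: 1024 is an arbitrary example limit, not a documented value.
print(fit_within(4000, 3000, 1024))
print(fit_within(800, 600, 1024))
```

The resulting dimensions can then be passed to whatever image library performs the actual resize.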

Agentic Workflows and Function Calling

Native function calling and structured JSON output are first-class features. Mistral designed the Ministral family with tool use in mind - the instruct checkpoint is tuned for dialogue, tool invocation, and structured output rather than raw pretraining perplexity. The 256K context window means multi-step agentic sessions with long tool result chains don't require manual truncation. That's a different posture than most 14B models, which cap at 32K or 128K tokens. See our function calling benchmarks leaderboard for how the field compares on structured output.
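
The application side of a function-calling loop can be sketched as follows: the tool is described with a JSON Schema, and a dispatcher runs whichever tool the model asks for. The schema uses the widely used "function" tool format; whether a particular Ministral endpoint accepts exactly this shape is an assumption to check against the provider's documentation. The get_weather tool, the registry, and the shape of the tool_call dictionary are all hypothetical.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "forecast": "sunny"}


# Tool description sent to the model alongside the conversation.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may emit to local Python callables.
REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run one tool call and return its result as a JSON string.

    ASSUMPTION: tool_call has a "name" and a JSON-encoded "arguments"
    string; real responses may nest or name these fields differently.
    """
    name = tool_call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    arguments = json.loads(tool_call["arguments"])
    return json.dumps(REGISTRY[name](**arguments))


# Simulated model output, so the example runs without a network call.
example_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
print(dispatch(example_call))
```

In a multi-step session, each dispatch result is appended to the conversation as a tool message, which is where a long context window helps: the accumulated results stay in the prompt.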

Local and Private Deployment

At 24 GB VRAM in FP8, the model fits a single RTX 4090 or a Mac with 32 GB unified memory. Further quantization with GGUF brings requirements down to 12-16 GB for INT4/INT8 variants available via Ollama (ollama pull ministral-3:14b) and LM Studio. The Apache 2.0 license puts no commercial restrictions on local deployment - no royalty requirements, no usage reports required.
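
A rough way to sanity-check these memory figures is to multiply the parameter count by the bytes stored per parameter. The sketch below covers weights only and uses nominal bit widths; real quantized files are usually somewhat larger because they also store scaling metadata, and KV cache, activations, and runtime overhead come on top. The format table is an illustrative assumption.

```python
# Nominal storage per parameter, in bytes. ASSUMPTION: ignores the
# per-block scales and metadata that quantized formats carry.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion, fmt):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[fmt]


# 13.5B language model + 0.4B vision encoder, from the specifications.
TOTAL_PARAMS_B = 13.5 + 0.4

for fmt in ("bf16", "fp8", "int8", "int4"):
    print(f"{fmt:>5}: {weight_gb(TOTAL_PARAMS_B, fmt):5.1f} GB of weights")
```

The difference between these weight-only estimates and the 24 GB and 12-16 GB figures quoted above is presumably the headroom needed for the KV cache and runtime overhead.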

Token Efficiency

Mistral's technical report notes that the Ministral 3 family "often produces an order of magnitude fewer tokens" than competing models while matching performance. The 14B instruct variant reaches its WildBench score of 68.5 with a lower average output length than the Qwen or Gemma equivalents, and shorter outputs translate directly into lower API cost in production workloads.

Pricing and Availability

At $0.20/M tokens for both input and output, Ministral 3 14B isn't the cheapest option in its size class. Artificial Analysis ranks it #60 of 73 models on input pricing, meaning most 14B-class models are cheaper via third-party providers. The official Mistral endpoint price is $0.20/M; the model is also available at the same price via OpenRouter.

Provider	Input Price	Output Price	Notes
Mistral AI (La Plateforme)	$0.20/M	$0.20/M	Official API
OpenRouter	$0.20/M	$0.20/M	Multi-provider routing
Amazon Bedrock	Varies	Varies	Available since Dec 2025
IBM WatsonX	Varies	Varies	Enterprise tier
Together AI	Varies	Varies	Available
Fireworks	Varies	Varies	Available
Ollama / LM Studio	Free	Free	Self-hosted GGUF

For cost-sensitive applications, the Ministral 3B at $0.04/M or the lower-priced 8B variant is the better pick. The 14B earns its price when reasoning depth, vision, or long context (more than 128K tokens) is required. Our cost efficiency leaderboard tracks per-provider pricing comparisons across the full Mistral lineup.
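
The price difference is easiest to judge against a concrete workload. The sketch below compares the 14B and 3B list prices for a hypothetical month of traffic; the token volumes are invented for illustration, and it assumes the 3B's $0.04/M rate applies to both input and output.

```python
def workload_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Total cost in USD given per-million-token prices."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# ASSUMPTION: hypothetical monthly volume, not a measured workload.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

cost_14b = workload_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.20, 0.20)
# ASSUMPTION: $0.04/M for both directions on the 3B.
cost_3b = workload_cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 0.04, 0.04)

print(f"Ministral 3 14B: ${cost_14b:.2f}")
print(f"Ministral 3B:    ${cost_3b:.2f}")
print(f"Ratio:           {cost_14b / cost_3b:.1f}x")
```

Because output tokens are billed at the same rate as input tokens here, any reduction in response length lowers the bill in direct proportion.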

Strengths and Weaknesses

Strengths

Reasoning variant hits 85.0% on AIME 2025, best-in-class at 14B
256K context window - twice Mistral Small 3.2's 128K
Apache 2.0 with no commercial restrictions
Multimodal (vision + text) in the base release
40+ language support including strong multilingual MMLU
Fits in 24 GB VRAM (FP8) - single consumer GPU tier
Native function calling and JSON output for agentic use
Three variants (base, instruct, reasoning) in one model family

Weaknesses

$0.20/M pricing is expensive relative to comparable 14B-class open-weight options
Output speed (82.8 t/s) is below average for the class - slower than Qwen3-14B at comparable quality
Arena Hard score of 55.1% trails Mistral Large 3 notably
Vision capability is limited relative to dedicated multimodal models
Requires 24 GB VRAM at FP8 - 12-16 GB cards need INT4/INT8 quantization, and 8 GB cards are ruled out
HuggingFace model card recommends temperature 0.1 for production use, limiting creative output quality

Related

Ministral 3B - The 3B sibling in the same family, at $0.04/M tokens
Mistral Small 3.2 - The 24B model Mistral positions as the comparison target
Mistral Large 3 - Mistral's flagship for full enterprise workloads
Reasoning Benchmarks Leaderboard - Full AIME/GPQA/LiveCodeBench rankings
Function Calling Benchmarks Leaderboard - Structured output and tool use comparisons
Cost Efficiency Leaderboard - Per-provider pricing across all Mistral tiers
Edge and Mobile LLM Leaderboard - Sub-20B models ranked by hardware fit and performance

FAQ

Is Ministral 3 14B open source?

Yes. The weights are released under Apache 2.0, which permits commercial use, redistribution, and modification without royalties. The instruct checkpoint is available at mistralai/Ministral-3-14B-Instruct-2512 on HuggingFace.

What hardware does Ministral 3 14B need?

24 GB VRAM at FP8, which fits 24 GB cards such as the RTX 4090. With INT8 or INT4 GGUF quantization, requirements drop to 12-16 GB, bringing 16 GB cards into range. Self-hosting via Ollama or LM Studio is practical on a single high-end consumer GPU.

How does it compare to Mistral Small 3.2 24B?

Ministral 3 14B matches or beats Small 3.2 on instruction following and math benchmarks (per llm-stats.com), while offering 2x the context window (256K vs 128K). Small 3.2 holds an edge on coding tasks with a higher coding index.

What is the difference between the instruct and reasoning variants?

The instruct variant (Ministral-3-14B-Instruct-2512) is tuned for dialogue, tool use, and structured outputs. The reasoning variant (Ministral-3-14B-Reasoning-2512) adds chain-of-thought post-training, reaching 85.0% on AIME 2025 at the cost of longer, more verbose outputs.

Does it support images?

Yes. The model includes a 0.4B vision encoder for image captioning, document OCR with bounding boxes, and visual question answering. Roughly square input images give the best results, per Mistral's documentation.
