Ministral 3 14B

Mistral AI's largest Ministral 3 model - 14B parameters, 256K context, Apache 2.0 license, multimodal, built for local deployment and agentic workflows.

Ministral 3 14B is the largest model in Mistral AI's Ministral 3 family, released on December 2, 2025 alongside the 3B and 8B variants. It combines a 13.5B-parameter language core with a 0.4B vision encoder for a total of roughly 14 billion parameters. The model is multimodal, Apache 2.0 licensed, and aimed at local deployments where fitting on a single GPU matters.

TL;DR

  • Best-in-class reasoning at 14B: 85.0% on AIME 2025 and 71.2% on GPQA Diamond using the reasoning variant
  • 256K context window, native function calling, vision support, 40+ languages, Apache 2.0 license
  • Beats Qwen3-14B on reasoning (73.7% on AIME 2025 for Qwen3-14B Thinking) and leads both Qwen3-14B and Gemma 3 12B on the reported instruct benchmarks

Mistral released the Ministral 3 series as its answer to edge and private deployment demand. The 14B sits at the top of a three-model stack (3B, 8B, 14B), sharing the same dense architecture, base training curriculum, and Apache 2.0 license. Each size comes in base, instruct, and reasoning variants. Mistral's claim is that the 14B delivers "performance comparable to its larger Mistral Small 3.2 24B counterpart" - a comparison that holds on instruction-following benchmarks but breaks down on some coding tasks where the 24B holds an edge.

The model has roughly 42% fewer parameters than Mistral Small 3.2 (14B vs 24B) but pushes past it on MMLU Redux (82.0% for the base model), Arena Hard (55.1%), and AIME 2025 reasoning - while fitting in 24 GB VRAM at FP8, a threshold within reach of a single consumer GPU such as the RTX 4090. That tradeoff is Ministral 3 14B's core pitch.
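The 24 GB figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below counts weight storage only, using the 13.5B + 0.4B parameter split quoted above; the helper name `weight_gb` is hypothetical, and real deployments also need room for the KV cache, activations, and runtime overhead, so treat the output as a lower bound rather than a requirement.

```python
# Rough weight-only memory estimate; 1 GB is taken as 10**9 bytes.
LANGUAGE_PARAMS_B = 13.5   # billions, from the article
VISION_PARAMS_B = 0.4      # billions, from the article
TOTAL_PARAMS_B = LANGUAGE_PARAMS_B + VISION_PARAMS_B


def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in GB."""
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
        print(f"{label}: about {weight_gb(TOTAL_PARAMS_B, bits):.1f} GB of weights")
```

At FP8 the weights come to roughly 14 GB, which leaves several gigabytes of a 24 GB card for the cache and runtime.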

Key Specifications

Specification | Details
Provider | Mistral AI
Model Family | Ministral
Parameters | 14B (13.5B language model + 0.4B vision encoder)
Architecture | Dense transformer, GQA (32Q / 8KV heads), SwiGLU, RMSNorm, RoPE + YaRN
Layers | 40 transformer layers, hidden dim 5120, FFN dim 16384
Context Window | 256K tokens (262,144 exact)
Input Price | $0.20/M tokens
Output Price | $0.20/M tokens
Release Date | December 2, 2025
License | Apache 2.0
Modalities | Text + image input
Languages | 40+ (including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic)
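The layer and head counts above are enough to estimate how much memory the KV cache takes at long context, which is where grouped-query attention pays off: only the 8 KV heads are cached, not all 32 query heads. The sketch below is an estimate under stated assumptions. The per-head dimension of 128 is an assumption (it is not listed in the specifications), and `kv_cache_bytes` is a hypothetical helper, not part of any Mistral tooling.

```python
GIB = 1024 ** 3


def kv_cache_bytes(tokens: int, layers: int = 40, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions.

    layers and kv_heads come from the spec table; head_dim=128 is an
    assumption, and bytes_per_value=2 corresponds to FP16/BF16 storage.
    """
    # Factor 2: one key vector and one value vector per layer and KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    full_context = 262_144
    print(f"per token: {kv_cache_bytes(1)} bytes")
    print(f"full context, 16-bit cache: {kv_cache_bytes(full_context) / GIB:.1f} GiB")
    print(f"full context, 8-bit cache: "
          f"{kv_cache_bytes(full_context, bytes_per_value=1) / GIB:.1f} GiB")
```

Under these assumptions a full 262,144-token cache is far larger than the weights themselves, so filling the whole window on a 24 GB card would likely need a quantized cache, a shorter context, or offloading.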

Benchmark Performance

All numbers below come from Mistral's official HuggingFace model card for Ministral-3-14B-Instruct-2512 and the arXiv technical report (2601.08584). Reasoning variant scores use the Ministral-3-14B-Reasoning-2512 checkpoint.

Instruct Variant

Benchmark | Ministral 3 14B | Qwen3 14B | Gemma 3 12B
Arena Hard | 55.1% | 42.7% | 43.6%
WildBench | 68.5 | 65.1 | 63.2
MATH Maj@1 | 90.4% | 87.0% | 85.4%
MM MTBench | 8.49 | N/A | 6.70

Reasoning Variant

Benchmark | Ministral 3 14B (Reasoning) | Qwen3-14B (Thinking)
AIME 2025 | 85.0% | 73.7%
AIME 2024 | 89.8% | 83.7%
GPQA Diamond | 71.2% | 66.3%
LiveCodeBench v6 | 64.6% | 59.3%

Base Model

Benchmark | Ministral 3 14B Base | Qwen3 14B Base
MMLU Redux | 82.0% | 83.7%
Multilingual MMLU | 74.2% | 75.4%
MATH CoT | 67.6% | 62.0%
ARC-Challenge | 89.9% | N/A
TriviaQA | 74.9% | 70.3%
MBPP | 71.6% | N/A
GPQA Diamond (base) | 39.9% | N/A

The instruct model leads Qwen3-14B on every benchmark where both models have a reported score. The base model comparison is closer - Qwen3-14B Base edges it on MMLU Redux and Multilingual MMLU but trails on math and trivia. Where Ministral 3 14B separates cleanly from the field is the reasoning variant: 85.0% on AIME 2025 is the headline number, beating Qwen3-14B Thinking's 73.7% by 11.3 points. That gap matters for users who need chain-of-thought depth rather than raw chat quality.
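For readers who want the reasoning-variant margins in one place, the following sketch recomputes them from the scores in the Reasoning Variant table. The dictionary and the `gaps` helper are illustrative names only; the numbers are copied from the table, not measured independently.

```python
# (Ministral 3 14B Reasoning, Qwen3-14B Thinking), in percent, from the table.
REASONING_SCORES = {
    "AIME 2025": (85.0, 73.7),
    "AIME 2024": (89.8, 83.7),
    "GPQA Diamond": (71.2, 66.3),
    "LiveCodeBench v6": (64.6, 59.3),
}


def gaps(scores: dict) -> dict:
    """Return the lead of the first model over the second, in percentage points."""
    return {name: round(ours - theirs, 1) for name, (ours, theirs) in scores.items()}


if __name__ == "__main__":
    for name, gap in gaps(REASONING_SCORES).items():
        print(f"{name}: +{gap} points")
```

The AIME 2025 margin is the widest of the four; the others land between roughly 5 and 6 points.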

Artificial Analysis places the model at intelligence rank #15 of 73 assessed models in its class (score: 16, vs median 12 for non-reasoning open-weight models of similar size). Output speed at 82.8 tokens/second is below the class median of 97.8 t/s - not a concern for offline batch or agentic workflows, but worth noting for latency-sensitive chat applications. For a broader view of 14B-class reasoning models, see our reasoning benchmarks leaderboard.

Key Capabilities

Vision and Multimodal

The model's 0.4B vision encoder handles image captioning, document OCR with bounding box extraction, chart analysis, and visual question answering. The architecture is the same vision stack used in Mistral Small 3.1 and shares components across the Ministral 3 family. Input resolution is optimized for roughly square aspect ratios (1:1), and the recommended approach for non-square inputs is to maintain aspect ratio rather than force a crop.
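As a minimal sketch of the "maintain aspect ratio" advice, the helper below computes a target size that shrinks an image until its longer side fits a limit, without cropping or stretching. The function name `fit_within` and the 1024-pixel default are assumptions chosen for illustration; the actual resolution limits of the vision encoder are not stated here.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple:
    """Scale (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved and images that already fit are not
    upscaled. max_side=1024 is a placeholder, not a documented limit.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 2000))  # wide scan, shrunk with its 2:1 ratio intact
    print(fit_within(800, 600))    # already small enough, returned unchanged
```

The resulting dimensions can be passed to whatever image library the pipeline already uses for the actual resize.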

Agentic Workflows and Function Calling

Native function calling and structured JSON output are first-class features. Mistral designed the Ministral family with tool use in mind - the instruct checkpoint is tuned for dialogue, tool invocation, and structured output rather than raw pretraining perplexity. The 256K context window means multi-step agentic sessions with long tool result chains don't require manual truncation. That's a different posture than most 14B models, which cap at 32K or 128K tokens. See our function calling benchmarks leaderboard for how the field compares on structured output.
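To make the tool-use loop concrete, here is a small sketch of the application side: a tool description in the widely used JSON-schema "function" layout, and a dispatcher that executes a tool call the model has emitted as JSON. The exact wire format Ministral uses is not described in this article, so the schema shape, the `get_weather` tool, and the `dispatch` helper are all assumptions for illustration.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool: returns canned data instead of calling a real service."""
    return {"city": city, "temp_c": 21}


# Registry of callable tools, keyed by the name the model is told about.
TOOLS = {"get_weather": get_weather}

# Tool description sent with the request (assumed JSON-schema style layout).
TOOL_SCHEMA = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current temperature for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a model-emitted call and return its result as JSON."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    args = call["arguments"]
    if isinstance(args, str):  # some APIs encode the arguments as a JSON string
        args = json.loads(args)
    return json.dumps(TOOLS[name](**args))


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

In a multi-step agent, the string returned by `dispatch` would be appended to the conversation as the tool result before the next model turn, which is where a long context window helps.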

Local and Private Deployment

At 24 GB VRAM in FP8, the model fits a single RTX 4090 or a Mac with 32 GB unified memory. Further quantization with GGUF brings requirements down to 12-16 GB for INT4/INT8 variants available via Ollama (ollama pull ministral-3:14b) and LM Studio. The Apache 2.0 license puts no commercial restrictions on local deployment - no royalty requirements, no usage reports required.
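Once the model has been pulled, a local Ollama server can be queried over HTTP. The sketch below uses only the Python standard library and Ollama's `/api/generate` endpoint on its default port 11434; it assumes the server is running on the same machine and that the `ministral-3:14b` tag mentioned above is installed. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt: str, model: str = "ministral-3:14b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("Summarize the Apache 2.0 license in one sentence."))
```

Setting `stream` to false keeps the example short by returning a single JSON object; interactive applications would more likely stream tokens as they arrive.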

Token Efficiency

Mistral's technical report notes that the Ministral 3 family "often produces an order of magnitude fewer tokens" than competing models while matching performance. The 14B instruct variant's WildBench score of 68.5 with a lower average output length than Qwen or Gemma equivalents matters directly for API cost in production workloads.

Pricing and Availability

At $0.20/M tokens for both input and output, Ministral 3 14B isn't the cheapest option in its size class. Artificial Analysis ranks it #60 of 73 models on input pricing, meaning most 14B-class models are cheaper via third-party providers. The official Mistral endpoint and OpenRouter both list the model at that same $0.20/M rate.

Provider | Input Price | Output Price | Notes
Mistral AI (La Plateforme) | $0.20/M | $0.20/M | Official API
OpenRouter | $0.20/M | $0.20/M | Multi-provider routing
Amazon Bedrock | Varies | Varies | Available since Dec 2025
IBM WatsonX | Varies | Varies | Enterprise tier
Together AI | Varies | Varies | Available
Fireworks | Varies | Varies | Available
Ollama / LM Studio | Free | Free | Self-hosted GGUF
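
To see what the $0.20/M rate means for a workload, and why shorter outputs translate directly into lower bills, the sketch below prices a batch of requests at the listed input and output rates. The request counts and token lengths are made-up illustration values, and `request_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 0.20   # USD per million input tokens, from the table
OUTPUT_PRICE_PER_M = 0.20  # USD per million output tokens, from the table


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = INPUT_PRICE_PER_M,
                 output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    requests = 10_000  # illustrative batch size
    for label, out_tokens in [("terse answers", 500), ("verbose answers", 5_000)]:
        total = requests * request_cost(2_000, out_tokens)
        print(f"{label}: ${total:.2f} for {requests} requests")
```

With identical prompts, the only thing that changes between the two lines is output length, so a model that answers in fewer tokens costs proportionally less on the output side.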

For cost-sensitive applications, the Ministral 3B at $0.04/M or the 8B variant at lower pricing are the better picks. The 14B earns its price when reasoning depth, vision, or long-context (>128K tokens) is required. Our cost efficiency leaderboard tracks per-provider pricing comparisons across the full Mistral lineup.

Strengths and Weaknesses

Strengths

  • Reasoning variant hits 85.0% on AIME 2025, best-in-class at 14B
  • 256K context window - twice the size of Mistral Small 3.2's 128K
  • Apache 2.0 with no commercial restrictions
  • Multimodal (vision + text) in the base release
  • 40+ language support including strong multilingual MMLU
  • Fits in 24 GB VRAM (FP8) - single consumer GPU tier
  • Native function calling and JSON output for agentic use
  • Three variants (base, instruct, reasoning) in one model family

Weaknesses

  • $0.20/M pricing is expensive relative to comparable 14B-class open-weight options
  • Output speed (82.8 t/s) is below average for the class - slower than Qwen3-14B at comparable quality
  • Arena Hard score of 55.1% trails Mistral Large 3 notably
  • Vision capability is limited relative to dedicated multimodal models
  • Requires 24 GB VRAM at FP8 - rules out mid-tier consumer GPUs (8-16 GB) unless you drop to INT8 or INT4 quantization
  • HuggingFace model card recommends temperature 0.1 for production use, limiting creative output quality

FAQ

Is Ministral 3 14B open source?

Yes. The weights are released under Apache 2.0, which permits commercial use, redistribution, and modification without royalties. Available at mistralai/Ministral-3-14B-Instruct-2512 on HuggingFace.

What hardware does Ministral 3 14B need?

24 GB VRAM at FP8. With INT8 or INT4 GGUF quantization, requirements drop to 12-16 GB, bringing 16 GB cards into range; 24 GB cards such as the RTX 3090 have headroom to spare. Self-hosting via Ollama or LM Studio is practical on a single high-end consumer GPU.

How does it compare to Mistral Small 3.2 24B?

Ministral 3 14B matches or beats Small 3.2 on instruction following and math benchmarks (per llm-stats.com), while offering 2x the context window (256K vs 128K). Small 3.2 holds an edge on coding tasks with a higher coding index.

What is the difference between the instruct and reasoning variants?

The instruct variant (Ministral-3-14B-Instruct-2512) is tuned for dialogue, tool use, and structured outputs. The reasoning variant (Ministral-3-14B-Reasoning-2512) adds chain-of-thought post-training, reaching 85.0% on AIME 2025 at the cost of longer, more verbose outputs.

Does it support images?

Yes. The model includes a 0.4B vision encoder for image captioning, document OCR with bounding boxes, and visual question answering. Use square-ish input images for best results per Mistral's documentation.


✓ Last verified June 8, 2026

About the author: James Kowalski, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.