Mistral Large 3

Mistral Large 3 is a 675B-parameter MoE model activating 41B per token with native multimodal support, a 256K context window, and Apache 2.0 licensing - Europe's first frontier-class open-weight model.

TL;DR

  • 675B total / 41B active granular MoE with 2.5B vision encoder - Apache 2.0 license
  • #2 among open-source non-reasoning models and #6 overall in the open-weight category on LMArena (1418 Elo)
  • Native multimodal (text + images), 256K context window, strong function calling
  • API at $0.50/$1.50 per million tokens - approximately 80% cheaper than GPT-4o class models
  • Trained on 3,000 H200 GPUs - Europe's answer to the US/China frontier model race

Overview

Mistral AI released Large 3 on December 2, 2025, and it represents a genuine inflection point for European AI. This is a 675 billion parameter Mixture-of-Experts model that activates 41 billion parameters per forward pass, ships with native vision (a 2.5B parameter encoder fused into the architecture), and is released under Apache 2.0. That license choice matters - Mistral's previous flagship (Large 2) used the more restrictive Mistral Research License. With Apache 2.0, Large 3 can be commercially deployed, modified, and redistributed without restrictions.

On the LMArena leaderboard, Large 3 debuted at 1418 Elo - #2 among open-source non-reasoning models, #6 overall in the open-weight category. It outperforms Llama 3.1 405B and the previous generation of open models by a comfortable margin. On MMMLU (multilingual knowledge), it scores 85.5%. On HumanEval (Python code generation), it posts approximately 92% pass@1. These are strong numbers for an open model, and the 256K context window is the largest of any open-weight model at this capability tier.

But let us be precise about where Large 3 sits. On GPQA Diamond (43.9%) and SimpleQA (23.8%), it scores substantially below DeepSeek V3.2 (82.4% and 97.1% respectively) and the proprietary frontier. The gap is not small - it is 38.5 points on GPQA Diamond versus DeepSeek V3.2 alone. Claude Opus 4.6 at 91.3% is in a different league on PhD-level reasoning. Large 3 is a strong generalist, not a reasoning specialist. Its value proposition is breadth, cost efficiency, multimodal capability, and the Apache 2.0 license - not beating proprietary models on the hardest benchmarks.

Key Specifications

Specification        Details
-------------------  ----------------------------------------------------
Provider             Mistral AI
Model Family         Mistral Large
Architecture         Granular Mixture-of-Experts + Vision Encoder
Total Parameters     675B (673B language model + 2.5B vision encoder)
Active Parameters    41B per token (39B language + vision encoder)
Context Window       256,000 tokens
Input Price          $0.50/M tokens
Output Price         $1.50/M tokens
Release Date         December 2, 2025
License              Apache 2.0
Input Modalities     Text, Images (up to 8 per prompt)
Output Modality      Text
Quantization         FP8 (H200/B200), NVFP4 (H100/A100)
Recommended Infra    8xH200 node (FP8) or H100/A100 node (NVFP4)
Model ID             mistral-large-latest

Benchmark Performance

Benchmark                         Mistral Large 3   DeepSeek V3.2   Claude Opus 4.6   Llama 3.1 405B
--------------------------------  ---------------   -------------   ---------------   --------------
MMMLU (multilingual knowledge)    85.5%             84.2%           91.1%             73.1%
GPQA Diamond (PhD-level science)  43.9%             82.4%           91.3%             50.7%
HumanEval (Python, pass@1)        92.0%             90.5%           93.2%             89.0%
LiveCodeBench                     34.4%             83.3%           78.5%             28.9%
SimpleQA (factual accuracy)       23.8%             97.1%           82.0%             19.6%
LMArena Elo (human preference)    1418              1456            1496              1352
Context Window                    256K              128K            1M (beta)         128K

Two things jump out from this table. First, Mistral Large 3 convincingly beats Llama 3.1 405B on every metric - it is the clear open-weight generalist leader in that comparison. Second, the gap to DeepSeek V3.2 and the proprietary frontier is large and consistent. On LiveCodeBench (34.4% vs 83.3%), the difference is too large to attribute to evaluation methodology alone.

The MMMLU score of 85.5% is genuinely strong and reflects Mistral's investment in multilingual training across dozens of languages. For European enterprise deployments where multilingual capability matters, this is a meaningful advantage. The 256K context window also stands out - it is double what most open models offer.

Key Capabilities

Native Multimodal. Unlike models that bolt on a vision adapter post-training, Large 3's 2.5B parameter vision encoder is integrated into the architecture from the start. It processes up to 8 images per prompt with cross-modal analysis. Document understanding, chart interpretation, and visual reasoning all work within the same model. For teams that need a single self-hosted model handling both text and image workloads, this matters - you do not need a separate vision pipeline.
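As a concrete sketch of what a mixed text-and-image request might look like: the helper below assembles a single user message in the OpenAI-style chat format that La Plateforme accepts, enforcing the documented 8-image cap. The function name and URLs are our own illustration, not part of Mistral's SDK.

```python
# Sketch: assemble a multimodal chat message for Mistral Large 3.
# The message schema mirrors the OpenAI-style chat format; the 8-image
# cap reflects the per-prompt limit documented above. Helper name and
# example URLs are illustrative only.

MAX_IMAGES_PER_PROMPT = 8  # documented limit for Mistral Large 3

def build_multimodal_message(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message mixing a text part and image parts."""
    if len(image_urls) > MAX_IMAGES_PER_PROMPT:
        raise ValueError(
            f"Mistral Large 3 accepts at most {MAX_IMAGES_PER_PROMPT} images per prompt"
        )
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": url} for url in image_urls]
    return {"role": "user", "content": content}

msg = build_multimodal_message(
    "Summarize the trends in these charts.",
    ["https://example.com/q1.png", "https://example.com/q2.png"],
)
```

Because the encoder is fused into the model, this one message can drive document understanding, chart interpretation, and visual reasoning without a separate vision pipeline.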

Enterprise-Grade Function Calling. Large 3 ships with native function calling and structured JSON output. The system prompt adherence is strong, which matters for production deployments where the model needs to stay within defined tool boundaries. Mistral recommends keeping tool counts minimal and well-defined, and in practice the model handles single-tool and multi-tool workflows reliably. For comparison, DeepSeek V3.2 posts stronger scores on MCP-Mark tool benchmarks, but Large 3's function calling is sufficient for standard enterprise integration patterns.
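To make "minimal and well-defined tools" concrete, here is a hedged sketch of a single tool definition in the JSON-schema style used by function-calling APIs like Mistral's. The tool name, fields, and invoice example are invented for illustration.

```python
# Sketch: one well-scoped tool definition in the JSON-schema style used
# by function-calling chat APIs. The tool name ("get_invoice_status")
# and its parameters are hypothetical, not a real integration.

get_invoice_status_tool = {
    "type": "function",
    "function": {
        "name": "get_invoice_status",
        "description": "Look up the payment status of an invoice by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {
                    "type": "string",
                    "description": "Internal invoice identifier, e.g. 'INV-1042'.",
                },
            },
            "required": ["invoice_id"],
        },
    },
}

# Per Mistral's guidance, the request ships a minimal, well-defined tool list:
request_body = {
    "model": "mistral-large-latest",
    "messages": [{"role": "user", "content": "Has invoice INV-1042 been paid?"}],
    "tools": [get_invoice_status_tool],
    "tool_choice": "auto",
}
```

Keeping the tool list this small is what lets the model's strong system-prompt adherence translate into reliable tool selection in production.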

European Sovereignty. This is the strategic argument for Large 3. It is the only frontier-class model built by a European company with European data processing practices. For organizations subject to EU AI Act requirements, GDPR constraints, or data sovereignty mandates, Large 3 under Apache 2.0 offers a deployment path that does not depend on US or Chinese infrastructure. Mistral AI is headquartered in Paris and the model can be self-hosted on European cloud providers without API calls leaving EU jurisdiction.

Pricing and Availability

Tier                 Input             Output
-------------------  ----------------  ----------------
La Plateforme API    $0.50/M tokens    $1.50/M tokens

Mistral Large 3 is available through Mistral's La Plateforme API, Amazon Bedrock, Google Cloud Vertex AI, NVIDIA NIM, and Azure. The Apache 2.0 license means you can also download the weights from HuggingFace and self-host. Deployment requires a single multi-GPU node - 8xH200 in FP8 format, or an H100/A100 node in NVFP4 compressed format. The vLLM framework (>= 1.12.0) is recommended for serving.
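A quick back-of-envelope check shows why those node sizes are the recommendation. Assuming roughly 1 byte per parameter at FP8 and ~0.5 bytes at 4-bit NVFP4 (a simplification that ignores KV cache, activations, and runtime overhead):

```python
# Back-of-envelope weight-memory estimate for self-hosting Large 3.
# Simplifying assumptions (ours, not Mistral's): 1 byte/param at FP8,
# ~0.5 bytes/param at NVFP4; KV cache and activations are ignored.

TOTAL_PARAMS = 675e9   # total parameters, per the spec table
H200_MEMORY_GB = 141   # HBM capacity of one H200 GPU
BYTES_FP8 = 1.0
BYTES_NVFP4 = 0.5

def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB."""
    return n_params * bytes_per_param / 1e9

fp8_gb = weights_gb(TOTAL_PARAMS, BYTES_FP8)      # ~675 GB
nvfp4_gb = weights_gb(TOTAL_PARAMS, BYTES_NVFP4)  # ~338 GB
node_gb = 8 * H200_MEMORY_GB                      # 1128 GB across an 8xH200 node

print(f"FP8 weights:   {fp8_gb:.0f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.0f} GB")
print(f"8xH200 HBM:    {node_gb} GB")
```

The FP8 weights alone fill over half of an 8xH200 node's HBM, which is why the smaller-memory H100/A100 path requires the NVFP4 compressed format.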

At $0.50/$1.50 per million tokens, Large 3 is roughly 10x cheaper than Claude Opus 4.6 on input and 17x cheaper on output. It is also cheaper than Gemini 3.1 Pro at $2/$12. However, DeepSeek V3.2 at $0.28/$0.42 undercuts it significantly while posting higher benchmark scores. The case for Large 3 is not cost leadership - it is the combination of Apache 2.0 licensing, multimodal capability, and EU data sovereignty. See our open source vs proprietary AI guide for detailed cost modeling.
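The price gaps above are easy to sanity-check. The sketch below uses only the per-million-token prices quoted in this article; the workload size (50M input / 10M output tokens per month) is an invented example.

```python
# Sketch: monthly API cost comparison using the prices quoted in this
# article. The 50M-input / 10M-output workload is a made-up example.

PRICES = {  # (input $/M tokens, output $/M tokens)
    "Mistral Large 3": (0.50, 1.50),
    "DeepSeek V3.2": (0.28, 0.42),
    "Gemini 3.1 Pro": (2.00, 12.00),
}

def monthly_cost(model: str, m_in: float, m_out: float) -> float:
    """Cost in USD for m_in million input and m_out million output tokens."""
    p_in, p_out = PRICES[model]
    return m_in * p_in + m_out * p_out

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50, 10):,.2f}/month")
```

At that workload, Large 3 lands well under Gemini 3.1 Pro but above DeepSeek V3.2, which matches the article's framing: the pitch is licensing, multimodality, and sovereignty rather than absolute cost leadership.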

Strengths

  • Apache 2.0 license - fully open for commercial use, modification, and redistribution
  • Strongest open-weight multilingual model (MMMLU 85.5%, dozens of languages)
  • Native multimodal with 2.5B vision encoder - text and image in one architecture
  • 256K context window - largest among frontier-class open models
  • Self-hostable on a single 8-GPU node (FP8 or NVFP4 compression)
  • European origin with EU data sovereignty advantages for regulated industries
  • Competitive API pricing at $0.50/$1.50 per million tokens

Weaknesses

  • GPQA Diamond (43.9%) is far below the frontier - weak on PhD-level reasoning
  • LiveCodeBench (34.4%) and SimpleQA (23.8%) show significant gaps to DeepSeek V3.2 and proprietary models
  • No video or audio input support - vision is limited to static images
  • 675B total parameters require substantial infrastructure even for MoE inference
  • Reasoning depth lags behind DeepSeek V3.2 and the Claude/GPT/Gemini frontier
  • Relatively new release (December 2025) - community tooling and fine-tuning ecosystem still maturing

About the author: James, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.