Mistral Large 3
Mistral Large 3 is a 675B-parameter MoE model activating 41B per token with native multimodal support, a 256K context window, and Apache 2.0 licensing - Europe's first frontier-class open-weight model.

TL;DR
- 675B total / 41B active granular MoE with 2.5B vision encoder - Apache 2.0 license
- #2 open-source non-reasoning model and #6 overall among open-weight models on LMArena (1418 Elo)
- Native multimodal (text + images), 256K context window, strong function calling
- API at $0.50/$1.50 per million tokens - approximately 80% cheaper than GPT-4o class models
- Trained on 3,000 H200 GPUs - Europe's answer to the US/China frontier model race
Overview
Mistral AI released Large 3 on December 2, 2025, and it represents a genuine inflection point for European AI. This is a 675 billion parameter Mixture-of-Experts model that activates 41 billion parameters per forward pass, ships with native vision (a 2.5B parameter encoder fused into the architecture), and is released under Apache 2.0. That license choice matters - Mistral's previous flagship (Large 2) used the more restrictive Mistral Research License. With Apache 2.0, Large 3 can be commercially deployed, modified, and redistributed without restrictions.
On the LMArena leaderboard, Large 3 debuted at 1418 Elo - #2 among open-source non-reasoning models, #6 overall in the open-weight category. It outperforms Llama 3.1 405B and the previous generation of open models by a comfortable margin. On MMMLU (multilingual knowledge), it scores 85.5%. On HumanEval (Python code generation), it posts approximately 92% pass@1. These are strong numbers for an open model, and the 256K context window is the largest of any open-weight model at this capability tier.
But let us be precise about where Large 3 sits. On GPQA Diamond (43.9%) and SimpleQA (23.8%), it scores substantially below DeepSeek V3.2 (82.4% and 97.1% respectively) and the proprietary frontier. The gap is not small - it is 38.5 points on GPQA Diamond versus DeepSeek V3.2 alone. Claude Opus 4.6 at 91.3% is in a different league on PhD-level reasoning. Large 3 is a strong generalist, not a reasoning specialist. Its value proposition is breadth, cost efficiency, multimodal capability, and the Apache 2.0 license - not beating proprietary models on the hardest benchmarks.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Mistral AI |
| Model Family | Mistral Large |
| Architecture | Granular Mixture-of-Experts + Vision Encoder |
| Total Parameters | 675B (673B language model + 2.5B vision encoder) |
| Active Parameters | 41B per token (39B language + vision encoder) |
| Context Window | 256,000 tokens |
| Input Price | $0.50/M tokens |
| Output Price | $1.50/M tokens |
| Release Date | December 2, 2025 |
| License | Apache 2.0 |
| Input Modalities | Text, Images (up to 8 per prompt) |
| Output Modality | Text |
| Quantization | FP8 (H200/B200), NVFP4 (H100/A100) |
| Recommended Infra | 8xH200 node (FP8) or H100/A100 node (NVFP4) |
| Model ID | mistral-large-latest |
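The listed rates translate directly into per-request cost. A minimal sketch, using only the $0.50/$1.50 per-million-token prices from the table above (the helper function is illustrative, not part of any Mistral SDK):

```python
# Estimate La Plateforme API cost for Mistral Large 3 at the listed rates.
# Rates are USD per million tokens: $0.50 input, $1.50 output.

INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 10K-token prompt with a 2K-token completion.
cost = request_cost(10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0080
```

At these rates, even a long-context request near the 256K window costs on the order of a dime on input.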
Benchmark Performance
| Benchmark | Mistral Large 3 | DeepSeek V3.2 | Claude Opus 4.6 | Llama 3.1 405B |
|---|---|---|---|---|
| MMMLU (multilingual knowledge) | 85.5% | 84.2% | 91.1% | 73.1% |
| GPQA Diamond (PhD-level science) | 43.9% | 82.4% | 91.3% | 50.7% |
| HumanEval (Python, pass@1) | 92.0% | 90.5% | 93.2% | 89.0% |
| LiveCodeBench | 34.4% | 83.3% | 78.5% | 28.9% |
| SimpleQA (factual accuracy) | 23.8% | 97.1% | 82.0% | 19.6% |
| LMArena Elo (human preference) | 1418 | 1456 | 1496 | 1352 |
| Context Window | 256K | 128K | 1M (beta) | 128K |
Two things jump out from this table. First, Mistral Large 3 convincingly beats Llama 3.1 405B across every metric - it is the clear open-weight generalist leader in that comparison. Second, the gap to DeepSeek V3.2 and the proprietary frontier is large and consistent. On LiveCodeBench (34.4% vs 83.3%), the difference is too large to attribute to differences in evaluation methodology.
The MMMLU score of 85.5% is genuinely strong and reflects Mistral's investment in multilingual training across dozens of languages. For European enterprise deployments where multilingual capability matters, this is a meaningful advantage. The 256K context window also stands out - it is double what most open models offer.
Key Capabilities
Native Multimodal. Unlike models that bolt on a vision adapter post-training, Large 3's 2.5B parameter vision encoder is integrated into the architecture from the start. It processes up to 8 images per prompt with cross-modal analysis. Document understanding, chart interpretation, and visual reasoning all work within the same model. For teams that need a single self-hosted model handling both text and image workloads, this matters - you do not need a separate vision pipeline.
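In practice, a mixed text-and-image request is just a user message with multiple content parts. A sketch of building one, enforcing the 8-image limit noted above - the exact part shape ("type": "text" / "image_url") follows the common OpenAI-compatible convention and should be checked against Mistral's API reference:

```python
# Build a multimodal chat message mixing one text part with image parts.
# Field names follow the common OpenAI-compatible convention; verify the
# exact payload shape against Mistral's API documentation.

MAX_IMAGES = 8  # Large 3 accepts up to 8 images per prompt

def build_multimodal_message(prompt: str, image_urls: list[str]) -> dict:
    if len(image_urls) > MAX_IMAGES:
        raise ValueError(f"Large 3 supports at most {MAX_IMAGES} images per prompt")
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": url} for url in image_urls]
    return {"role": "user", "content": content}

# Hypothetical URLs for illustration only.
msg = build_multimodal_message(
    "Compare the revenue trends in these two charts.",
    ["https://example.com/q3.png", "https://example.com/q4.png"],
)
print(len(msg["content"]))  # → 3 (one text part + two image parts)
```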
Enterprise-Grade Function Calling. Large 3 ships with native function calling and structured JSON output. The system prompt adherence is strong, which matters for production deployments where the model needs to stay within defined tool boundaries. Mistral recommends keeping tool counts minimal and well-defined, and in practice the model handles single-tool and multi-tool workflows reliably. For comparison, DeepSeek V3.2 posts stronger scores on MCP-Mark tool benchmarks, but Large 3's function calling is sufficient for standard enterprise integration patterns.
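A "minimal and well-defined" tool in the sense above might look like the following. The JSON-schema style (`type: "function"` wrapper, `parameters` as an object schema) is the widely used convention for function calling; the tool name and fields here are hypothetical, and exact requirements should be confirmed against Mistral's API reference:

```python
import json

# A single, tightly scoped tool definition in JSON-schema style.
# "get_invoice_status" and its fields are illustrative, not a real API.
get_invoice_status = {
    "type": "function",
    "function": {
        "name": "get_invoice_status",
        "description": "Look up the payment status of an invoice by ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {
                    "type": "string",
                    "description": "Internal invoice identifier",
                },
            },
            "required": ["invoice_id"],
        },
    },
}

# Per the guidance above: keep the tool list short and well-defined.
tools = [get_invoice_status]
print(json.dumps(tools, indent=2))
```

A narrow `required` list and a one-line description per parameter are what keep the model inside its tool boundaries in production.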
European Sovereignty. This is the strategic argument for Large 3. It is the only frontier-class model built by a European company with European data processing practices. For organizations subject to EU AI Act requirements, GDPR constraints, or data sovereignty mandates, Large 3 under Apache 2.0 offers a deployment path that does not depend on US or Chinese infrastructure. Mistral AI is headquartered in Paris and the model can be self-hosted on European cloud providers without API calls leaving EU jurisdiction.
Pricing and Availability
| Tier | Input | Output |
|---|---|---|
| La Plateforme API | $0.50/M tokens | $1.50/M tokens |
Mistral Large 3 is available through Mistral's La Plateforme API, Amazon Bedrock, Google Cloud Vertex AI, NVIDIA NIM, and Azure. The Apache 2.0 license means you can also download the weights from HuggingFace and self-host. Deployment requires a single multi-GPU node - 8xH200 in FP8 format, or an H100/A100 node in NVFP4 compressed format. The vLLM framework (>= 1.12.0) is recommended for serving.
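A back-of-the-envelope check on why those node configurations work, counting weight memory only (assuming roughly 1 byte per parameter for FP8 and 0.5 bytes for NVFP4, and ignoring KV cache, activations, and quantization scale factors - so treat these as lower bounds):

```python
# Weight-only memory estimate for the 675B-parameter checkpoint.
# FP8 ≈ 1 byte/param, NVFP4 ≈ 0.5 bytes/param (scale factors ignored).
# KV cache and activations add substantially on top of these figures.

TOTAL_PARAMS = 675e9

def weight_gb(bytes_per_param: float) -> float:
    return TOTAL_PARAMS * bytes_per_param / 1e9

fp8 = weight_gb(1.0)    # ~675 GB -> fits 8xH200 (8 x 141 GB = 1128 GB)
nvfp4 = weight_gb(0.5)  # ~338 GB -> fits 8xH100 (8 x 80 GB = 640 GB)
print(f"FP8: {fp8:.0f} GB, NVFP4: {nvfp4:.0f} GB")
```

The headroom between weights and total node memory is what accommodates the KV cache for 256K-token contexts.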
At $0.50/$1.50 per million tokens, Large 3 is roughly 10x cheaper than Claude Opus 4.6 on input and 17x cheaper on output. It is also cheaper than Gemini 3.1 Pro at $2/$12. However, DeepSeek V3.2 at $0.28/$0.42 undercuts it significantly while posting higher benchmark scores. The case for Large 3 is not cost leadership - it is the combination of Apache 2.0 licensing, multimodal capability, and EU data sovereignty. See our open source vs proprietary AI guide for detailed cost modeling.
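Putting the prices quoted in this section side by side - a symmetric 1M-input/1M-output workload is arbitrary, chosen only to make the totals comparable:

```python
# Total cost for 1M input + 1M output tokens, using the prices quoted above.
prices = {  # (input, output) in USD per million tokens
    "Mistral Large 3": (0.50, 1.50),
    "DeepSeek V3.2":   (0.28, 0.42),
    "Gemini 3.1 Pro":  (2.00, 12.00),
}

for model, (inp, out) in prices.items():
    print(f"{model}: ${inp + out:.2f} per 1M-in + 1M-out")
```

On this workload Large 3 comes to $2.00, DeepSeek V3.2 to $0.70, and Gemini 3.1 Pro to $14.00 - consistent with the positioning above: cheap, but not the cheapest.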
Strengths
- Apache 2.0 license - fully open for commercial use, modification, and redistribution
- Strongest open-weight multilingual model (MMMLU 85.5%, dozens of languages)
- Native multimodal with 2.5B vision encoder - text and image in one architecture
- 256K context window - largest among frontier-class open models
- Self-hostable on a single 8-GPU node (FP8 or NVFP4 compression)
- European origin with EU data sovereignty advantages for regulated industries
- Competitive API pricing at $0.50/$1.50 per million tokens
Weaknesses
- GPQA Diamond (43.9%) is far below the frontier - weak on PhD-level reasoning
- LiveCodeBench (34.4%) and SimpleQA (23.8%) show significant gaps to DeepSeek V3.2 and proprietary models
- No video or audio input support - vision is limited to static images
- 675B total parameters require substantial infrastructure even for MoE inference
- Reasoning depth lags behind DeepSeek V3.2 and the Claude/GPT/Gemini frontier
- Relatively new release (December 2025) - community tooling and fine-tuning ecosystem still maturing
Related Coverage
- Open Source LLM Leaderboard - Current rankings for open-weight models including Large 3
- Coding Benchmarks Leaderboard - HumanEval, LiveCodeBench, and SWE-bench rankings
- Open Source vs Proprietary AI - Framework for deciding between open-weight and API models
- Claude Opus 4.6 - The proprietary frontier leader for comparison
- Gemini 3.1 Pro - Google's competing frontier model
Sources
- Introducing Mistral 3 - Mistral AI Blog
- Mistral-Large-3-675B-Instruct-2512 Model Card (HuggingFace)
- Mistral Large 3: An Open-Source MoE LLM Explained - IntuitionLabs
- Mistral Large 3 Intelligence & Performance Analysis - Artificial Analysis
- Mistral Large 3 (2512) Review - Barnacle Goose
- Mistral AI Pricing - La Plateforme
- NVIDIA-Accelerated Mistral 3 - NVIDIA Blog
