Mistral Large 3

Mistral Large 3 is a 675B-parameter MoE model activating 41B per token with native multimodal support, a 256K context window, and Apache 2.0 licensing - Europe's first frontier-class open-weight model.

TL;DR

  • 675B total / 41B active granular MoE with 2.5B vision encoder - Apache 2.0 license
  • #2 among open-source non-reasoning models and #6 overall in the open-weight category on LMArena (1418 Elo)
  • Native multimodal (text + images), 256K context window, strong function calling
  • API at $0.50/$1.50 per million tokens - approximately 80% cheaper than GPT-4o class models
  • Trained on 3,000 H200 GPUs - Europe's answer to the US/China frontier model race

Overview

Mistral AI released Large 3 on December 2, 2025, and it represents a genuine inflection point for European AI. This is a 675 billion parameter Mixture-of-Experts model that activates 41 billion parameters per forward pass, ships with native vision (a 2.5B parameter encoder fused into the architecture), and is released under Apache 2.0. That license choice matters - Mistral's previous flagship (Large 2) used the more restrictive Mistral Research License. With Apache 2.0, Large 3 can be commercially deployed, modified, and redistributed without restrictions.

On the LMArena leaderboard, Large 3 debuted at 1418 Elo - #2 among open-source non-reasoning models, #6 overall in the open-weight category. It outperforms Llama 3.1 405B and the previous generation of open models by a comfortable margin. On MMMLU (multilingual knowledge), it scores 85.5%. On HumanEval (Python code generation), it posts approximately 92% pass@1. These are strong numbers for an open model, and the 256K context window is the largest of any open-weight model at this capability tier.

But let us be precise about where Large 3 sits. On GPQA Diamond (43.9%) and SimpleQA (23.8%), it scores substantially below DeepSeek V3.2 (82.4% and 97.1% respectively) and the proprietary frontier. The gap is not small - it is 38.5 points on GPQA Diamond versus DeepSeek V3.2 alone. Claude Opus 4.6 at 91.3% is in a different league on PhD-level reasoning. Large 3 is a strong generalist, not a reasoning specialist. Its value proposition is breadth, cost efficiency, multimodal capability, and the Apache 2.0 license - not beating proprietary models on the hardest benchmarks.

Key Specifications

Specification        Details
-------------------  ----------------------------------------------------
Provider             Mistral AI
Model Family         Mistral Large
Architecture         Granular Mixture-of-Experts + Vision Encoder
Total Parameters     675B (673B language model + 2.5B vision encoder)
Active Parameters    41B per token (39B language + vision encoder)
Context Window       256,000 tokens
Input Price          $0.50/M tokens
Output Price         $1.50/M tokens
Release Date         December 2, 2025
License              Apache 2.0
Input Modalities     Text, Images (up to 8 per prompt)
Output Modality      Text
Quantization         FP8 (H200/B200), NVFP4 (H100/A100)
Recommended Infra    8xH200 node (FP8) or H100/A100 node (NVFP4)
Model ID             mistral-large-latest

Benchmark Performance

Benchmark                         Mistral Large 3   DeepSeek V3.2   Claude Opus 4.6   Llama 3.1 405B
--------------------------------  ---------------   -------------   ---------------   --------------
MMMLU (multilingual knowledge)    85.5%             84.2%           91.1%             73.1%
GPQA Diamond (PhD-level science)  43.9%             82.4%           91.3%             50.7%
HumanEval (Python, pass@1)        92.0%             90.5%           93.2%             89.0%
LiveCodeBench                     34.4%             83.3%           78.5%             28.9%
SimpleQA (factual accuracy)       23.8%             97.1%           82.0%             19.6%
LMArena Elo (human preference)    1418              1456            1496              1352
Context Window                    256K              128K            1M (beta)         128K

Two things jump out from this table. First, Mistral Large 3 convincingly beats Llama 3.1 405B on every metric - it is the clear open-weight generalist leader in that comparison. Second, the gap to DeepSeek V3.2 and the proprietary frontier is large and consistent. On LiveCodeBench (34.4% vs 83.3%), the difference is too large to attribute to evaluation methodology alone.

The MMMLU score of 85.5% is genuinely strong and reflects Mistral's investment in multilingual training across dozens of languages. For European enterprise deployments where multilingual capability matters, this is a meaningful advantage. The 256K context window also stands out - it is double what most open models offer.

Key Capabilities

Native Multimodal. Unlike models that bolt on a vision adapter post-training, Large 3's 2.5B parameter vision encoder is integrated into the architecture from the start. It processes up to 8 images per prompt with cross-modal analysis. Document understanding, chart interpretation, and visual reasoning all work within the same model. For teams that need a single self-hosted model handling both text and image workloads, this matters - you do not need a separate vision pipeline.
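As a concrete sketch of what a mixed text-and-image request might look like: the helper below assembles a single user message in the OpenAI-style chat format that La Plateforme accepts, enforcing the documented 8-image cap. The function name and URLs are our own illustration, not part of Mistral's SDK.

```python
# Sketch: assemble a multimodal chat message for Mistral Large 3.
# The message schema mirrors the OpenAI-style chat format; the 8-image
# cap reflects the per-prompt limit documented above. Helper name and
# example URLs are illustrative only.

MAX_IMAGES_PER_PROMPT = 8  # documented limit for Mistral Large 3

def build_multimodal_message(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message mixing a text part and image parts."""
    if len(image_urls) > MAX_IMAGES_PER_PROMPT:
        raise ValueError(
            f"Mistral Large 3 accepts at most {MAX_IMAGES_PER_PROMPT} images per prompt"
        )
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": url} for url in image_urls]
    return {"role": "user", "content": content}

msg = build_multimodal_message(
    "Summarize the trends in these charts.",
    ["https://example.com/q1.png", "https://example.com/q2.png"],
)
```

Because the encoder is fused into the model, this one message can drive document understanding, chart interpretation, and visual reasoning without a separate vision pipeline.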

Enterprise-Grade Function Calling. Large 3 ships with native function calling and structured JSON output. The system prompt adherence is strong, which matters for production deployments where the model needs to stay within defined tool boundaries. Mistral recommends keeping tool counts minimal and well-defined, and in practice the model handles single-tool and multi-tool workflows reliably. For comparison, DeepSeek V3.2 posts stronger scores on MCP-Mark tool benchmarks, but Large 3's function calling is sufficient for standard enterprise integration patterns.
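To make "minimal and well-defined tools" concrete, here is a hedged sketch of a single tool definition in the JSON-schema style used by function-calling APIs like Mistral's. The tool name, fields, and invoice example are invented for illustration.

```python
# Sketch: one well-scoped tool definition in the JSON-schema style used
# by function-calling chat APIs. The tool name ("get_invoice_status")
# and its parameters are hypothetical, not a real integration.

get_invoice_status_tool = {
    "type": "function",
    "function": {
        "name": "get_invoice_status",
        "description": "Look up the payment status of an invoice by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {
                    "type": "string",
                    "description": "Internal invoice identifier, e.g. 'INV-1042'.",
                },
            },
            "required": ["invoice_id"],
        },
    },
}

# Per Mistral's guidance, the request ships a minimal, well-defined tool list:
request_body = {
    "model": "mistral-large-latest",
    "messages": [{"role": "user", "content": "Has invoice INV-1042 been paid?"}],
    "tools": [get_invoice_status_tool],
    "tool_choice": "auto",
}
```

Keeping the tool list this small is what lets the model's strong system-prompt adherence translate into reliable tool selection in production.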

European Sovereignty. This is the strategic argument for Large 3. It is the only frontier-class model built by a European company with European data processing practices. For organizations subject to EU AI Act requirements, GDPR constraints, or data sovereignty mandates, Large 3 under Apache 2.0 offers a deployment path that does not depend on US or Chinese infrastructure. Mistral AI is headquartered in Paris and the model can be self-hosted on European cloud providers without API calls leaving EU jurisdiction.

Pricing and Availability

Tier                 Input             Output
-------------------  ----------------  ----------------
La Plateforme API    $0.50/M tokens    $1.50/M tokens

Mistral Large 3 is available through Mistral's La Plateforme API, Amazon Bedrock, Google Cloud Vertex AI, NVIDIA NIM, and Azure. The Apache 2.0 license means you can also download the weights from HuggingFace and self-host. Deployment requires a single multi-GPU node - 8xH200 in FP8 format, or an H100/A100 node in NVFP4 compressed format. The vLLM framework (>= 1.12.0) is recommended for serving.
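A quick back-of-envelope check shows why those node sizes are the recommendation. Assuming roughly 1 byte per parameter at FP8 and ~0.5 bytes at 4-bit NVFP4 (a simplification that ignores KV cache, activations, and runtime overhead):

```python
# Back-of-envelope weight-memory estimate for self-hosting Large 3.
# Simplifying assumptions (ours, not Mistral's): 1 byte/param at FP8,
# ~0.5 bytes/param at NVFP4; KV cache and activations are ignored.

TOTAL_PARAMS = 675e9   # total parameters, per the spec table
H200_MEMORY_GB = 141   # HBM capacity of one H200 GPU
BYTES_FP8 = 1.0
BYTES_NVFP4 = 0.5

def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB."""
    return n_params * bytes_per_param / 1e9

fp8_gb = weights_gb(TOTAL_PARAMS, BYTES_FP8)      # ~675 GB
nvfp4_gb = weights_gb(TOTAL_PARAMS, BYTES_NVFP4)  # ~338 GB
node_gb = 8 * H200_MEMORY_GB                      # 1128 GB across an 8xH200 node

print(f"FP8 weights:   {fp8_gb:.0f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.0f} GB")
print(f"8xH200 HBM:    {node_gb} GB")
```

The FP8 weights alone fill over half of an 8xH200 node's HBM, which is why the smaller-memory H100/A100 path requires the NVFP4 compressed format.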

At $0.50/$1.50 per million tokens, Large 3 is roughly 10x cheaper than Claude Opus 4.6 on input and 17x cheaper on output. It is also cheaper than Gemini 3.1 Pro at $2/$12. However, DeepSeek V3.2 at $0.28/$0.42 undercuts it significantly while posting higher benchmark scores. The case for Large 3 is not cost leadership - it is the combination of Apache 2.0 licensing, multimodal capability, and EU data sovereignty. See our open source vs proprietary AI guide for detailed cost modeling.
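The price gaps above are easy to sanity-check. The sketch below uses only the per-million-token prices quoted in this article; the workload size (50M input / 10M output tokens per month) is an invented example.

```python
# Sketch: monthly API cost comparison using the prices quoted in this
# article. The 50M-input / 10M-output workload is a made-up example.

PRICES = {  # (input $/M tokens, output $/M tokens)
    "Mistral Large 3": (0.50, 1.50),
    "DeepSeek V3.2": (0.28, 0.42),
    "Gemini 3.1 Pro": (2.00, 12.00),
}

def monthly_cost(model: str, m_in: float, m_out: float) -> float:
    """Cost in USD for m_in million input and m_out million output tokens."""
    p_in, p_out = PRICES[model]
    return m_in * p_in + m_out * p_out

for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 50, 10):,.2f}/month")
```

At that workload, Large 3 lands well under Gemini 3.1 Pro but above DeepSeek V3.2, which matches the article's framing: the pitch is licensing, multimodality, and sovereignty rather than absolute cost leadership.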

Strengths

  • Apache 2.0 license - fully open for commercial use, modification, and redistribution
  • Strongest open-weight multilingual model (MMMLU 85.5%, dozens of languages)
  • Native multimodal with 2.5B vision encoder - text and image in one architecture
  • 256K context window - largest among frontier-class open models
  • Self-hostable on a single 8-GPU node (FP8 or NVFP4 compression)
  • European origin with EU data sovereignty advantages for regulated industries
  • Competitive API pricing at $0.50/$1.50 per million tokens

Weaknesses

  • GPQA Diamond (43.9%) is far below the frontier - weak on PhD-level reasoning
  • LiveCodeBench (34.4%) and SimpleQA (23.8%) show significant gaps to DeepSeek V3.2 and proprietary models
  • No video or audio input support - vision is limited to static images
  • 675B total parameters require substantial infrastructure even for MoE inference
  • Reasoning depth lags behind DeepSeek V3.2 and the Claude/GPT/Gemini frontier
  • Relatively new release (December 2025) - community tooling and fine-tuning ecosystem still maturing

About the author: James, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.