
Llama 4 Maverick Review: Meta's Open-Weight Multimodal Contender

A comprehensive review of Meta's Llama 4 Maverick, a 400B parameter open-weight MoE model with 128 experts, 1M context, and multimodal capabilities.


Meta's Llama series has been the backbone of the open-weight AI ecosystem, and Llama 4 Maverick represents its most ambitious release yet. With 400 billion total parameters spread across a 128-expert mixture-of-experts (MoE) design, and only 17 billion parameters active per forward pass, it is an engineering marvel that delivers flagship-level performance while remaining practical to deploy. After extensive testing, we believe it has earned its place as the best open-weight general assistant available.

Architecture: iRoPE and the 128-Expert Design

Maverick introduces the iRoPE architecture, in which attention layers using rotary position embeddings (RoPE) are interleaved with layers that use no positional encoding at all, an evolution of the scheme that has become standard in modern transformers. The practical benefit is a 1 million token context window that works reliably from day one, without the progressive context-extension tricks that earlier models required.
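To make the positional mechanism concrete, here is a toy NumPy sketch of standard RoPE, the rotary scheme that iRoPE builds on. It does not attempt to reproduce Meta's interleaving or long-context scaling choices; all shapes, names, and the base frequency are illustrative.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply standard rotary position embedding to (seq, dim) activations.
    Each channel pair is rotated by a position-dependent angle; iRoPE
    interleaves layers like this with layers that skip positional
    encoding entirely, which this toy version does not reproduce."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)   # one frequency per pair
    angles = np.outer(np.arange(seq), freqs)    # (seq, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation applied pairwise across channels
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.ones((4, 8))
out = rope(q)
print(out[0])  # position 0 has zero rotation: identical to the input row
```

Note that the rotation is norm-preserving, so position is encoded purely in the phase of each channel pair rather than in the magnitude of the activations.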

The 128-expert MoE design is aggressive by current standards. Most MoE models use 8-16 experts; Maverick's 128-expert layout creates much finer-grained specialization. The routing network learns to assign tokens to highly specific expert combinations, effectively giving the model a large library of specialized sub-networks it can compose for any given task. With only 17B parameters active per token, inference is remarkably efficient for a model of this total size.
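As a rough illustration of what fine-grained routing means, here is a toy top-k MoE router in NumPy. This is not Meta's actual router (its gate weights are learned and its load-balancing details are not reproduced here); the random gate and the top_k setting are purely illustrative.

```python
import numpy as np

def route_tokens(token_reprs, num_experts=128, top_k=1, seed=0):
    """Toy top-k MoE router: a linear gate scores every token against
    every expert, and each token is dispatched to its top-k experts.
    Fine-grained designs like Maverick's simply use a large num_experts."""
    rng = np.random.default_rng(seed)
    d_model = token_reprs.shape[-1]
    gate = rng.standard_normal((d_model, num_experts))  # stand-in router weights

    logits = token_reprs @ gate                         # (tokens, experts)
    chosen = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k expert ids
    # softmax over the selected logits -> combination weights
    sel = np.take_along_axis(logits, chosen, axis=-1)
    weights = np.exp(sel - sel.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return chosen, weights

tokens = np.random.default_rng(1).standard_normal((4, 16))
experts, weights = route_tokens(tokens)
print(experts.ravel())  # each of the 4 tokens picks one of 128 experts
```

Because only the chosen experts' feed-forward blocks run for a given token, per-token compute scales with the active parameter count (17B) rather than the total (400B).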

The model was co-distilled from Behemoth, Meta's much larger internal model. This training approach transfers knowledge from a massive teacher model into the more efficient student architecture, allowing Maverick to punch well above its active parameter count.
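Meta has not published the co-distillation objective, but the standard knowledge-distillation loss gives the flavor: the student is trained to match the teacher's temperature-softened output distribution. A minimal NumPy sketch, with all shapes and the temperature chosen for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    the standard knowledge-distillation objective (Hinton-style)."""
    p = softmax(teacher_logits / T)
    log_q = np.log(softmax(student_logits / T))
    return float((p * (np.log(p) - log_q)).sum(-1).mean() * T * T)

rng = np.random.default_rng(0)
teacher = rng.standard_normal((8, 32))
loss_random = distillation_loss(rng.standard_normal((8, 32)), teacher)
loss_matched = distillation_loss(teacher, teacher)  # student == teacher
print(loss_matched, loss_random)  # matched logits give (near-)zero loss
```

The softened distribution carries more information per token than a one-hot label, which is how a 17B-active student can absorb behavior from a far larger teacher.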

Performance: Beating GPT-4o Across the Board

Meta's claim that Maverick beats GPT-4o across the board held up in our testing. On standard language benchmarks, coding tasks, and reasoning problems, Maverick consistently outperformed GPT-4o, often by meaningful margins. To be clear, GPT-4o is now a generation behind GPT-5.2, so this is not a claim of beating the very best proprietary models. But surpassing what was the leading model just a generation ago is a significant achievement for an open-weight release.

On MMLU-Pro, Maverick scores competitively with mid-tier frontier models. Its coding performance on HumanEval and MBPP is strong, generating correct solutions to complex algorithmic problems and demonstrating understanding of software engineering patterns beyond simple function completion. On mathematical reasoning, it handles undergraduate-level problems confidently and manages many graduate-level challenges, though it falls short of DeepSeek V3.2's IMO-level prowess.

Chat and Creative Writing

This is where Maverick genuinely surprises. Open-weight models have historically been weaker at the subjective, hard-to-benchmark qualities that make a model pleasant to converse with. Maverick bucks this trend. Its responses are natural, well-paced, and show personality without being overbearing. Meta clearly invested heavily in the RLHF and preference tuning stages, and it shows.

Creative writing is a particular strength. Maverick generates fiction with genuine narrative voice, maintains consistent characterization across long passages, and handles different genres with appropriate tone shifts. We tested it with poetry, short stories, screenplay dialogue, and marketing copy, and the output quality consistently exceeded our expectations for an open-weight model. The co-distillation from Behemoth likely plays a role here, transferring stylistic capabilities that are difficult to develop in smaller models trained from scratch.

Instruction following is reliable. Complex multi-constraint prompts are handled well, with the model tracking format requirements, tone specifications, length targets, and content constraints simultaneously. It occasionally drops one constraint when five or more are specified, but this is a limitation shared by most models.

Multimodal Capabilities

Maverick includes native image understanding, making it one of the few open-weight models with strong multimodal capability. It can describe images accurately, answer questions about visual content, and reason about spatial relationships in photographs and diagrams.

That said, its vision capabilities do not match Gemini 3 Pro's natively multimodal architecture. Maverick handles straightforward visual question-answering well but struggles with tasks requiring fine-grained spatial reasoning or understanding of complex visual scenes with many interacting elements. It is better thought of as a strong language model with competent vision than as a true multimodal reasoner.

Deployment and Ecosystem

The open-weight nature of Maverick means it can be deployed on private infrastructure, fine-tuned for specific domains, and integrated into products without API dependencies. Meta's community license is permissive for most commercial uses, though it carries attribution requirements and additional terms for the very largest deployments. The model is available through Hugging Face, and the community has already produced quantized variants that run on more modest hardware.

With only 17B parameters active per token, inference compute is modest, but all 400B weights must still be resident in memory, so in practice deployment means a single high-end multi-GPU node rather than a single card. That is still far more accessible than the total parameter count might suggest. Fine-tuning requires more resources but remains within reach for well-funded teams.
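A quick back-of-the-envelope check of those memory claims (weight storage only, ignoring KV cache and activations; the precision choices are assumptions, not a published deployment recipe):

```python
def weight_memory_gb(params_billion, bytes_per_param):
    """Weight-only memory footprint in GiB; ignores KV cache and activations."""
    return params_billion * 1e9 * bytes_per_param / 2**30

total_int8 = weight_memory_gb(400, 1)   # all 400B weights quantized to int8
active_bf16 = weight_memory_gb(17, 2)   # the 17B active set at bf16
print(round(total_int8), round(active_bf16))  # 373 GiB vs 32 GiB
```

The full expert set dominates memory even at int8, which is why an 8-GPU node is the realistic target, while per-token compute tracks the much smaller active set.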

Strengths and Weaknesses

Strengths:

  • Best open-weight general assistant with strong all-around performance
  • Excellent creative writing and conversational quality
  • 128-expert MoE with only 17B active parameters enables efficient deployment
  • 1M context window with reliable long-range retrieval
  • Co-distillation from Behemoth transfers frontier-level knowledge
  • Permissive open-weight license for commercial use

Weaknesses:

  • Vision capabilities are good but not best-in-class
  • Mathematical reasoning trails specialized models like DeepSeek V3.2
  • Still meaningfully behind GPT-5.2 and Claude Opus 4.6 on the hardest tasks
  • Fine-tuning requires substantial GPU resources despite efficient inference
  • Safety alignment is less robust than proprietary model offerings
  • Community tooling, while growing, is less mature than the Llama 3 ecosystem

Verdict: 8.8/10

Llama 4 Maverick is the best open-weight general assistant available today. It excels at the things that matter most for everyday use: natural conversation, creative writing, coding, and reliable instruction following. The 128-expert MoE architecture delivers impressive capability at manageable inference costs, and the 1M context window is genuinely useful. It does not quite reach the heights of the top proprietary models on the hardest benchmarks, but for the vast majority of tasks, the difference is negligible. If you want a powerful, versatile model that you can host, customize, and control, Maverick is the clear choice.

About the author

Elena, Senior AI Editor & Investigative Journalist, is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.