FLUX.2 [dev]
Black Forest Labs' 32B open-weight image model - the most powerful open alternative for text-to-image, editing, and multi-reference generation with up to 10 reference images.
![FLUX.2 [dev]](https://awesomeagents.ai/images/models/flux-2-dev_hu_7649aaedcc0564.jpg)
FLUX.2 [dev] is the open-weight flagship of Black Forest Labs' second-generation image family. At 32 billion parameters, it couples a rectified flow transformer with Mistral-3's 24B vision-language model to deliver state-of-the-art text-to-image generation, single-reference editing, and multi-reference composition - all in one model. It ranks #9 on LM Arena's image generation leaderboard with a score of 1149, the highest of any open-weight model.
TL;DR
- 32B parameter rectified flow transformer + Mistral-3 24B VLM for world knowledge
- State-of-the-art open-weight image generation - LM Arena rank #9 (score 1149)
- Multi-reference support: combine up to 10 images for character, style, and object consistency
- ~32K token context from the VLM enables detailed, multi-part prompts
- Open weights on Hugging Face; non-commercial license for weights, API available for commercial use
- 80+ GB VRAM full precision, ~14-18 GB with FP8/4-bit quantization on RTX 4090
The model was released on November 25, 2025, predating the rest of the FLUX.2 lineup. It set the architectural template that the Pro, Max, and klein variants all build on: a flow transformer that handles spatial logic while the VLM handles language understanding, world knowledge, and contextual reasoning. The combination supports ~32K tokens of prompt context, enabling detailed scene descriptions and multi-step instructions.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Black Forest Labs |
| Model Family | FLUX.2 |
| Parameters | 32 billion (flow transformer) + 24B (Mistral-3 VLM) |
| Architecture | Rectified flow transformer + Mistral-3 24B VLM |
| VAE | Retrained from scratch for improved quality |
| Prompt Context | ~32K tokens |
| Max Resolution | Up to 4MP (2048x2048) |
| Multi-Reference | Up to 10 images |
| Inference Steps | 12-20 (preview), 28-50 (production) |
| VRAM (Full) | 80+ GB |
| VRAM (FP8/4-bit) | ~14-18 GB (RTX 4090) |
| Inference Speed | 2-4 seconds (optimized infra) |
| LM Arena Rank | #9 (score 1149) |
| Release Date | November 25, 2025 |
| License | FLUX Non-Commercial License (weights), API for commercial |
| Open Weights | Yes (Hugging Face) |
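The VRAM rows follow from simple weight-size arithmetic. A back-of-envelope sketch for the 32B flow transformer alone - weights only, ignoring activations, the VAE, and the 24B VLM (which can be offloaded):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate VRAM for model weights only (no activations, no KV cache)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# 32B flow transformer at common precisions
for label, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(32, bits):.0f} GB")
# bf16: 64 GB / fp8: 32 GB / 4-bit: 16 GB
```

The 4-bit figure of ~16 GB lines up with the ~14-18 GB range quoted for an RTX 4090 once quantization overheads and partial offloading are accounted for.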
Benchmark Performance
| Metric | FLUX.2 Dev | FLUX.2 Max | Nano Banana 2 | Midjourney V7 |
|---|---|---|---|---|
| LM Arena Rank | #9 | #4 | N/A | N/A |
| LM Arena Score | 1149 | 1168 | N/A | N/A |
| Parameters | 32B + 24B VLM | 32B + 24B VLM | Not disclosed | Not disclosed |
| Multi-Reference | Up to 10 | Up to 10 | Up to 10 | Limited |
| Max Resolution | 4MP | 4MP | 4K | 2K |
| Text Rendering | ~60% | Best-in-class | ~90% | 71% |
| Open Weights | Yes | No | No | No |
| Fine-tuning | LoRA supported | No | No | No |
FLUX.2 [dev] consistently outperforms all open-weight alternatives by a significant margin across text-to-image, single-reference editing, and multi-reference editing. The gap to closed models (FLUX.2 Max, Midjourney V7, Nano Banana 2) is narrower - roughly 19 ELO points behind Max, which translates to perceptible but not dramatic quality differences in most use cases.
Text rendering at ~60% accuracy remains a weak point for FLUX.2 [dev]. Nano Banana 2 leads here at ~90%, making it the better choice for infographics, UI mockups, or any output requiring legible text.
Key Capabilities
Multi-Reference Composition
The standout feature. FLUX.2 [dev] can accept 2-10 reference images and combine them into a novel output while maintaining character identity, style consistency, and object fidelity. No fine-tuning required - the model handles reference matching at inference time through the VLM's contextual understanding.
Use cases: brand-consistent marketing materials, character sheets for animation, product visualization with consistent styling, and storyboarding with recurring characters.
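A hedged sketch of the workflow in the diffusers style. The repo id, the generic `DiffusionPipeline` loader resolving to the right pipeline class, and `image=` as the reference-conditioning argument are all assumptions - treat this as the shape of the call, not a verified API, and check the current diffusers docs before copying:

```python
def compose_from_references(prompt: str, reference_paths: list[str]):
    """Generate one image conditioned on up to 10 reference images."""
    if not 1 <= len(reference_paths) <= 10:
        raise ValueError("FLUX.2 [dev] accepts between 1 and 10 reference images")
    # Heavy dependencies imported lazily so the argument check runs anywhere.
    import torch
    from diffusers import DiffusionPipeline
    from PIL import Image

    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.2-dev",  # assumed Hugging Face repo id
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    refs = [Image.open(p) for p in reference_paths]
    # `image=` as the multi-reference argument is an assumption.
    return pipe(prompt=prompt, image=refs, num_inference_steps=28).images[0]
```

The step count of 28 sits at the bottom of the 28-50 production range from the spec table; drop to 12-20 for faster previews.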
Vision-Language Model Integration
The Mistral-3 24B VLM is not just a text encoder - it brings world knowledge and contextual reasoning to the generation process. This enables physically plausible lighting, accurate spatial relationships, and contextually appropriate material properties. The ~32K token context window supports detailed, structured prompts that were impossible with previous-generation CLIP-based encoders.
LoRA Fine-Tuning
FLUX.2 [dev] supports LoRA adapters for custom fine-tuning, enabling domain-specific specialization without retraining the full model. The community has produced 11+ fine-tunes and 12+ adapters on Hugging Face. GGUF quantized versions from Unsloth and City96 make fine-tuning and inference more accessible on consumer hardware.
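The economics of LoRA are visible in the parameter counts: an adapter trains two low-rank factors per weight matrix instead of the matrix itself. The layer width below is illustrative, not FLUX.2's actual hidden size:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA pair: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                     # one dense projection, fully fine-tuned
adapter = lora_params(4096, 4096, 16)  # rank-16 LoRA on the same projection
print(full // adapter)                 # 128 - the adapter is 128x smaller
```

That two-orders-of-magnitude reduction in trainable parameters, combined with quantized base weights, is what makes specializing a 32B model feasible on consumer hardware.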
Image Editing
Beyond generation, the model handles image-to-image editing: style transfer, inpainting, outpainting, object replacement, and lighting modification. Single-reference and multi-reference editing workflows use the same model weights.
Pricing and Availability
| Access Point | Pricing | Status |
|---|---|---|
| Open Weights (Hugging Face) | Free (non-commercial) | Available |
| BFL API | Per-megapixel pricing | Available |
| WaveSpeed AI | $0.012/image | Available |
| Replicate | Per-second pricing | Available |
| ComfyUI | Free (local) | Supported |
| Diffusers | Free (local) | Supported |
For commercial deployment, use the BFL API or third-party providers. The open weights are licensed for non-commercial use only - research, personal projects, and evaluation.
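Per-megapixel pricing scales with output resolution, so cost per image is easy to estimate. The rate below is a made-up placeholder, not BFL's actual price - check the provider's pricing page for real numbers:

```python
def image_cost_usd(width: int, height: int, usd_per_megapixel: float) -> float:
    """Cost of one generation under per-megapixel API pricing."""
    return width * height / 1e6 * usd_per_megapixel

# A full 2048x2048 output is ~4.19 MP; the $0.01/MP rate is hypothetical.
print(round(image_cost_usd(2048, 2048, 0.01), 4))  # 0.0419
```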
Strengths
- Highest-quality open-weight image generation model available (LM Arena #9)
- Multi-reference composition with up to 10 images - no fine-tuning needed
- 32K token context enables detailed, structured prompts
- LoRA fine-tuning for domain specialization
- Active quantization ecosystem (GGUF, FP8) brings VRAM to 14-18 GB
- Retrained VAE delivers improved quality at the compression boundary
- Full editing capabilities in the same model weights
Weaknesses
- 80+ GB VRAM at full precision - requires A100/H100 or heavy quantization for local use
- Non-commercial license restricts production use to API
- Text rendering (~60%) significantly trails Nano Banana 2 (~90%)
- 2-4 second generation - much slower than klein 4B's sub-second speed
- Complex multi-reference prompts can produce inconsistent results in edge cases, such as identity drift when many references compete
- No web-grounded generation (unlike Nano Banana 2's Gemini integration)
Related Coverage
- FLUX.2 [klein] 4B - The fast, fully open Apache 2.0 variant
- FLUX.2 [klein] 9B - Mid-tier distilled variant
- Best AI Image Generators 2026 - Full market comparison
- Nano Banana 2 - Google's competing image generation model
Last verified March 14, 2026