FLUX.2 [dev]
Black Forest Labs' 32B open-weight image model - the most powerful open alternative for text-to-image, editing, and multi-reference generation with up to 10 reference images.
![FLUX.2 [dev]](https://awesomeagents.ai/images/models/flux-2-dev_hu_7649aaedcc0564.jpg)
FLUX.2 [dev] is the open-weight flagship of Black Forest Labs' second-generation image family. At 32 billion parameters, it couples a rectified flow transformer with Mistral-3's 24B vision-language model to deliver state-of-the-art text-to-image generation, single-reference editing, and multi-reference composition - all in one model. It ranks #9 on LM Arena's image generation leaderboard with a score of 1149, the highest of any open-weight model.
TL;DR
- 32B parameter rectified flow transformer + Mistral-3 24B VLM for world knowledge
- State-of-the-art open-weight image generation - LM Arena rank #9 (score 1149)
- Multi-reference support: combine up to 10 images for character, style, and object consistency
- ~32K token context from the VLM enables detailed, multi-part prompts
- Open weights on Hugging Face; non-commercial license for weights, API available for commercial use
- 80+ GB VRAM full precision, ~14-18 GB with FP8/4-bit quantization on RTX 4090
The model was released on November 25, 2025, predating the rest of the FLUX.2 lineup. It set the architectural template that the Pro, Max, and klein variants all build on: a flow transformer that handles spatial logic while the VLM handles language understanding, world knowledge, and contextual reasoning. The combination supports ~32K tokens of prompt context, enabling detailed scene descriptions and multi-step instructions.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Black Forest Labs |
| Model Family | FLUX.2 |
| Parameters | 32 billion (flow transformer) + 24B (Mistral-3 VLM) |
| Architecture | Rectified flow transformer + Mistral-3 24B VLM |
| VAE | Retrained from scratch for improved quality |
| Prompt Context | ~32K tokens |
| Max Resolution | Up to 4MP (2048x2048) |
| Multi-Reference | Up to 10 images |
| Inference Steps | 12-20 (preview), 28-50 (production) |
| VRAM (Full) | 80+ GB |
| VRAM (FP8/4-bit) | ~14-18 GB (RTX 4090) |
| Inference Speed | 2-4 seconds (optimized infra) |
| LM Arena Rank | #9 (score 1149) |
| Release Date | November 25, 2025 |
| License | FLUX Non-Commercial License (weights), API for commercial |
| Open Weights | Yes (Hugging Face) |
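The VRAM rows follow from simple weight-size arithmetic. A back-of-envelope sketch for the 32B flow transformer alone - weights only, ignoring activations, the VAE, and the 24B VLM (which can be offloaded):

```python
def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate VRAM for model weights only (no activations, no KV cache)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

# 32B flow transformer at common precisions
for label, bits in [("bf16", 16), ("fp8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(32, bits):.0f} GB")
# bf16: 64 GB / fp8: 32 GB / 4-bit: 16 GB
```

The 4-bit figure of ~16 GB lines up with the ~14-18 GB range quoted for an RTX 4090 once quantization overheads and partial offloading are accounted for.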
Benchmark Performance
| Metric | FLUX.2 Dev | FLUX.2 Max | Nano Banana 2 | Midjourney V7 |
|---|---|---|---|---|
| LM Arena Rank | #9 | #4 | N/A | N/A |
| LM Arena Score | 1149 | 1168 | N/A | N/A |
| Parameters | 32B + 24B VLM | 32B + 24B VLM | Not disclosed | Not disclosed |
| Multi-Reference | Up to 10 | Up to 10 | Up to 10 | Limited |
| Max Resolution | 4MP | 4MP | 4K | 2K |
| Text Rendering | ~60% | Best-in-class | ~90% | 71% |
| Open Weights | Yes | No | No | No |
| Fine-tuning | LoRA supported | No | No | No |
FLUX.2 [dev] consistently outperforms all open-weight alternatives by a significant margin across text-to-image, single-reference editing, and multi-reference editing. The gap to closed models (FLUX.2 Max, Midjourney V7, Nano Banana 2) is narrower - roughly 19 ELO points behind Max, which translates to perceptible but not dramatic quality differences in most use cases.
Text rendering at ~60% accuracy remains a weak point for FLUX.2 [dev]. Nano Banana 2 leads here at ~90%, making it the better choice for infographics, UI mockups, or any output requiring legible text.
Key Capabilities
Multi-Reference Composition
The standout feature. FLUX.2 [dev] can accept 2-10 reference images and combine them into a novel output while maintaining character identity, style consistency, and object fidelity. No fine-tuning required - the model handles reference matching at inference time through the VLM's contextual understanding.
Use cases: brand-consistent marketing materials, character sheets for animation, product visualization with consistent styling, and storyboarding with recurring characters.
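A hedged sketch of the workflow in the diffusers style. The repo id, the generic `DiffusionPipeline` loader resolving to the right pipeline class, and `image=` as the reference-conditioning argument are all assumptions - treat this as the shape of the call, not a verified API, and check the current diffusers docs before copying:

```python
def compose_from_references(prompt: str, reference_paths: list[str]):
    """Generate one image conditioned on up to 10 reference images."""
    if not 1 <= len(reference_paths) <= 10:
        raise ValueError("FLUX.2 [dev] accepts between 1 and 10 reference images")
    # Heavy dependencies imported lazily so the argument check runs anywhere.
    import torch
    from diffusers import DiffusionPipeline
    from PIL import Image

    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.2-dev",  # assumed Hugging Face repo id
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    refs = [Image.open(p) for p in reference_paths]
    # `image=` as the multi-reference argument is an assumption.
    return pipe(prompt=prompt, image=refs, num_inference_steps=28).images[0]
```

The step count of 28 sits at the bottom of the 28-50 production range from the spec table; drop to 12-20 for faster previews.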
Vision-Language Model Integration
The Mistral-3 24B VLM is not just a text encoder - it brings world knowledge and contextual reasoning to the generation process. This enables physically plausible lighting, accurate spatial relationships, and contextually appropriate material properties. The ~32K token context window supports detailed, structured prompts that were impossible with previous-generation CLIP-based encoders.
LoRA Fine-Tuning
FLUX.2 [dev] supports LoRA adapters for custom fine-tuning, enabling domain-specific specialization without retraining the full model. The community has produced 11+ fine-tunes and 12+ adapters on Hugging Face. GGUF quantized versions from Unsloth and City96 make fine-tuning and inference more accessible on consumer hardware.
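The economics of LoRA are visible in the parameter counts: an adapter trains two low-rank factors per weight matrix instead of the matrix itself. The layer width below is illustrative, not FLUX.2's actual hidden size:

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA pair: A (d_in x r) plus B (r x d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                     # one dense projection, fully fine-tuned
adapter = lora_params(4096, 4096, 16)  # rank-16 LoRA on the same projection
print(full // adapter)                 # 128 - the adapter is 128x smaller
```

That two-orders-of-magnitude reduction in trainable parameters, combined with quantized base weights, is what makes specializing a 32B model feasible on consumer hardware.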
Image Editing
Beyond generation, the model handles image-to-image editing: style transfer, inpainting, outpainting, object replacement, and lighting modification. Single-reference and multi-reference editing workflows use the same model weights.
Pricing and Availability
| Access Point | Pricing | Status |
|---|---|---|
| Open Weights (Hugging Face) | Free (non-commercial) | Available |
| BFL API | Per-megapixel pricing | Available |
| WaveSpeed AI | $0.012/image | Available |
| Replicate | Per-second pricing | Available |
| ComfyUI | Free (local) | Supported |
| Diffusers | Free (local) | Supported |
For commercial deployment, use the BFL API or third-party providers. The open weights are licensed for non-commercial use only - research, personal projects, and evaluation.
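Per-megapixel pricing scales with output resolution, so cost per image is easy to estimate. The rate below is a made-up placeholder, not BFL's actual price - check the provider's pricing page for real numbers:

```python
def image_cost_usd(width: int, height: int, usd_per_megapixel: float) -> float:
    """Cost of one generation under per-megapixel API pricing."""
    return width * height / 1e6 * usd_per_megapixel

# A full 2048x2048 output is ~4.19 MP; the $0.01/MP rate is hypothetical.
print(round(image_cost_usd(2048, 2048, 0.01), 4))  # 0.0419
```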
Strengths
- Highest-quality open-weight image generation model available (LM Arena #9)
- Multi-reference composition with up to 10 images - no fine-tuning needed
- 32K token context enables detailed, structured prompts
- LoRA fine-tuning for domain specialization
- Active quantization ecosystem (GGUF, FP8) brings VRAM to 14-18 GB
- Retrained VAE delivers improved quality at the compression boundary
- Full editing capabilities in the same model weights
Weaknesses
- 80+ GB VRAM at full precision - requires A100/H100 or heavy quantization for local use
- Non-commercial license restricts production use to API
- Text rendering (~60%) significantly trails Nano Banana 2 (~90%)
- 2-4 second generation - much slower than klein 4B's sub-second speed
- Complex multi-reference prompts can produce inconsistent results in edge cases, such as identity drift when many references compete
- No web-grounded generation (unlike Nano Banana 2's Gemini integration)
Related Coverage
- FLUX.2 [klein] 4B - The fast, fully open Apache 2.0 variant
- FLUX.2 [klein] 9B - Mid-tier distilled variant
- Best AI Image Generators 2026 - Full market comparison
- Nano Banana 2 - Google's competing image generation model
Last verified March 14, 2026