FLUX.2 [klein] 4B
Black Forest Labs' fastest open-source image generation model: 4B parameters, Apache 2.0 license, sub-second generation on datacenter GPUs and about 1.2 seconds on a consumer RTX 5090, running in roughly 13 GB of VRAM with CPU offloading.
![FLUX.2 [klein] 4B](https://awesomeagents.ai/images/models/flux-2-klein-4b_hu_fb2b6d759eb8633a.jpg)
FLUX.2 [klein] 4B is the smallest and fastest model in Black Forest Labs' second-generation image family. At 4 billion parameters with 4-step distilled inference, it generates images in under one second on an NVIDIA GB200 and roughly 1.2 seconds on a consumer RTX 5090. It fits in 8.4 GB of VRAM (distilled) or 9.2 GB (base), making it the first model in the FLUX.2 lineup that runs comfortably on mid-range consumer GPUs. And it ships under Apache 2.0 - fully open for commercial use.
TL;DR
- 4B parameter rectified flow transformer, distilled to 4 inference steps for sub-second generation
- Apache 2.0 license - fully open weights, commercial use allowed, fine-tuning supported
- 8.4 GB VRAM (distilled), runs on RTX 3090/4070 and above (~13 GB with CPU offloading)
- Unified architecture: text-to-image, image-to-image editing, and multi-reference in one model
- Claimed by Black Forest Labs to be 30%+ faster than any competing model at comparable quality
The model matters for two reasons. First, it brings the FLUX.2 architecture - which couples a rectified flow transformer with Mistral-3's 24B vision-language model for world knowledge - down to a size that individual developers and small teams can actually run locally. Second, the Apache 2.0 license removes every commercial restriction that held back FLUX.1's non-commercial variants. You can fine-tune it, deploy it in production, and sell the outputs without licensing friction.
Key Specifications
| Specification | Details |
|---|---|
| Provider | Black Forest Labs |
| Model Family | FLUX.2 |
| Parameters | 4 billion |
| Architecture | Rectified flow transformer + Mistral-3 24B VLM |
| Distillation | 4-step (distilled variant) |
| VRAM (Distilled) | 8.4 GB |
| VRAM (Base) | 9.2 GB |
| VRAM (with offloading) | ~13 GB on consumer GPUs |
| Inference Speed (GB200) | ~0.3 seconds |
| Inference Speed (RTX 5090) | ~1.2 seconds |
| Base Inference (GB200) | ~3 seconds |
| Base Inference (RTX 5090) | ~17 seconds |
| Default Resolution | 1024x1024 |
| Guidance Scale | 1.0 (default) |
| Inference Steps | 4 (distilled) |
| Release Date | January 15, 2026 |
| License | Apache 2.0 |
| Open Weights | Yes (Hugging Face) |
Benchmark Performance
| Metric | FLUX.2 klein 4B | FLUX.2 klein 9B | FLUX.2 Dev (32B) | Nano Banana 2 |
|---|---|---|---|---|
| Parameters | 4B | 9B | 32B | Not disclosed |
| Inference (GB200) | ~0.3s | ~0.5s | 2-4s | 4-6s |
| VRAM | 8.4 GB | 19.6 GB | 80+ GB (full) | Cloud only |
| License | Apache 2.0 | Non-commercial | Non-commercial | Proprietary |
| Fine-tuning | Yes | Limited | Yes (LoRA) | No |
| Multi-reference | Yes | Yes | Yes (up to 10) | Yes (up to 10) |
| Text Rendering | Basic | Improved | ~60% | ~90% |
FLUX.2 [klein] 4B trades quality for speed. Text rendering is noticeably weaker than the larger FLUX.2 variants and far behind Nano Banana 2's ~90% accuracy. Photorealism and fine detail also drop compared to the 9B and 32B siblings. But for latency-critical workflows - real-time previews, interactive editing, batch processing - the sub-second generation time is unmatched.
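For batch workloads, the latency gap compounds quickly. A back-of-envelope calculation using the table's figures (sequential single-stream generation, ignoring batching and warm-up):

```python
# Rough single-stream throughput from the latency figures quoted above.
latency_s = {
    "klein-4B (GB200)": 0.3,
    "klein-9B (GB200)": 0.5,
    "klein-4B (RTX 5090)": 1.2,
}

def images_per_hour(seconds_per_image: float) -> int:
    """Images generated in one hour at a fixed per-image latency."""
    return int(3600 / seconds_per_image)

for name, s in latency_s.items():
    print(f"{name}: {images_per_hour(s)} images/hour")
```

At 0.3 s per image, a single GB200 stream yields 12,000 images per hour, versus 3,000 per hour on an RTX 5090 - the difference between the two tiers is a factor of four, not an order of magnitude.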
Key Capabilities
Unified Architecture
Unlike previous FLUX releases that required separate models for generation and editing, FLUX.2 [klein] 4B handles text-to-image generation, image-to-image editing, and multi-reference composition in a single model. You load one set of weights and switch between modes through the prompt.
Consumer GPU Deployment
The 8.4 GB VRAM footprint (distilled) fits within the budget of an RTX 4070, RTX 3090, or any GPU with 13+ GB when using CPU offloading via `enable_model_cpu_offload()`. This makes it the most accessible high-quality open image model available - no cloud GPU rental required.
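A simple way to pick a loading strategy is to compare free VRAM against the footprints quoted in this article. The helper below is a hypothetical sketch - the thresholds and headroom margin are illustrative assumptions, not figures from Black Forest Labs:

```python
# Hypothetical helper: choose a loading strategy from free VRAM.
# 8.4 GB (distilled weights) and ~13 GB (end-to-end with CPU offloading)
# come from this article; the 3 GB activation headroom is an assumption.
DISTILLED_WEIGHTS_GB = 8.4
FULL_PIPELINE_GB = 13.0

def loading_strategy(free_vram_gb: float) -> str:
    """Map available VRAM (GB) to a deployment strategy."""
    if free_vram_gb >= FULL_PIPELINE_GB + 3:  # headroom for activations
        return "full_gpu"
    if free_vram_gb >= FULL_PIPELINE_GB:
        return "cpu_offload"          # pipe.enable_model_cpu_offload()
    if free_vram_gb >= DISTILLED_WEIGHTS_GB:
        return "sequential_offload"   # pipe.enable_sequential_cpu_offload()
    return "quantized_or_cloud"       # community quantizations or hosted API
```

On a 24 GB RTX 3090 this resolves to `full_gpu`; a 12 GB RTX 4070 lands in `sequential_offload` territory.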
Distilled and Base Variants
The distilled variant (4 steps, ~0.3s on GB200) prioritizes speed. The base variant (full steps, ~3s on GB200) offers higher quality at the cost of slower inference. Both variants share the same architecture, and the base model serves as a starting point for custom fine-tuning where quality matters more than latency.
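In practice the two variants differ mainly in sampler settings. The sketch below captures that as presets - the 4-step / guidance 1.0 values come from this article, while the base-variant step count (28) and guidance scale (3.5) are illustrative assumptions, not official values:

```python
# Illustrative sampler presets for the two variants. Distilled values
# (4 steps, guidance 1.0) are from this article; base values (28 steps,
# guidance 3.5) are assumptions for illustration only.
PRESETS = {
    "distilled": {"num_inference_steps": 4,  "guidance_scale": 1.0},
    "base":      {"num_inference_steps": 28, "guidance_scale": 3.5},
}

def sampler_kwargs(variant: str) -> dict:
    """Return pipeline call kwargs for the chosen variant."""
    return dict(PRESETS[variant])
```

The kwargs dict can then be splatted into the pipeline call, e.g. `pipe(prompt=..., **sampler_kwargs("distilled"))`, keeping the variant choice in one place.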
Safety Features
The model includes pre-training NSFW/CSAM filtering (partnered with IWF), post-training safety fine-tuning, pixel-layer watermarking, and C2PA cryptographic metadata signing on API outputs. Third-party adversarial testing was conducted before release.
Pricing and Availability
| Access Point | Pricing | Status |
|---|---|---|
| Open Weights (Hugging Face) | Free | Available |
| GitHub Source | Free | Available |
| BFL API | Per-megapixel pricing | Available |
| ComfyUI | Free (local) | Supported |
| Diffusers (HuggingFace) | Free (local) | Supported |
| 40+ HF Spaces | Free (limited) | Available |
The model is fully free to download and run locally. API access through Black Forest Labs charges per megapixel of output. Third-party providers like WaveSpeed AI offer the model at $0.012 per image.
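The hosted-versus-local tradeoff is easy to quantify for the third-party price mentioned above ($0.012 per image via WaveSpeed AI); local inference is free apart from hardware and power:

```python
# Cost of hosted generation at the WaveSpeed AI rate quoted above.
WAVESPEED_PER_IMAGE = 0.012  # USD per image

def hosted_cost(num_images: int) -> float:
    """Total hosted cost in USD, rounded to cents."""
    return round(num_images * WAVESPEED_PER_IMAGE, 2)
```

A 10,000-image batch comes to $120 hosted - a useful break-even figure against the amortized cost of a local GPU.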
Quick Start
```python
import torch
from diffusers import Flux2KleinPipeline

# Load the distilled 4B checkpoint in bfloat16.
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=torch.bfloat16,
)
# Offload idle submodules to CPU so the pipeline fits in ~13 GB of VRAM.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A cat holding a sign that says hello world",
    height=1024,
    width=1024,
    guidance_scale=1.0,        # distilled-variant default
    num_inference_steps=4,     # 4-step distilled inference
    generator=torch.Generator("cuda").manual_seed(0),  # reproducible output
).images[0]
image.save("flux2_klein_cat.png")
```
Strengths
- Sub-second generation on datacenter GPUs, ~1.2s on consumer hardware
- Apache 2.0 license - no commercial restrictions, full fine-tuning rights
- 8.4 GB VRAM makes it accessible to mid-range consumer GPUs
- Unified model for generation, editing, and multi-reference
- Active community: 534 likes, 12 adapters, 11 fine-tunes, 13 quantizations on Hugging Face
- Pixel-layer watermarking and C2PA support for content provenance
Weaknesses
- Text rendering quality significantly below larger FLUX.2 variants and competitors
- Photorealism and fine detail are visibly weaker than 9B and 32B siblings
- Distilled variant (4 steps) shows quality tradeoffs vs. full-step base
- Limited maximum resolution compared to FLUX.2 Max's 4MP output
- May amplify biases from training data
- No web-grounded generation (unlike Nano Banana 2's Gemini integration)
Related Coverage
- Best AI Image Generators 2026 - Full market comparison including FLUX.2
- Nano Banana 2 - Google's competing image generation model
Sources:
- FLUX.2 klein 4B Model Card - Hugging Face
- FLUX.2 klein - Black Forest Labs
- FLUX.2 klein: Towards Interactive Visual Intelligence - BFL Blog
- Black Forest Labs launches open source FLUX.2 klein - VentureBeat
- FLUX.2 Image Models Optimized for NVIDIA RTX GPUs - NVIDIA Blog
- FLUX.2 klein 4B: AI Image Generation in a Second - Medium
✓ Last verified March 14, 2026