FLUX.2 [klein] 4B

Black Forest Labs' fastest open-source image generation model: 4B parameters, Apache 2.0 license, sub-second generation on datacenter GPUs - and it runs locally on consumer GPUs with ~13 GB of VRAM.

FLUX.2 [klein] 4B is the smallest and fastest model in Black Forest Labs' second-generation image family. At 4 billion parameters with 4-step distilled inference, it generates images in under one second on an NVIDIA GB200 and roughly 1.2 seconds on a consumer RTX 5090. It fits in 8.4 GB of VRAM (distilled) or 9.2 GB (base), making it the first model in the FLUX.2 lineup that runs comfortably on mid-range consumer GPUs. And it ships under Apache 2.0 - fully open for commercial use.

TL;DR

  • 4B parameter rectified flow transformer, distilled to 4 inference steps for sub-second generation
  • Apache 2.0 license - fully open weights, commercial use allowed, fine-tuning supported
  • 8.4 GB VRAM (distilled), runs on RTX 3090/4070 and above (~13 GB with CPU offloading)
  • Unified architecture: text-to-image, image-to-image editing, and multi-reference in one model
  • 30%+ faster than competing models at comparable quality, per Black Forest Labs' own claims

The model matters for two reasons. First, it brings the FLUX.2 architecture - which couples a rectified flow transformer with Mistral-3's 24B vision-language model for world knowledge - down to a size that individual developers and small teams can actually run locally. Second, the Apache 2.0 license removes every commercial restriction that held back FLUX.1's non-commercial variants. You can fine-tune it, deploy it in production, and sell the outputs without licensing friction.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | Black Forest Labs |
| Model Family | FLUX.2 |
| Parameters | 4 billion |
| Architecture | Rectified flow transformer + Mistral-3 24B VLM |
| Distillation | 4-step (distilled variant) |
| VRAM (Distilled) | 8.4 GB |
| VRAM (Base) | 9.2 GB |
| VRAM (with offloading) | ~13 GB on consumer GPUs |
| Inference Speed (GB200) | ~0.3 seconds |
| Inference Speed (RTX 5090) | ~1.2 seconds |
| Base Inference (GB200) | ~3 seconds |
| Base Inference (RTX 5090) | ~17 seconds |
| Default Resolution | 1024x1024 |
| Guidance Scale | 1.0 (default) |
| Inference Steps | 4 (distilled) |
| Release Date | January 15, 2026 |
| License | Apache 2.0 |
| Open Weights | Yes (Hugging Face) |

Benchmark Performance

| Metric | FLUX.2 klein 4B | FLUX.2 klein 9B | FLUX.2 Dev (32B) | Nano Banana 2 |
| --- | --- | --- | --- | --- |
| Parameters | 4B | 9B | 32B | Not disclosed |
| Inference (GB200) | ~0.3s | ~0.5s | 2-4s | 4-6s |
| VRAM | 8.4 GB | 19.6 GB | 80+ GB (full) | Cloud only |
| License | Apache 2.0 | Non-commercial | Non-commercial | Proprietary |
| Fine-tuning | Yes | Limited | Yes (LoRA) | No |
| Multi-reference | Yes | Yes | Yes (up to 10) | Yes (up to 10) |
| Text Rendering | Basic | Improved | ~60% | ~90% |

FLUX.2 [klein] 4B trades quality for speed. Text rendering is noticeably weaker than the larger FLUX.2 variants and far behind Nano Banana 2's ~90% accuracy. Photorealism and fine detail also drop compared to the 9B and 32B siblings. But for latency-critical workflows - real-time previews, interactive editing, batch processing - the sub-second generation time is unmatched.
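A quick back-of-envelope shows what those latencies mean for batch workloads. The figures below are the distilled-variant numbers quoted above; real throughput also depends on model load time, I/O, and batching:

```python
def images_per_hour(seconds_per_image: float) -> int:
    """Sustained serial throughput, ignoring load time and I/O overhead."""
    return int(3600 / seconds_per_image)

# Distilled-variant latencies quoted above:
print(images_per_hour(0.3))  # GB200: 12000 images/hour
print(images_per_hour(1.2))  # RTX 5090: 3000 images/hour
```

Even a single consumer card clears a few thousand images per hour, which is what makes real-time previews and interactive editing loops practical.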

Key Capabilities

Unified Architecture

Unlike previous FLUX releases that required separate models for generation and editing, FLUX.2 [klein] 4B handles text-to-image generation, image-to-image editing, and multi-reference composition in a single model. You load one set of weights and switch between modes through the prompt.
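One way to picture the mode switch is as a single pipeline call whose keyword arguments select the task. The helper below is an illustrative sketch, not the pipeline's actual API: the `image` argument name follows the convention of other diffusers image-to-image pipelines and is an assumption here.

```python
def build_call_kwargs(prompt, image=None, references=None):
    """Assemble keyword arguments for one pipeline call.

    Mode selection (illustrative; argument names are assumptions):
    - prompt only         -> text-to-image
    - prompt + image      -> image-to-image editing
    - prompt + references -> multi-reference composition
    """
    kwargs = {"prompt": prompt, "num_inference_steps": 4, "guidance_scale": 1.0}
    if references is not None:
        kwargs["image"] = list(references)  # several conditioning images
    elif image is not None:
        kwargs["image"] = image  # a single image to edit
    return kwargs

# Hypothetical usage with a loaded pipeline:
# edited = pipe(**build_call_kwargs("make the sky stormy", image=photo)).images[0]
```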

Consumer GPU Deployment

The 8.4 GB VRAM footprint (distilled) fits within the budget of an RTX 4070, RTX 3090, or any GPU with 13+ GB when using CPU offloading via enable_model_cpu_offload(). This makes it the most accessible high-quality open image model available - no cloud GPU rental required.
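The loading decision can be sketched from the VRAM figures quoted above; the threshold logic is illustrative, and you should measure peak usage on your own hardware:

```python
def loading_plan(free_vram_gb: float, distilled: bool = True) -> str:
    """Pick a diffusers loading strategy from available VRAM.

    Footprints are the article's figures: 8.4 GB distilled, 9.2 GB base.
    """
    footprint = 8.4 if distilled else 9.2
    if free_vram_gb >= footprint:
        return "gpu"          # e.g. pipe.to("cuda")
    return "cpu_offload"      # e.g. pipe.enable_model_cpu_offload()
```

On a 24 GB card the whole model fits on-device; below the footprint, offloading trades some latency for a smaller VRAM peak.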

Distilled and Base Variants

The distilled variant (4 steps, ~0.3s on GB200) prioritizes speed. The base variant (full steps, ~3s on GB200) trades latency for higher quality. Both variants share the same architecture, and the base model is the natural starting point for custom fine-tuning where quality matters more than latency.
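The latency figures quoted above make the variant choice easy to automate; a small sketch (figures from this article, function illustrative):

```python
# Per-image latencies quoted in this article, in seconds.
LATENCY_S = {
    ("distilled", "GB200"): 0.3,
    ("distilled", "RTX 5090"): 1.2,
    ("base", "GB200"): 3.0,
    ("base", "RTX 5090"): 17.0,
}

def variants_within_budget(gpu: str, budget_s: float) -> list:
    """Variants that meet a per-image latency budget on a given GPU."""
    return [v for (v, g), t in LATENCY_S.items() if g == gpu and t <= budget_s]
```

For an interactive tool with a 2-second budget on an RTX 5090, only the distilled variant qualifies; on a GB200 both do.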

Safety Features

The model includes pre-training NSFW/CSAM filtering (developed in partnership with the Internet Watch Foundation), post-training safety fine-tuning, pixel-layer watermarking, and C2PA cryptographic metadata signing on API outputs. Third-party adversarial testing was conducted before release.

Pricing and Availability

| Access Point | Pricing | Status |
| --- | --- | --- |
| Open Weights (Hugging Face) | Free | Available |
| GitHub Source | Free | Available |
| BFL API | Per-megapixel pricing | Available |
| ComfyUI | Free (local) | Supported |
| Diffusers (HuggingFace) | Free (local) | Supported |
| 40+ HF Spaces | Free (limited) | Available |

The model is fully free to download and run locally. API access through Black Forest Labs charges per megapixel of output. Third-party providers like WaveSpeed AI offer the model at $0.012 per image.
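Per-megapixel billing makes the per-image cost a function of resolution. The rate below is a placeholder, not Black Forest Labs' actual price; check their pricing page for the real figure:

```python
def api_cost_usd(width: int, height: int, usd_per_megapixel: float) -> float:
    """Cost of one image under per-megapixel billing."""
    return width * height / 1_000_000 * usd_per_megapixel

# A 1024x1024 image is ~1.05 MP, so at a placeholder $0.01/MP:
cost = api_cost_usd(1024, 1024, usd_per_megapixel=0.01)
```

Because billing scales with pixel count, halving each dimension cuts the per-image cost by roughly 4x.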

Quick Start

```python
import torch
from diffusers import Flux2KleinPipeline

# Load the distilled checkpoint in bfloat16 to keep the memory footprint low.
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=torch.bfloat16,
)
# Offload idle submodules to CPU so the pipeline fits on consumer GPUs.
pipe.enable_model_cpu_offload()

# The distilled variant needs only 4 steps with guidance_scale=1.0.
image = pipe(
    prompt="A cat holding a sign that says hello world",
    height=1024, width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for reproducibility
).images[0]
image.save("klein_output.png")
```

Strengths

  • Sub-second generation on datacenter GPUs, ~1.2s on consumer hardware
  • Apache 2.0 license - no commercial restrictions, full fine-tuning rights
  • 8.4 GB VRAM makes it accessible to mid-range consumer GPUs
  • Unified model for generation, editing, and multi-reference
  • Active community: 534 likes, 12 adapters, 11 fine-tunes, 13 quantizations on Hugging Face
  • Pixel-layer watermarking and C2PA support for content provenance

Weaknesses

  • Text rendering quality significantly below larger FLUX.2 variants and competitors
  • Photorealism and fine detail are visibly weaker than 9B and 32B siblings
  • Distilled variant (4 steps) shows quality tradeoffs vs. full-step base
  • Limited maximum resolution compared to FLUX.2 Max's 4MP output
  • May amplify biases from training data
  • No web-grounded generation (unlike Nano Banana 2's Gemini integration)

✓ Last verified March 14, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.