FLUX.2 [klein] 4B

Black Forest Labs' fastest open-source image generation model: 4B parameters, Apache 2.0 license, sub-second generation on datacenter GPUs - and it runs locally on consumer GPUs with ~13 GB of VRAM.

FLUX.2 [klein] 4B is the smallest and fastest model in Black Forest Labs' second-generation image family. At 4 billion parameters with 4-step distilled inference, it generates images in under one second on an NVIDIA GB200 and roughly 1.2 seconds on a consumer RTX 5090. It fits in 8.4 GB of VRAM (distilled) or 9.2 GB (base), making it the first model in the FLUX.2 lineup that runs comfortably on mid-range consumer GPUs. And it ships under Apache 2.0 - fully open for commercial use.

TL;DR

  • 4B parameter rectified flow transformer, distilled to 4 inference steps for sub-second generation
  • Apache 2.0 license - fully open weights, commercial use allowed, fine-tuning supported
  • 8.4 GB VRAM (distilled), runs on RTX 3090/4070 and above (~13 GB with CPU offloading)
  • Unified architecture: text-to-image, image-to-image editing, and multi-reference in one model
  • 30%+ faster than competing models at comparable quality, per Black Forest Labs' own claims

The model matters for two reasons. First, it brings the FLUX.2 architecture - which couples a rectified flow transformer with Mistral-3's 24B vision-language model for world knowledge - down to a size that individual developers and small teams can actually run locally. Second, the Apache 2.0 license removes every commercial restriction that held back FLUX.1's non-commercial variants. You can fine-tune it, deploy it in production, and sell the outputs without licensing friction.

Key Specifications

| Specification | Details |
| --- | --- |
| Provider | Black Forest Labs |
| Model Family | FLUX.2 |
| Parameters | 4 billion |
| Architecture | Rectified flow transformer + Mistral-3 24B VLM |
| Distillation | 4-step (distilled variant) |
| VRAM (Distilled) | 8.4 GB |
| VRAM (Base) | 9.2 GB |
| VRAM (with offloading) | ~13 GB on consumer GPUs |
| Inference Speed (GB200) | ~0.3 seconds |
| Inference Speed (RTX 5090) | ~1.2 seconds |
| Base Inference (GB200) | ~3 seconds |
| Base Inference (RTX 5090) | ~17 seconds |
| Default Resolution | 1024x1024 |
| Guidance Scale | 1.0 (default) |
| Inference Steps | 4 (distilled) |
| Release Date | January 15, 2026 |
| License | Apache 2.0 |
| Open Weights | Yes (Hugging Face) |

Benchmark Performance

| Metric | FLUX.2 klein 4B | FLUX.2 klein 9B | FLUX.2 Dev (32B) | Nano Banana 2 |
| --- | --- | --- | --- | --- |
| Parameters | 4B | 9B | 32B | Not disclosed |
| Inference (GB200) | ~0.3s | ~0.5s | 2-4s | 4-6s |
| VRAM | 8.4 GB | 19.6 GB | 80+ GB (full) | Cloud only |
| License | Apache 2.0 | Non-commercial | Non-commercial | Proprietary |
| Fine-tuning | Yes | Limited | Yes (LoRA) | No |
| Multi-reference | Yes | Yes | Yes (up to 10) | Yes (up to 10) |
| Text Rendering | Basic | Improved | ~60% | ~90% |

FLUX.2 [klein] 4B trades quality for speed. Text rendering is noticeably weaker than the larger FLUX.2 variants and far behind Nano Banana 2's ~90% accuracy. Photorealism and fine detail also drop compared to the 9B and 32B siblings. But for latency-critical workflows - real-time previews, interactive editing, batch processing - the sub-second generation time is unmatched.
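A quick back-of-envelope shows what those latencies mean for batch workloads. The figures below are the distilled-variant numbers quoted above; real throughput also depends on model load time, I/O, and batching:

```python
def images_per_hour(seconds_per_image: float) -> int:
    """Sustained serial throughput, ignoring load time and I/O overhead."""
    return int(3600 / seconds_per_image)

# Distilled-variant latencies quoted above:
print(images_per_hour(0.3))  # GB200: 12000 images/hour
print(images_per_hour(1.2))  # RTX 5090: 3000 images/hour
```

Even a single consumer card clears a few thousand images per hour, which is what makes real-time previews and interactive editing loops practical.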

Key Capabilities

Unified Architecture

Unlike previous FLUX releases that required separate models for generation and editing, FLUX.2 [klein] 4B handles text-to-image generation, image-to-image editing, and multi-reference composition in a single model. You load one set of weights and switch between modes through the prompt.
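One way to picture the mode switch is as a single pipeline call whose keyword arguments select the task. The helper below is an illustrative sketch, not the pipeline's actual API: the `image` argument name follows the convention of other diffusers image-to-image pipelines and is an assumption here.

```python
def build_call_kwargs(prompt, image=None, references=None):
    """Assemble keyword arguments for one pipeline call.

    Mode selection (illustrative; argument names are assumptions):
    - prompt only         -> text-to-image
    - prompt + image      -> image-to-image editing
    - prompt + references -> multi-reference composition
    """
    kwargs = {"prompt": prompt, "num_inference_steps": 4, "guidance_scale": 1.0}
    if references is not None:
        kwargs["image"] = list(references)  # several conditioning images
    elif image is not None:
        kwargs["image"] = image  # a single image to edit
    return kwargs

# Hypothetical usage with a loaded pipeline:
# edited = pipe(**build_call_kwargs("make the sky stormy", image=photo)).images[0]
```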

Consumer GPU Deployment

The 8.4 GB VRAM footprint (distilled) fits within the budget of an RTX 4070, RTX 3090, or any GPU with 13+ GB when using CPU offloading via enable_model_cpu_offload(). This makes it the most accessible high-quality open image model available - no cloud GPU rental required.
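The loading decision can be sketched from the VRAM figures quoted above; the threshold logic is illustrative, and you should measure peak usage on your own hardware:

```python
def loading_plan(free_vram_gb: float, distilled: bool = True) -> str:
    """Pick a diffusers loading strategy from available VRAM.

    Footprints are the article's figures: 8.4 GB distilled, 9.2 GB base.
    """
    footprint = 8.4 if distilled else 9.2
    if free_vram_gb >= footprint:
        return "gpu"          # e.g. pipe.to("cuda")
    return "cpu_offload"      # e.g. pipe.enable_model_cpu_offload()
```

On a 24 GB card the whole model fits on-device; below the footprint, offloading trades some latency for a smaller VRAM peak.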

Distilled and Base Variants

The distilled variant (4 steps, ~0.3s on GB200) prioritizes speed. The base variant (full steps, ~3s on GB200) trades latency for higher quality. Both variants share the same architecture, and the base model is the natural starting point for custom fine-tuning where quality matters more than latency.
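The latency figures quoted above make the variant choice easy to automate; a small sketch (figures from this article, function illustrative):

```python
# Per-image latencies quoted in this article, in seconds.
LATENCY_S = {
    ("distilled", "GB200"): 0.3,
    ("distilled", "RTX 5090"): 1.2,
    ("base", "GB200"): 3.0,
    ("base", "RTX 5090"): 17.0,
}

def variants_within_budget(gpu: str, budget_s: float) -> list:
    """Variants that meet a per-image latency budget on a given GPU."""
    return [v for (v, g), t in LATENCY_S.items() if g == gpu and t <= budget_s]
```

For an interactive tool with a 2-second budget on an RTX 5090, only the distilled variant qualifies; on a GB200 both do.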

Safety Features

The model includes pre-training NSFW/CSAM filtering (developed in partnership with the Internet Watch Foundation), post-training safety fine-tuning, pixel-layer watermarking, and C2PA cryptographic metadata signing on API outputs. Third-party adversarial testing was conducted before release.

Pricing and Availability

| Access Point | Pricing | Status |
| --- | --- | --- |
| Open Weights (Hugging Face) | Free | Available |
| GitHub Source | Free | Available |
| BFL API | Per-megapixel pricing | Available |
| ComfyUI | Free (local) | Supported |
| Diffusers (HuggingFace) | Free (local) | Supported |
| 40+ HF Spaces | Free (limited) | Available |

The model is fully free to download and run locally. API access through Black Forest Labs charges per megapixel of output. Third-party providers like WaveSpeed AI offer the model at $0.012 per image.
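Per-megapixel billing makes the per-image cost a function of resolution. The rate below is a placeholder, not Black Forest Labs' actual price; check their pricing page for the real figure:

```python
def api_cost_usd(width: int, height: int, usd_per_megapixel: float) -> float:
    """Cost of one image under per-megapixel billing."""
    return width * height / 1_000_000 * usd_per_megapixel

# A 1024x1024 image is ~1.05 MP, so at a placeholder $0.01/MP:
cost = api_cost_usd(1024, 1024, usd_per_megapixel=0.01)
```

Because billing scales with pixel count, halving each dimension cuts the per-image cost by roughly 4x.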

Quick Start

```python
import torch
from diffusers import Flux2KleinPipeline

# Load the distilled checkpoint in bfloat16 to keep the memory footprint low.
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    torch_dtype=torch.bfloat16,
)
# Offload idle submodules to CPU so the pipeline fits on consumer GPUs.
pipe.enable_model_cpu_offload()

# The distilled variant needs only 4 steps with guidance_scale=1.0.
image = pipe(
    prompt="A cat holding a sign that says hello world",
    height=1024, width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator("cuda").manual_seed(0),  # fixed seed for reproducibility
).images[0]
image.save("klein_output.png")
```

Strengths

  • Sub-second generation on datacenter GPUs, ~1.2s on consumer hardware
  • Apache 2.0 license - no commercial restrictions, full fine-tuning rights
  • 8.4 GB VRAM makes it accessible to mid-range consumer GPUs
  • Unified model for generation, editing, and multi-reference
  • Active community: 534 likes, 12 adapters, 11 fine-tunes, 13 quantizations on Hugging Face
  • Pixel-layer watermarking and C2PA support for content provenance

Weaknesses

  • Text rendering quality significantly below larger FLUX.2 variants and competitors
  • Photorealism and fine detail are visibly weaker than 9B and 32B siblings
  • Distilled variant (4 steps) shows quality tradeoffs vs. full-step base
  • Limited maximum resolution compared to FLUX.2 Max's 4MP output
  • May amplify biases from training data
  • No web-grounded generation (unlike Nano Banana 2's Gemini integration)

✓ Last verified March 14, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.