FLUX.2 [klein] 9B

Black Forest Labs' 9B parameter distilled image model - sub-second generation with higher quality than the 4B variant, 19.6 GB VRAM, non-commercial license.

FLUX.2 [klein] 9B sits between the fully open 4B variant and the much larger 32B FLUX.2 Dev. It generates images in roughly 0.5 seconds on an NVIDIA GB200 and about 2 seconds on a consumer RTX 5090 - slower than its 4B sibling but noticeably sharper on fine details and text. The tradeoff: it requires 19.6 GB of VRAM (distilled) and ships under a non-commercial license.

TL;DR

  • 9B parameter rectified flow transformer, distilled to 4 steps for sub-second datacenter inference
  • ~0.5s on GB200, ~2s on RTX 5090 - roughly 1.7x the 4B's latency but visibly better quality
  • 19.6 GB VRAM (distilled), 21.7 GB (base) - needs an RTX 4090 or better for local use
  • Non-commercial license (FLUX Non-Commercial License) - no production deployment without a deal
  • Same unified architecture: text-to-image, editing, and multi-reference in one model

Key Specifications

| Specification | Details |
|---|---|
| Provider | Black Forest Labs |
| Model Family | FLUX.2 |
| Parameters | 9 billion |
| Architecture | Rectified flow transformer + Mistral-3 24B VLM |
| Distillation | 4-step (distilled variant) |
| VRAM (Distilled) | 19.6 GB |
| VRAM (Base) | 21.7 GB |
| Inference Speed (GB200) | ~0.5 seconds |
| Inference Speed (RTX 5090) | ~2 seconds |
| Base Inference (GB200) | ~6 seconds |
| Base Inference (RTX 5090) | ~35 seconds |
| Default Resolution | 1024x1024 |
| Release Date | January 15, 2026 |
| License | FLUX Non-Commercial License |
| Open Weights | Yes (Hugging Face, non-commercial) |
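Those latency figures translate directly into sustained single-stream throughput. A minimal sketch in plain Python, using only the numbers from the table above:

```python
def images_per_minute(seconds_per_image: float) -> float:
    """Convert per-image latency into single-stream throughput."""
    return 60.0 / seconds_per_image

# Latency figures from the specification table above.
latencies = {
    ("GB200", "distilled"): 0.5,
    ("GB200", "base"): 6.0,
    ("RTX 5090", "distilled"): 2.0,
    ("RTX 5090", "base"): 35.0,
}

for (gpu, variant), secs in latencies.items():
    print(f"{gpu} ({variant}): ~{images_per_minute(secs):.0f} images/min")
```

On these numbers the distilled variant sustains about 120 images per minute on a GB200 versus roughly 10 for the base model, which is why the base variant is positioned for training pipelines rather than serving.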

Benchmark Performance

| Metric | FLUX.2 klein 4B | FLUX.2 klein 9B | FLUX.2 Dev (32B) | Nano Banana 2 |
|---|---|---|---|---|
| Parameters | 4B | 9B | 32B | Not disclosed |
| Inference (GB200) | ~0.3s | ~0.5s | 2-4s | 4-6s |
| VRAM (Distilled) | 8.4 GB | 19.6 GB | 80+ GB | Cloud only |
| License | Apache 2.0 | Non-commercial | Non-commercial | Proprietary |
| Fine-tuning | Yes | Limited | Yes (LoRA) | No |
| Text Rendering | Basic | Improved | ~60% | ~90% |
| Quality Tier | Good | Very Good | Excellent | Excellent |

The 9B model occupies a specific niche: researchers and hobbyists who want better quality than the 4B but cannot run the full 32B Dev model locally. The non-commercial license limits its appeal for startups and production use cases, where the Apache-licensed 4B or the API-only Pro/Max variants are more practical choices.
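That niche can be expressed as a small selection helper. The sketch below uses only the VRAM thresholds and license terms from the comparison table; the function name and return strings are this article's shorthand, not anything from BFL:

```python
def pick_flux2_variant(vram_gb: float, commercial: bool) -> str:
    """Suggest a FLUX.2 variant from available VRAM and license needs.

    Thresholds come from the comparison table: 8.4 GB (klein 4B
    distilled), 19.6 GB (klein 9B distilled), 80+ GB (Dev 32B).
    """
    if commercial:
        # Only the Apache-2.0 4B is open for production use; otherwise
        # the API-only Pro/Max variants are the practical route.
        return "klein 4B" if vram_gb >= 8.4 else "BFL API (Pro/Max)"
    if vram_gb >= 80:
        return "Dev 32B"
    if vram_gb >= 19.6:
        return "klein 9B"
    if vram_gb >= 8.4:
        return "klein 4B"
    return "BFL API (Pro/Max)"
```

For example, a 24 GB RTX 4090 used non-commercially lands on the 9B, while the same card in a commercial product falls back to the Apache-licensed 4B.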

Key Capabilities

Quality Step-Up from 4B

The additional 5 billion parameters deliver visible improvements in fine detail rendering, texture consistency, and text legibility. Character faces show fewer artifacts, fabric textures resolve more cleanly, and simple text prompts succeed more reliably. These gains are most apparent in portrait-style images and scenes with complex material interactions.

Unified Architecture

Like all FLUX.2 [klein] variants, the 9B model handles text-to-image generation, image-to-image editing, and multi-reference composition in a single set of weights. No model switching required between tasks.

Base Model for Fine-Tuning

The undistilled base variant (21.7 GB VRAM, ~6s on GB200) serves as a higher-quality starting point for custom fine-tuning. The slower inference is acceptable in training pipelines where output quality matters more than latency. Note that fine-tuning rights are subject to the non-commercial license terms.

Pricing and Availability

| Access Point | Pricing | Status |
|---|---|---|
| Open Weights (Hugging Face) | Free (non-commercial) | Available |
| BFL API | Per-megapixel pricing | Available |
| ComfyUI | Free (local, non-commercial) | Supported |
| Diffusers (HuggingFace) | Free (local, non-commercial) | Supported |

Strengths

  • Sub-second datacenter inference with noticeably better quality than the 4B variant
  • Visible improvements in face rendering, textures, and text compared to the 4B
  • Unified generation/editing/multi-reference architecture
  • Open weights available for download and local inference
  • 19.6 GB VRAM is reachable on high-end consumer GPUs (RTX 4090)
  • Safety features: watermarking, C2PA signing, NSFW filtering

Weaknesses

  • Non-commercial license blocks production deployment without separate agreement
  • 19.6 GB VRAM excludes most mid-range consumer GPUs
  • Text rendering still significantly below FLUX.2 Dev and Nano Banana 2
  • Quality gap vs. the 32B Dev model is substantial - not a substitute for production work
  • Limited community fine-tuning ecosystem due to license restrictions
  • Roughly 1.7x the latency of the 4B variant, a drawback for real-time/interactive use cases

Last verified March 14, 2026

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.