Black Forest Labs has released FLUX.2 [klein], a model small enough to fine-tune on a single consumer GPU. According to the company, a LoRA run on the 4B model fits in 24 GB of VRAM, takes about an hour on an RTX 4090, and costs roughly $0.50 if you rent the GPU. This guide walks you through the entire process, from building a dataset to deploying the result as a Hugging Face Space. By the end, you will have a .safetensors LoRA that teaches klein a specific style, character, look, or edit behavior, and you will know the few details that decide whether the result is usable or mush. Everything here uses open weights: FLUX.2-klein-base-4B is Apache 2.0, so you can ship what you train.

This guide is part of the Build Small Hackathon, hosted by Gradio and Hugging Face, with Black Forest Labs among the sponsors. The build window is June 5–15, 2026.

Two rules shape what you make: the model you use must be 32B parameters or fewer, and your project ships as a Gradio app hosted on a Hugging Face Space. FLUX.2 [klein] fits the brief directly. The 4B model is well under the 32B cap, it's Apache 2.0 so you can ship whatever you build on it, and it runs on the Space's own GPU. A LoRA is how you make it yours: a specific style or edit that fits your track, whether that's solving a real problem for someone you know (the Backyard AI track) or building something deliberately strange (An Adventure in Thousand Token Wood). The rest of this guide trains that LoRA. The last section shows how to wrap it in the Gradio app you'll submit.

Why klein for fine-tuning

FLUX.2 [klein] ships in a 4B and a 9B size, each with a distilled (4-step) and a base (50-step) variant. For LoRA training, the relevant variant is base. Taking the 4B model as an example:

It fits. The weights take about 13 GB in bf16 and a LoRA run lands under 24 GB, so a 4090 or an L4 is enough.

It's the training target. Distilled models are step-compressed for fast inference. You train against the base checkpoint, and the adapter still loads on the distilled model afterward. Running it there is faster and, in our testing, usually gives even better results.

If you only want to run a LoRA, you do not need to train one: you can find community klein LoRAs on the Hub already. Train when you need a specific look the existing ones don't cover.

What you'll need

- 15–40 images that share one look (your art, licensed photos, or public-domain works from Wikimedia Commons).
- A GPU for ~60 minutes. An RTX 4090 (24 GB) is the sweet spot.
- A trainer. This guide uses ostris/ai-toolkit, a popular community trainer with a no-code web UI. It's one of several; any klein-compatible trainer works.

Pick your path

ai-toolkit has a web UI, so you don't have to edit YAML by hand unless you want to. There are two ways to run it:

| Path | Best for | Setup |
| --- | --- | --- |
| RunPod template | most people, ~$0.50/run | one-click deploy, UI auto-launches |
| Local UI | you have a 24 GB+ NVIDIA GPU | git clone + npm run build_and_start, open localhost:8675 |

The dataset and caption rules below are identical across both. Ostris has a 2-minute walkthrough video if you want to see the UI first.
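Before renting a GPU, it can help to sanity-check the "What you'll need" items locally. The following is a minimal preflight sketch, not part of ai-toolkit. It assumes each image's caption lives in a same-named .txt file beside it (a common trainer convention; confirm against your trainer's documentation). The function names and the list of image extensions are illustrative, and the hourly rate used in the example is only what the guide's figures imply (roughly $0.50 for roughly one hour).

```python
from pathlib import Path

# Assumed set of image extensions; adjust to what your trainer accepts.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
# Dataset size range suggested in this guide.
MIN_IMAGES, MAX_IMAGES = 15, 40


def check_dataset(folder):
    """Return (image_count, images_missing_captions, size_ok) for a folder."""
    root = Path(folder)
    images = sorted(p for p in root.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    # Assumption: captions are same-named .txt sidecar files next to each image.
    missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
    size_ok = MIN_IMAGES <= len(images) <= MAX_IMAGES
    return len(images), missing, size_ok


def rental_cost(minutes, usd_per_hour):
    """Estimate the cost of renting a GPU for `minutes` at `usd_per_hour`."""
    return round(minutes / 60 * usd_per_hour, 2)
```

Calling `check_dataset("my_dataset")` on a hypothetical folder reports how many images were found, which ones lack a caption file, and whether the count falls in the 15–40 range. `rental_cost(60, 0.50)` reproduces the guide's rough per-run figure; substitute your provider's actual rate.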

Source: huggingface