Why Run Locally in 2026?
Stable Diffusion and FLUX are the leading open-source image generation ecosystems. Running them on your own hardware delivers four advantages no subscription tool matches:
- Unlimited generation — once the hardware is paid for, each additional image costs nothing
- Privacy — nothing leaves your machine
- No content filters — generate anything legal without service-level censorship
- Customization — every parameter, model, and workflow is yours
The trade-off: you need a capable GPU and some technical setup. Worth it for serious users.
Hardware Requirements (Updated for 2026 Models)
- Minimum — NVIDIA GPU with 8GB VRAM (e.g., RTX 3060). Apple Silicon (M2/M3/M4) works via MPS but is slower. Runs SDXL and SD 3.5 Medium.
- Recommended — RTX 4070 Ti / 4080 with 12-16GB VRAM. Runs FLUX.1 dev quantized, SD 3.5 Large, fast iteration.
- Best — RTX 4090 / 5080 / 5090 (24-32GB). Runs FLUX.2 Pro at full quality, batch generation, training your own LoRAs.
- RAM: 32GB recommended (16GB minimum)
- Storage: 200GB+ free for models + outputs. Each FLUX checkpoint is 12-23GB.
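Not sure which tier you fall into? A quick way to check is to query the GPU from Python; a minimal sketch, assuming PyTorch is installed:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU available via MPS")
else:
    print("No GPU detected; CPU generation will be very slow")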
Models to Download (May 2026)
FLUX.2 (Black Forest Labs)
The current state-of-the-art for photorealistic image generation. Lineup:
- FLUX.2 [pro] — Best quality, paid API only
- FLUX.2 [dev] — Open weights, research/personal use only
- FLUX.2 [klein] — Lighter version, runs on smaller GPUs
- FLUX.2 [flex] — Optimized for editing tasks
- FLUX.2 [max] — Highest fidelity, requires 24GB+ VRAM
Features: native 4MP resolution, up to 10 reference images for character/style consistency, top-tier typography, conversational FLUX Kontext for image editing.
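For a sense of how these run outside a UI, here is FLUX.1 [dev] through Hugging Face's diffusers library. FLUX.2 checkpoints should load along the same lines, but treat the FLUX.2 repo ID and pipeline class as assumptions to verify against the Black Forest Labs model cards:

import torch
from diffusers import FluxPipeline

# FLUX.1-dev shown; substitute the FLUX.2 repo ID once you have accepted its license
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # offloads idle submodules so 12-16GB cards can cope

image = pipe(
    "a lighthouse at dusk, long-exposure photograph",
    guidance_scale=3.5,       # FLUX models prefer low CFG
    num_inference_steps=28,
    height=1024, width=1024,
).images[0]
image.save("flux_test.png")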
Stable Diffusion 3.5
Stability AI's flagship open model. Replaces SDXL as the recommended SD checkpoint.
- SD 3.5 Large — 8B parameters, best quality
- SD 3.5 Medium — Balanced for prosumer GPUs
- SD 3.5 Turbo — Fast generation
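The SD 3.5 family loads the same way via diffusers' SD3 pipeline. A sketch: the repo is gated, so accept the license on Hugging Face first, and swap in the Medium repo ID for smaller GPUs:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    "watercolor fox in a snowy forest",
    guidance_scale=7.0,       # SD-family models want higher CFG than FLUX
    num_inference_steps=28,
).images[0]
image.save("sd35_test.png")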
Specialized Models
- HiDream-I1 — 17B parameter model, often beats SDXL/DALL-E 3/FLUX.1 on benchmarks
- JuggernautXL / RealVisXL — SDXL fine-tunes specialized for photorealism
- Pony Diffusion XL — Anime/illustration
Where to Get Models
- Civitai.com — Largest community for SD models, LoRAs, embeddings. Browse by style, content type, base model.
- Hugging Face — Original model hosting. Best for foundation models and research releases.
- Black Forest Labs (HF) — Official FLUX releases
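Hugging Face downloads can be scripted, which beats clicking through a browser for 20GB files. A minimal sketch using the huggingface_hub package; the filename here is illustrative, so check the repo's file list for exact names, and note that gated models require huggingface-cli login first:

from huggingface_hub import hf_hub_download

# Pull a single checkpoint file straight into ComfyUI's model directory
path = hf_hub_download(
    repo_id="stabilityai/stable-diffusion-3.5-large",
    filename="sd3.5_large.safetensors",   # illustrative; check the model card
    local_dir="ComfyUI/models/checkpoints",
)
print("Saved to", path)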
Installation Options
1. ComfyUI (Recommended for Power Users)
Node-based interface. The most flexible and fastest option, with the best support for new models, including FLUX.2. The de facto standard for serious users in 2026.
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt   # ideally inside a fresh venv; installs PyTorch and friends
python main.py                    # starts the server on http://127.0.0.1:8188
Place models in models/checkpoints/, LoRAs in models/loras/. Browse to http://127.0.0.1:8188.
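ComfyUI also exposes an HTTP API on the same port, which is handy for batch jobs: export any workflow via "Save (API Format)" and POST it back. A minimal sketch, assuming the requests package and a previously exported workflow_api.json:

import json
import requests

# A workflow exported from the ComfyUI menu with "Save (API Format)"
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Queue it for execution; outputs land in ComfyUI's output/ folder
resp = requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
print(resp.json())  # includes a prompt_id you can use to track the job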
2. Forge / AUTOMATIC1111
Web UI with extensive features and a large extension ecosystem. Forge is a faster fork of A1111. Best for users coming from older Stable Diffusion workflows.
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge
./webui.sh   # or webui-user.bat on Windows; first launch downloads dependencies
3. InvokeAI
Polished commercial-friendly UI with a non-destructive editing canvas. Excellent for production work and teams.
pip install InvokeAI   # best done in a dedicated virtual environment
invokeai-configure     # one-time setup: pick a root directory, download starter models
invokeai               # launch the UI
4. Pinokio (Easy Install)
One-click installer for ComfyUI, Forge, and dozens of other AI tools. Great for beginners — installs dependencies automatically.
Key Concepts
Checkpoints
The base model files (8-23GB). Different checkpoints are trained for different styles — photorealism, anime, art, specific aesthetics.
LoRAs
Small (~150-500MB) style/character/concept add-ons. Stack multiple LoRAs to combine effects: photorealistic checkpoint + watercolor LoRA + character LoRA.
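In diffusers, stacking looks like the sketch below (the LoRA filenames are placeholders); ComfyUI and Forge expose the same idea through LoRA loader nodes and <lora:name:weight> prompt tags:

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two LoRAs under named adapters (placeholder filenames)
pipe.load_lora_weights("loras/watercolor_style.safetensors", adapter_name="watercolor")
pipe.load_lora_weights("loras/my_character.safetensors", adapter_name="character")

# Blend them; per-adapter weights control how strongly each applies
pipe.set_adapters(["watercolor", "character"], adapter_weights=[0.8, 1.0])

image = pipe("portrait of the character, watercolor style").images[0]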
Sampling Methods
Algorithms that drive the denoising process. Euler a for speed, DPM++ 2M Karras for quality, UniPC for FLUX. 25-30 steps are usually optimal.
CFG Scale
How strictly the model follows the prompt (scale 1-30). 6-9 is the sweet spot for most models. FLUX often prefers 3-5.
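Both knobs map directly onto generation parameters. A sketch with diffusers and an SDXL checkpoint, swapping in DPM++ 2M Karras (in diffusers terms, the multistep DPM solver with Karras sigmas):

import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# DPM++ 2M Karras: multistep DPM solver with the Karras sigma schedule
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    "studio photo of a ceramic teapot",
    num_inference_steps=28,   # 25-30 steps is the usual sweet spot
    guidance_scale=7.5,       # 6-9 for SD-family models; 3-5 for FLUX
).images[0]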
Advanced Workflow
- img2img — Use an existing image as the starting point. Great for refinements and style transfer (see the sketch after this list).
- Inpainting — Mask and regenerate specific areas. Perfect for fixing hands, removing objects, swapping faces.
- ControlNet — Constrain generation with pose, depth, edges, or composition from a reference. Game-changer for consistency.
- FLUX Kontext — Conversational image editing — describe what to change, get the result without losing composition.
- Upscaling — ESRGAN, SUPIR, or model-specific upscalers push outputs to 8K+.
- LoRA training — With ~20-50 reference images and a few hours of GPU time, train SD/FLUX on specific characters or styles.
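Here is the img2img sketch promised above. The key parameter is strength (denoise), which sets how far the model may drift from the input. This uses diffusers' SDXL img2img pipeline with a placeholder input image:

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init = load_image("sketch.png").resize((1024, 1024))  # placeholder input

image = pipe(
    "detailed oil painting, dramatic lighting",
    image=init,
    strength=0.5,   # 0.4-0.6 restyles while preserving composition
).images[0]
image.save("restyled.png")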
FLUX vs Stable Diffusion 3.5: Which to Choose?
FLUX.2 wins for: photorealism, prompt adherence, typography, modern aesthetic. Use as primary for production.
SD 3.5 wins for: ecosystem (Civitai LoRAs), training custom models, cost-sensitive workflows on smaller GPUs.
Reality: most production rigs run both — FLUX for hero images and realism, SD for stylized work and rapid iteration.
Tips for Effective Local Generation
- Use --xformers or SDP attention (--opt-sdp-attention) for a ~30% speed boost on NVIDIA
- Generate at native resolution, then upscale — beats trying to generate at very high resolution directly
- Build a "lookbook" folder of your best prompts — copy variants, don't recreate
- For FLUX, low CFG (3-5) and 25-30 steps are usually optimal
- For SD, denoise 0.4-0.6 in img2img preserves composition while changing style