
Stable Diffusion: Local Installation and Usage

Run Stable Diffusion 3.5 and FLUX.2 locally for unlimited free image generation. Hardware requirements, ComfyUI vs Forge vs Invoke, and how to set up the most powerful models.

April 5, 2026 · 4 min read

Why Run Locally in 2026?

Stable Diffusion and FLUX are the leading open-source image generation ecosystems. Running them on your own hardware delivers four advantages no subscription tool matches:

  • Unlimited generation — once hardware is paid off, infinite output
  • Privacy — nothing leaves your machine
  • No content filters — generate anything legal without service-level censorship
  • Customization — every parameter, model, and workflow is yours

The trade-off: capable GPU + technical setup. Worth it for serious users.

Hardware Requirements (Updated for 2026 Models)

  • Minimum — NVIDIA GPU with 8GB VRAM (e.g., RTX 3060). Apple Silicon (M2/M3/M4) works via MPS but is slower. Runs SDXL and SD 3.5 Medium.
  • Recommended — RTX 4070 Ti / 4080 with 12-16GB VRAM. Runs FLUX.1 dev quantized, SD 3.5 Large, fast iteration.
  • Best — RTX 4090 / 5080 / 5090 (24-32GB). Runs FLUX.2 Pro at full quality, batch generation, training your own LoRAs.
  • RAM: 32GB recommended (16GB minimum)
  • Storage: 200GB+ free for models + outputs. Each FLUX checkpoint is 12-23GB.
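Before committing to multi-gigabyte checkpoints, it helps to confirm what your GPU actually reports. One quick check with NVIDIA's `nvidia-smi` tool, which ships with the driver (the guard just keeps the command from erroring on non-NVIDIA machines):

```shell
# Print GPU model and total VRAM (NVIDIA only; nvidia-smi ships with the driver)
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "nvidia-smi not found -- no NVIDIA driver detected"
fi
```

If the reported memory is under roughly 8192 MiB, plan on the SDXL / SD 3.5 Medium tier rather than the full-size FLUX checkpoints.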

Models to Download (May 2026)

FLUX.2 (Black Forest Labs)

The current state-of-the-art for photorealistic image generation. Lineup:

  • FLUX.2 [pro] — Best quality, paid API only
  • FLUX.2 [dev] — Open weights, research/personal use only
  • FLUX.2 [klein] — Lighter version, runs on smaller GPUs
  • FLUX.2 [flex] — Optimized for editing tasks
  • FLUX.2 [max] — Highest fidelity, requires 24GB+ VRAM

Features: native 4MP resolution, up to 10 reference images for character/style consistency, top-tier typography, conversational FLUX Kontext for image editing.

Stable Diffusion 3.5

Stability AI's flagship open model. Replaces SDXL as the recommended SD checkpoint.

  • SD 3.5 Large — 8B parameters, best quality
  • SD 3.5 Medium — Balanced for prosumer GPUs
  • SD 3.5 Turbo — Fast generation

Specialized Models

  • HiDream-I1 — 17B-parameter model that often scores above SDXL, DALL-E 3, and FLUX.1 on benchmarks
  • JuggernautXL / RealVisXL — Photorealism-tuned SDXL forks
  • Pony Diffusion XL — Anime/illustration

Where to Get Models

  • Civitai.com — Largest community for SD models, LoRAs, embeddings. Browse by style, content type, base model.
  • Hugging Face — Original model hosting. Best for foundation models and research releases.
  • Black Forest Labs (HF) — Official FLUX releases
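As a hedged example of pulling a foundation checkpoint from Hugging Face, the official `huggingface-cli` (from the `huggingface_hub` package) can download individual files into a local folder. The repo and file names below are illustrative, so check the model card for the exact names; gated repos also require `huggingface-cli login` first:

```shell
# Illustrative download of SD 3.5 Medium into a ComfyUI checkpoint folder.
# Requires: pip install -U "huggingface_hub[cli]"
# (and `huggingface-cli login` for gated repos such as official Stability/BFL releases)
if command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli download stabilityai/stable-diffusion-3.5-medium \
        sd3.5_medium.safetensors \
        --local-dir ComfyUI/models/checkpoints
else
    echo "huggingface-cli not installed: pip install -U 'huggingface_hub[cli]'"
fi
```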

Installation Options

1. ComfyUI

Node-based interface. Most flexible, fastest, best support for new models including FLUX.2. The de facto standard for serious users in 2026.

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
# If pip pulls the wrong PyTorch build for your GPU/CUDA version,
# install the matching torch wheel first, then re-run this:
pip install -r requirements.txt
python main.py

Place models in models/checkpoints/, LoRAs in models/loras/. Browse to http://127.0.0.1:8188.
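The folder layout ComfyUI expects can be created (or verified) from the shell; the `mv` lines are commented placeholders, so substitute whatever files you actually downloaded:

```shell
# Create ComfyUI's expected model folders
mkdir -p ComfyUI/models/checkpoints ComfyUI/models/loras

# Placeholder filenames -- substitute your actual downloads:
# mv ~/Downloads/sd3.5_large.safetensors    ComfyUI/models/checkpoints/
# mv ~/Downloads/watercolor_style.safetensors ComfyUI/models/loras/

# Confirm the layout
ls ComfyUI/models
```

ComfyUI scans these folders on startup, so restart (or refresh the node list) after adding new files.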

2. Forge / AUTOMATIC1111

Web UI with extensive features and large extension ecosystem. Forge is a faster fork of A1111. Best for users coming from older Stable Diffusion workflows.

git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge
./webui.sh   # or webui-user.bat on Windows

3. InvokeAI

Polished commercial-friendly UI with a non-destructive editing canvas. Excellent for production work and teams.

pip install InvokeAI
invokeai-configure
invokeai

4. Pinokio (Easy Install)

One-click installer for ComfyUI, Forge, and dozens of other AI tools. Great for beginners — installs dependencies automatically.

Key Concepts

Checkpoints

The base model files (8-23GB). Different checkpoints are trained for different styles — photorealism, anime, art, specific aesthetics.

LoRAs

Small (~150-500MB) style/character/concept add-ons. Stack multiple LoRAs to combine effects: photorealistic checkpoint + watercolor LoRA + character LoRA.

Sampling Methods

Algorithms for the diffusion process. Euler a for speed, DPM++ 2M Karras for quality, UniPC for FLUX. 25-30 steps usually optimal.

CFG Scale

How strictly to follow the prompt (1-30). 6-9 sweet spot for most models. FLUX often prefers 3-5.

Advanced Workflow

  • img2img — Use an existing image as starting point. Great for refinements and style transfer.
  • Inpainting — Mask and regenerate specific areas. Perfect for fixing hands, removing objects, swapping faces.
  • ControlNet — Constrain generation with pose, depth, edges, or composition from a reference. Game-changer for consistency.
  • FLUX Kontext — Conversational image editing — describe what to change, get the result without losing composition.
  • Upscaling — ESRGAN, SUPIR, or model-specific upscalers push outputs to 8K+.
  • LoRA training — With ~20-50 reference images and a few hours of GPU time, train SD/FLUX on specific characters or styles.

FLUX vs Stable Diffusion 3.5: Which to Choose?

FLUX.2 wins for: photorealism, prompt adherence, typography, modern aesthetic. Use as primary for production.

SD 3.5 wins for: ecosystem (Civitai LoRAs), training custom models, cost-sensitive workflows on smaller GPUs.

Reality: most production rigs run both — FLUX for hero images and realism, SD for stylized work and rapid iteration.

Tips for Effective Local Generation

  • Use --xformers or --sdp attention for ~30% speed boost on NVIDIA
  • Generate at native resolution then upscale — beats trying to generate at very high resolution directly
  • Build a "lookbook" folder of your best prompts — copy variants, don't recreate
  • For FLUX, low CFG (3-5) and 25-30 steps usually optimal
  • For SD, denoise 0.4-0.6 in img2img preserves composition while changing style
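For Forge/A1111, launch flags like `--xformers` usually go in `webui-user.sh` (or `webui-user.bat` on Windows) rather than on the command line. A minimal sketch, noting that Forge enables its own attention optimizations by default, so treat the flag as an A1111-style example rather than a required setting:

```shell
# webui-user.sh -- sketch of a persistent launch configuration
export COMMANDLINE_ARGS="--xformers"
```

`webui.sh` sources this file on startup, so edits take effect on the next launch.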