
Stable Diffusion: Local Installation and Usage

Run Stable Diffusion 3.5 and FLUX.2 locally for unlimited free image generation. Hardware requirements, ComfyUI vs Forge vs Invoke, and how to set up the most powerful models.

April 5, 2026 · 4 min read

Why Run Locally in 2026?

Stable Diffusion and FLUX are the leading open-source image generation ecosystems. Running them on your own hardware delivers four advantages no subscription tool matches:

  • Unlimited generation — once hardware is paid off, infinite output
  • Privacy — nothing leaves your machine
  • No content filters — generate anything legal without service-level censorship
  • Customization — every parameter, model, and workflow is yours

The trade-off: capable GPU + technical setup. Worth it for serious users.

Hardware Requirements (Updated for 2026 Models)

  • Minimum — NVIDIA GPU with 8GB VRAM (e.g., RTX 3060). Apple Silicon (M2/M3/M4) works via MPS but is slower. Runs SDXL and SD 3.5 Medium.
  • Recommended — RTX 4070 Ti / 4080 with 12-16GB VRAM. Runs FLUX.1 dev quantized, SD 3.5 Large, fast iteration.
  • Best — RTX 4090 / 5080 / 5090 (24-32GB). Runs FLUX.2 Pro at full quality, batch generation, training your own LoRAs.
  • RAM: 32GB recommended (16GB minimum)
  • Storage: 200GB+ free for models + outputs. Each FLUX checkpoint is 12-23GB.
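Before committing to multi-gigabyte checkpoints, it helps to confirm what your GPU actually reports. One quick check with NVIDIA's `nvidia-smi` tool, which ships with the driver (the guard just keeps the command from erroring on non-NVIDIA machines):

```shell
# Print GPU model and total VRAM (NVIDIA only; nvidia-smi ships with the driver)
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "nvidia-smi not found -- no NVIDIA driver detected"
fi
```

If the reported memory is under roughly 8192 MiB, plan on the SDXL / SD 3.5 Medium tier rather than the full-size FLUX checkpoints.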

Models to Download (May 2026)

FLUX.2 (Black Forest Labs)

The current state-of-the-art for photorealistic image generation. Lineup:

  • FLUX.2 [pro] — Best quality, paid API only
  • FLUX.2 [dev] — Open weights, research/personal use only
  • FLUX.2 [klein] — Lighter version, runs on smaller GPUs
  • FLUX.2 [flex] — Optimized for editing tasks
  • FLUX.2 [max] — Highest fidelity, requires 24GB+ VRAM

Features: native 4MP resolution, up to 10 reference images for character/style consistency, top-tier typography, conversational FLUX Kontext for image editing.

Stable Diffusion 3.5

Stability AI's flagship open model. Replaces SDXL as the recommended SD checkpoint.

  • SD 3.5 Large — 8B parameters, best quality
  • SD 3.5 Medium — Balanced for prosumer GPUs
  • SD 3.5 Turbo — Fast generation

Specialized Models

  • HiDream-I1 — 17B-parameter model that often scores above SDXL, DALL-E 3, and FLUX.1 on benchmarks
  • JuggernautXL / RealVisXL — Photorealism-tuned SDXL forks
  • Pony Diffusion XL — Anime/illustration

Where to Get Models

  • Civitai.com — Largest community for SD models, LoRAs, embeddings. Browse by style, content type, base model.
  • Hugging Face — Original model hosting. Best for foundation models and research releases.
  • Black Forest Labs (HF) — Official FLUX releases
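As a hedged example of pulling a foundation checkpoint from Hugging Face, the official `huggingface-cli` (from the `huggingface_hub` package) can download individual files into a local folder. The repo and file names below are illustrative, so check the model card for the exact names; gated repos also require `huggingface-cli login` first:

```shell
# Illustrative download of SD 3.5 Medium into a ComfyUI checkpoint folder.
# Requires: pip install -U "huggingface_hub[cli]"
# (and `huggingface-cli login` for gated repos such as official Stability/BFL releases)
if command -v huggingface-cli >/dev/null 2>&1; then
    huggingface-cli download stabilityai/stable-diffusion-3.5-medium \
        sd3.5_medium.safetensors \
        --local-dir ComfyUI/models/checkpoints
else
    echo "huggingface-cli not installed: pip install -U 'huggingface_hub[cli]'"
fi
```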

Installation Options

1. ComfyUI

Node-based interface. Most flexible, fastest, best support for new models including FLUX.2. The de facto standard for serious users in 2026.

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
# If pip pulls the wrong PyTorch build for your GPU/CUDA version,
# install the matching torch wheel first, then re-run this:
pip install -r requirements.txt
python main.py

Place models in models/checkpoints/, LoRAs in models/loras/. Browse to http://127.0.0.1:8188.
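The folder layout ComfyUI expects can be created (or verified) from the shell; the `mv` lines are commented placeholders, so substitute whatever files you actually downloaded:

```shell
# Create ComfyUI's expected model folders
mkdir -p ComfyUI/models/checkpoints ComfyUI/models/loras

# Placeholder filenames -- substitute your actual downloads:
# mv ~/Downloads/sd3.5_large.safetensors    ComfyUI/models/checkpoints/
# mv ~/Downloads/watercolor_style.safetensors ComfyUI/models/loras/

# Confirm the layout
ls ComfyUI/models
```

ComfyUI scans these folders on startup, so restart (or refresh the node list) after adding new files.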

2. Forge / AUTOMATIC1111

Web UI with extensive features and large extension ecosystem. Forge is a faster fork of A1111. Best for users coming from older Stable Diffusion workflows.

git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
cd stable-diffusion-webui-forge
./webui.sh   # or webui-user.bat on Windows

3. InvokeAI

Polished commercial-friendly UI with a non-destructive editing canvas. Excellent for production work and teams.

pip install InvokeAI
invokeai-configure
invokeai

4. Pinokio (Easy Install)

One-click installer for ComfyUI, Forge, and dozens of other AI tools. Great for beginners — installs dependencies automatically.

Key Concepts

Checkpoints

The base model files (8-23GB). Different checkpoints are trained for different styles — photorealism, anime, art, specific aesthetics.

LoRAs

Small (~150-500MB) style/character/concept add-ons. Stack multiple LoRAs to combine effects: photorealistic checkpoint + watercolor LoRA + character LoRA.

Sampling Methods

Algorithms for the diffusion process. Euler a for speed, DPM++ 2M Karras for quality, UniPC for FLUX. 25-30 steps usually optimal.

CFG Scale

How strictly to follow the prompt (1-30). 6-9 sweet spot for most models. FLUX often prefers 3-5.

Advanced Workflow

  • img2img — Use an existing image as starting point. Great for refinements and style transfer.
  • Inpainting — Mask and regenerate specific areas. Perfect for fixing hands, removing objects, swapping faces.
  • ControlNet — Constrain generation with pose, depth, edges, or composition from a reference. Game-changer for consistency.
  • FLUX Kontext — Conversational image editing — describe what to change, get the result without losing composition.
  • Upscaling — ESRGAN, SUPIR, or model-specific upscalers push outputs to 8K+.
  • LoRA training — With ~20-50 reference images and a few hours of GPU time, train SD/FLUX on specific characters or styles.

FLUX vs Stable Diffusion 3.5: Which to Choose?

FLUX.2 wins for: photorealism, prompt adherence, typography, modern aesthetic. Use as primary for production.

SD 3.5 wins for: ecosystem (Civitai LoRAs), training custom models, cost-sensitive workflows on smaller GPUs.

Reality: most production rigs run both — FLUX for hero images and realism, SD for stylized work and rapid iteration.

Tips for Effective Local Generation

  • Use --xformers or --sdp attention for ~30% speed boost on NVIDIA
  • Generate at native resolution then upscale — beats trying to generate at very high resolution directly
  • Build a "lookbook" folder of your best prompts — copy variants, don't recreate
  • For FLUX, low CFG (3-5) and 25-30 steps usually optimal
  • For SD, denoise 0.4-0.6 in img2img preserves composition while changing style
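For Forge/A1111, launch flags like `--xformers` usually go in `webui-user.sh` (or `webui-user.bat` on Windows) rather than on the command line. A minimal sketch, noting that Forge enables its own attention optimizations by default, so treat the flag as an A1111-style example rather than a required setting:

```shell
# webui-user.sh -- sketch of a persistent launch configuration
export COMMANDLINE_ARGS="--xformers"
```

`webui.sh` sources this file on startup, so edits take effect on the next launch.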