What is ComfyUI in 2026?
ComfyUI is a node-based interface for running diffusion models. You build a graph — each node is a step (load checkpoint → CLIP encode → KSampler → VAE decode → save image) — and that graph becomes a reproducible, shareable pipeline. By 2026 it has become the de facto pro environment for Stable Diffusion 3.5, FLUX.2, HiDream-I1, video models, and anything experimental.
The reason power users moved here from Automatic1111 and Forge: ComfyUI exposes every step of the diffusion process. You can branch, route, compose, and iterate in ways the simpler UIs can't. Sharing a workflow is as simple as dragging a PNG — the workflow graph is embedded in the file.
Why Node-Based Wins for Serious Work
- Composability — Combine techniques (Img2Img, ControlNet, IP-Adapter, LoRA stacking) without UI bolt-ons.
- Reproducibility — A workflow PNG is the entire pipeline. Drag it in, get identical results.
- Speed — ComfyUI only recomputes nodes downstream of changes. Iterate on the sampler without re-encoding the prompt.
- Model coverage — New models (FLUX.2, HiDream, Lumina, Stable Video Diffusion, LTX-Video) usually land in ComfyUI custom nodes within days of release.
- Automation — Run workflows from the API or queue 1000 variations overnight.
Installation (May 2026)
The simplest path:
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt
python main.py
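If the defaults don't fit, main.py accepts startup flags; --listen and --lowvram are two that current builds support (run python main.py --help for the full list):

python main.py --listen 0.0.0.0 --lowvram   # serve on the LAN, shrink VRAM use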
On Windows, use the official portable build — it bundles Python and dependencies. On macOS with Apple Silicon, the MPS backend works but runs well behind comparable NVIDIA GPUs. The ComfyUI Desktop app shipped in 2026 for one-click install on macOS and Windows; it manages updates, models, and custom nodes through a GUI.
For models, drop checkpoints into models/checkpoints, LoRAs into models/loras, VAEs into models/vae, and ControlNet models into models/controlnet.
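Concretely, the layout looks like this (the checkpoint filename is a placeholder):

ComfyUI/
└─ models/
   ├─ checkpoints/    # e.g. flux2-dev.safetensors
   ├─ loras/
   ├─ vae/
   └─ controlnet/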
The Core Workflow
The baseline text-to-image graph has seven nodes:
- Load Checkpoint — Pick a model (e.g. FLUX.2 dev).
- CLIP Text Encode (Positive) — Your prompt.
- CLIP Text Encode (Negative) — What to avoid (less needed with FLUX).
- Empty Latent Image — Canvas resolution and batch size.
- KSampler — Steps, CFG, sampler, scheduler, seed.
- VAE Decode → Save Image — Decode latent to pixels, save.
That's it. Every advanced technique adds nodes around this base.
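For orientation, here is that same graph sketched in ComfyUI's API ("prompt") format as a Python dict. Node ids are arbitrary strings, each link is a [source_node_id, output_index] pair, and the checkpoint filename and prompts are placeholders:

graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "flux2-dev.safetensors"}},  # placeholder filename
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a 1970s polaroid of a man holding a cassette tape",
                     "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt (often empty for FLUX)
          "inputs": {"text": "", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 28, "cfg": 3.5,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "flux2"}},
}

The editor saves a richer workflow format for drag-and-drop sharing; this flat form is what the HTTP API accepts (more on that below).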
Essential Custom Nodes
- ComfyUI-Manager — Manages custom nodes and models from a GUI. Install this first.
- ComfyUI-AnimateDiff-Evolved / ComfyUI-WanVideoWrapper — Video generation on the graph: AnimateDiff-Evolved animates SD checkpoints, the wrapper drives the dedicated Wan video models.
- ComfyUI-Impact-Pack — Detection, segmentation, upscaling pipelines.
- ComfyUI-Inspire-Pack — Advanced sampling, prompt scheduling, regional prompts.
- ControlNet Auxiliary Preprocessors — Canny, depth, OpenPose, lineart, and more for every ControlNet workflow.
- rgthree-comfy — Quality-of-life: muting groups, fast bypass, seed reuse.
- ComfyUI_IPAdapter_plus — Style and face transfer via IP-Adapter for SD and FLUX.
- ComfyUI-Florence2 — On-graph image captioning for tagged datasets.
FLUX.2 in ComfyUI
FLUX.2 dev and FLUX.2 klein run natively. FLUX responds best to natural-language prompts, not weighted comma tags — write "A 1970s polaroid of a man holding a cassette tape, soft warm tones, slight motion blur" rather than tag stacks. FLUX usually wants CFG 3-5 and 25-30 steps with the euler sampler and simple scheduler. Negative prompts mostly don't help.
For higher quality, use the Two-Pass pattern: generate at 1024×1024, then run a second pass through Img2Img at 1.5× resolution with denoise 0.4-0.5 to upscale and refine.
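In node terms the second pass is just an upscale feeding another KSampler at partial denoise. One common variant does the upscale in latent space instead of decoding to pixels first; extending the sketch above (ids and sizes are placeholders):

    "8": {"class_type": "LatentUpscale",
          "inputs": {"samples": ["5", 0], "upscale_method": "nearest-exact",
                     "width": 1536, "height": 1536, "crop": "disabled"}},
    "9": {"class_type": "KSampler",  # denoise 0.4-0.5 refines rather than repaints
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["8", 0], "seed": 42, "steps": 28, "cfg": 3.5,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 0.45}},
    # Re-point VAEDecode's "samples" input to ["9", 0] to decode the refined latent.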
ControlNet Workflows
ControlNet lets you condition generation on structure: a pose, a depth map, a Canny edge, a sketch. The graph adds:
- Load Image → Preprocessor (e.g. DWPose, Depth Anything V2) → Save preview
- Apply ControlNet node taking the preprocessed image + the loaded ControlNet model + the conditioning
- Hand the modified conditioning to KSampler
This is how AI portrait studios reproduce a client's exact pose across many style variations. Pair with IP-Adapter for face consistency to get a full character pipeline.
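In the same API-format sketch, the ControlNet pieces look roughly like this. ControlNetLoader, LoadImage, and ControlNetApply are core nodes; the filenames are placeholders, and the loaded image is assumed to be already preprocessed (e.g. a pose map):

    "10": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "flux2-pose.safetensors"}},  # placeholder
    "11": {"class_type": "LoadImage",
           "inputs": {"image": "client_pose.png"}},  # preprocessed control image
    "12": {"class_type": "ControlNetApply",
           "inputs": {"conditioning": ["2", 0],   # positive conditioning in
                      "control_net": ["10", 0],
                      "image": ["11", 0],
                      "strength": 0.8}},
    # KSampler's "positive" input then takes ["12", 0] instead of ["2", 0].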
Video in ComfyUI
The video models that landed in 2026 (LTX-Video, Wan-2, CogVideoX-5B-Lite, Mochi-1) all have ComfyUI custom nodes. Workflows follow the same pattern as image: load model → text encode → sampler → VAE decode → save as MP4. Hardware reality: 16GB+ VRAM minimum, 24GB+ for any meaningful quality.
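The graph shape is the familiar one; in the sketch below the class names are hypothetical stand-ins, since each wrapper pack ships its own identifiers — only the structure is the point:

video_graph = {
    "1": {"class_type": "LoadVideoModel",   # hypothetical stand-in name
          "inputs": {"model_name": "ltx-video.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "slow pan across a foggy harbor at dawn",
                     "clip": ["1", 1]}},
    "3": {"class_type": "VideoSampler",     # hypothetical stand-in name
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "frames": 97, "steps": 30, "seed": 42}},
    "4": {"class_type": "SaveVideo",        # hypothetical stand-in name
          "inputs": {"frames": ["3", 0], "fps": 24, "format": "mp4"}},
}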
API & Headless Use
ComfyUI exposes an HTTP API. POST a workflow JSON to /prompt, then poll /history for the finished outputs. Production setups put ComfyUI behind a queue worker (Redis + a small FastAPI wrapper) and run multiple GPU workers. Companies running consumer image-gen products often have ComfyUI under the hood.
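A minimal headless client, assuming a local server on the default port 8188 and the graph dict from earlier; /prompt and /history are the relevant endpoints, and error handling is omitted:

import json
import time
import urllib.request

def run_workflow(graph, host="http://127.0.0.1:8188"):
    # Submit the workflow; the server queues it and returns a prompt_id at once.
    req = urllib.request.Request(
        f"{host}/prompt",
        data=json.dumps({"prompt": graph}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.load(resp)["prompt_id"]

    # Poll /history until the finished job appears with its outputs.
    while True:
        with urllib.request.urlopen(f"{host}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]["outputs"]  # saved image filenames per node
        time.sleep(0.5)

print(run_workflow(graph))

For real throughput, the /ws websocket gives push notifications instead of polling, and a Redis-backed queue fans jobs out across workers.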
When Not to Use ComfyUI
- You're new to diffusion — Start in Forge or Invoke; the node graph is overwhelming without context.
- You don't need control — If you just want pretty images, Midjourney is faster and simpler.
- You don't own a GPU — Cloud ComfyUI (RunPod, Mimic PC, ComfyDeploy) works, but at that point a hosted UI is usually easier.
If you've outgrown the simple UIs and want full control over your pipeline, ComfyUI is the answer. Plan to spend a weekend learning the node graph, then a lifetime mastering it.