What is ComfyUI in 2026?
ComfyUI is a node-based interface for running diffusion models. You build a graph — each node is a step (load checkpoint → CLIP encode → KSampler → VAE decode → save image) — and that graph becomes a reproducible, shareable pipeline. By 2026 it has become the de facto pro environment for Stable Diffusion 3.5, FLUX.2, HiDream-I1, video models, and anything experimental.
The reason power users moved here from Automatic1111 and Forge: ComfyUI exposes every step of the diffusion process. You can branch, route, compose, and iterate in ways the simpler UIs can't. Sharing a workflow is as simple as dragging a PNG — the workflow graph is embedded in the file.
Why Node-Based Wins for Serious Work
- Composability — Combine techniques (Img2Img, ControlNet, IP-Adapter, LoRA stacking) without UI bolt-ons.
- Reproducibility — A workflow PNG is the entire pipeline. Drag it in, get identical results.
- Speed — ComfyUI only recomputes nodes downstream of changes. Iterate on the sampler without re-encoding the prompt.
- Model coverage — New models (FLUX.2, HiDream, Lumina, Stable Video Diffusion, LTX-Video) usually land in ComfyUI custom nodes within days of release.
- Automation — Run workflows from the API or queue 1000 variations overnight.
Installation (May 2026)
The simplest path:
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt
python main.py
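If the defaults don't fit, main.py accepts startup flags; --listen and --lowvram are two that current builds support (run python main.py --help for the full list):

python main.py --listen 0.0.0.0 --lowvram   # serve on the LAN, shrink VRAM use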
On Windows, use the official portable build — it bundles Python and dependencies. On macOS with Apple Silicon, the MPS backend works but runs well behind comparable NVIDIA GPUs. The ComfyUI Desktop app shipped in 2026 for one-click install on macOS and Windows; it manages updates, models, and custom nodes through a GUI.
For models, drop checkpoints into models/checkpoints, LoRAs into models/loras, VAEs into models/vae, and ControlNet models into models/controlnet.
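Concretely, the layout looks like this (the checkpoint filename is a placeholder):

ComfyUI/
└─ models/
   ├─ checkpoints/    # e.g. flux2-dev.safetensors
   ├─ loras/
   ├─ vae/
   └─ controlnet/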
The Core Workflow
The baseline text-to-image graph has seven nodes:
- Load Checkpoint — Pick a model (e.g. FLUX.2 dev).
- CLIP Text Encode (Positive) — Your prompt.
- CLIP Text Encode (Negative) — What to avoid (less needed with FLUX).
- Empty Latent Image — Canvas resolution and batch size.
- KSampler — Steps, CFG, sampler, scheduler, seed.
- VAE Decode → Save Image — Decode latent to pixels, save.
That's it. Every advanced technique adds nodes around this base.
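For orientation, here is that same graph sketched in ComfyUI's API ("prompt") format as a Python dict. Node ids are arbitrary strings, each link is a [source_node_id, output_index] pair, and the checkpoint filename and prompts are placeholders:

graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "flux2-dev.safetensors"}},  # placeholder filename
    "2": {"class_type": "CLIPTextEncode",  # positive prompt
          "inputs": {"text": "a 1970s polaroid of a man holding a cassette tape",
                     "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",  # negative prompt (often empty for FLUX)
          "inputs": {"text": "", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 28, "cfg": 3.5,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "flux2"}},
}

The editor saves a richer workflow format for drag-and-drop sharing; this flat form is what the HTTP API accepts (more on that below).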
Essential Custom Nodes
- ComfyUI-Manager — Manages custom nodes and models from a GUI. Install this first.
- ComfyUI-AnimateDiff-Evolved / ComfyUI-WanVideoWrapper — Video generation on the graph: AnimateDiff-Evolved animates SD checkpoints, the wrapper drives the dedicated Wan video models.
- ComfyUI-Impact-Pack — Detection, segmentation, upscaling pipelines.
- ComfyUI-Inspire-Pack — Advanced sampling, prompt scheduling, regional prompts.
- ControlNet Auxiliary Preprocessors — Canny, depth, OpenPose, lineart, and more for every ControlNet workflow.
- rgthree-comfy — Quality-of-life: muting groups, fast bypass, seed reuse.
- ComfyUI_IPAdapter_plus — Style and face transfer via IP-Adapter for SD and FLUX.
- ComfyUI-Florence2 — On-graph image captioning for tagged datasets.
FLUX.2 in ComfyUI
FLUX.2 dev and FLUX.2 klein run natively. FLUX responds best to natural-language prompts, not weighted comma tags — write "A 1970s polaroid of a man holding a cassette tape, soft warm tones, slight motion blur" rather than tag stacks. FLUX usually wants CFG 3-5 and 25-30 steps with the euler sampler and simple scheduler. Negative prompts mostly don't help.
For higher quality, use the Two-Pass pattern: generate at 1024×1024, then run a second pass through Img2Img at 1.5× resolution with denoise 0.4-0.5 to upscale and refine.
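In node terms the second pass is just an upscale feeding another KSampler at partial denoise. One common variant does the upscale in latent space instead of decoding to pixels first; extending the sketch above (ids and sizes are placeholders):

    "8": {"class_type": "LatentUpscale",
          "inputs": {"samples": ["5", 0], "upscale_method": "nearest-exact",
                     "width": 1536, "height": 1536, "crop": "disabled"}},
    "9": {"class_type": "KSampler",  # denoise 0.4-0.5 refines rather than repaints
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["8", 0], "seed": 42, "steps": 28, "cfg": 3.5,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 0.45}},
    # Re-point VAEDecode's "samples" input to ["9", 0] to decode the refined latent.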
ControlNet Workflows
ControlNet lets you condition generation on structure: a pose, a depth map, a Canny edge, a sketch. The graph adds:
- Load Image → Preprocessor (e.g. DWPose, Depth Anything V2) → Save preview
- Apply ControlNet node taking the preprocessed image + the loaded ControlNet model + the conditioning
- Hand the modified conditioning to KSampler
This is how AI portrait studios reproduce a client's exact pose across many style variations. Pair with IP-Adapter for face consistency to get a full character pipeline.
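In the same API-format sketch, the ControlNet pieces look roughly like this. ControlNetLoader, LoadImage, and ControlNetApply are core nodes; the filenames are placeholders, and the loaded image is assumed to be already preprocessed (e.g. a pose map):

    "10": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "flux2-pose.safetensors"}},  # placeholder
    "11": {"class_type": "LoadImage",
           "inputs": {"image": "client_pose.png"}},  # preprocessed control image
    "12": {"class_type": "ControlNetApply",
           "inputs": {"conditioning": ["2", 0],   # positive conditioning in
                      "control_net": ["10", 0],
                      "image": ["11", 0],
                      "strength": 0.8}},
    # KSampler's "positive" input then takes ["12", 0] instead of ["2", 0].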
Video in ComfyUI
The video models that landed in 2026 (LTX-Video, Wan-2, CogVideoX-5B-Lite, Mochi-1) all have ComfyUI custom nodes. Workflows follow the same pattern as image: load model → text encode → sampler → VAE decode → save as MP4. Hardware reality: 16GB+ VRAM minimum, 24GB+ for any meaningful quality.
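The graph shape is the familiar one; in the sketch below the class names are hypothetical stand-ins, since each wrapper pack ships its own identifiers — only the structure is the point:

video_graph = {
    "1": {"class_type": "LoadVideoModel",   # hypothetical stand-in name
          "inputs": {"model_name": "ltx-video.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "slow pan across a foggy harbor at dawn",
                     "clip": ["1", 1]}},
    "3": {"class_type": "VideoSampler",     # hypothetical stand-in name
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "frames": 97, "steps": 30, "seed": 42}},
    "4": {"class_type": "SaveVideo",        # hypothetical stand-in name
          "inputs": {"frames": ["3", 0], "fps": 24, "format": "mp4"}},
}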
API & Headless Use
ComfyUI exposes an HTTP API. POST a workflow JSON to /prompt, then poll /history for the finished outputs. Production setups put ComfyUI behind a queue worker (Redis + a small FastAPI wrapper) and run multiple GPU workers. Companies running consumer image-gen products often have ComfyUI under the hood.
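A minimal headless client, assuming a local server on the default port 8188 and the graph dict from earlier; /prompt and /history are the relevant endpoints, and error handling is omitted:

import json
import time
import urllib.request

def run_workflow(graph, host="http://127.0.0.1:8188"):
    # Submit the workflow; the server queues it and returns a prompt_id at once.
    req = urllib.request.Request(
        f"{host}/prompt",
        data=json.dumps({"prompt": graph}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.load(resp)["prompt_id"]

    # Poll /history until the finished job appears with its outputs.
    while True:
        with urllib.request.urlopen(f"{host}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return history[prompt_id]["outputs"]  # saved image filenames per node
        time.sleep(0.5)

print(run_workflow(graph))

For real throughput, the /ws websocket gives push notifications instead of polling, and a Redis-backed queue fans jobs out across workers.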
When Not to Use ComfyUI
- You're new to diffusion — Start in Forge or Invoke; the node graph is overwhelming without context.
- You don't need control — If you just want pretty images, Midjourney is faster and simpler.
- You don't own a GPU — Cloud ComfyUI (RunPod, Mimic PC, ComfyDeploy) works, but at that point a hosted UI is usually easier.
If you've outgrown the simple UIs and want full control over your pipeline, ComfyUI is the answer. Plan to spend a weekend learning the node graph, then a lifetime mastering it.