
ComfyUI: The Node-Based Workflow That Power Users Actually Run

ComfyUI is the de-facto standard for serious Stable Diffusion and FLUX.2 work in 2026. This guide covers nodes, ControlNet, video workflows, custom nodes, and how to share reproducible pipelines.

May 15, 2026 · 4 min read

What is ComfyUI in 2026?

ComfyUI is a node-based interface for running diffusion models. You build a graph — each node is a step (load checkpoint → CLIP encode → KSampler → VAE decode → save image) — and that graph becomes a reproducible, shareable pipeline. By 2026 it has become the de-facto pro environment for Stable Diffusion 3.5, FLUX.2, HiDream-I1, video models, and anything experimental.

The reason power users moved here from Automatic1111 and Forge: ComfyUI exposes every step of the diffusion process. You can branch, route, compose, and iterate in ways the simpler UIs can't. Sharing a workflow is as simple as dragging a PNG — the workflow graph is embedded in the file.
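To see what that embedding looks like, here is a minimal Python sketch that reads the graph back out of a generated file. It assumes Pillow is installed and that the image was saved by the stock Save Image node; the "workflow" and "prompt" text-chunk names match current ComfyUI builds, but verify them against your own output.

import json
from PIL import Image  # pip install Pillow

# ComfyUI's Save Image node writes the graph into PNG text chunks.
img = Image.open("ComfyUI_00001_.png")       # placeholder filename
workflow_json = img.info.get("workflow")     # full editor graph
prompt_json = img.info.get("prompt")         # API-format graph

if workflow_json:
    workflow = json.loads(workflow_json)
    print(len(workflow.get("nodes", [])), "nodes in the embedded workflow")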

Why Node-Based Wins for Serious Work

  • Composability — Combine techniques (Img2Img, ControlNet, IP-Adapter, LoRA stacking) without UI bolt-ons.
  • Reproducibility — A workflow PNG is the entire pipeline. Drag it in, get identical results.
  • Speed — ComfyUI only recomputes nodes downstream of changes. Iterate on the sampler without re-encoding the prompt.
  • Model coverage — New models (FLUX.2, HiDream, Lumina, Stable Video Diffusion, LTX-Video) usually land in ComfyUI custom nodes within days of release.
  • Automation — Run workflows from the API or queue 1000 variations overnight.

Installation (May 2026)

The simplest path:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI && pip install -r requirements.txt
python main.py

On Windows, use the official portable build — it bundles Python and dependencies. On macOS with Apple Silicon, the MPS backend works but is slower than NVIDIA. The ComfyUI Desktop app shipped in 2026 for one-click install on macOS and Windows; it manages updates, models, and custom nodes through a GUI.

For models, drop checkpoints into models/checkpoints, LoRAs into models/loras, VAEs into models/vae, and ControlNet models into models/controlnet.

The Core Workflow

The baseline text-to-image graph needs only a handful of nodes:

  1. Load Checkpoint — Pick a model (e.g. FLUX.2 dev).
  2. CLIP Text Encode (Positive) — Your prompt.
  3. CLIP Text Encode (Negative) — What to avoid (less needed with FLUX).
  4. Empty Latent Image — Output resolution and batch size; this feeds the sampler its starting latent.
  5. KSampler — Steps, CFG, sampler, scheduler, seed.
  6. VAE Decode → Save Image — Decode latent to pixels, save.

That's it. Every advanced technique adds nodes around this base.
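For reference, here is roughly what that base graph looks like in the JSON "API format" you can export with Save (API Format). Node IDs, the checkpoint filename, and the sampler settings below are illustrative placeholders, not canonical values; export your own graph to see the exact structure.

# Minimal text-to-image graph in ComfyUI's API (prompt) format.
# Keys are node IDs; values name a node class and wire its inputs.
# ["1", 0] means "output 0 of node 1". Filenames are placeholders.
base_graph = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "flux2-dev.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a 1970s polaroid of a man holding a cassette tape",
                     "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "", "clip": ["1", 1]}},   # negative prompt
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 28,
                     "cfg": 3.5, "sampler_name": "euler",
                     "scheduler": "simple", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "base"}},
}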

Essential Custom Nodes

  • ComfyUI-Manager — Manages custom nodes and models from a GUI. Install this first.
  • ComfyUI-AnimateDiff-Evolved / WanVideoWrapper — Video generation from SD/FLUX models.
  • ComfyUI-Impact-Pack — Detection, segmentation, upscaling pipelines.
  • ComfyUI-Inspire-Pack — Advanced sampling, prompt scheduling, regional prompts.
  • ControlNet Auxiliary Preprocessors — Canny, depth, OpenPose/DWPose, lineart preprocessors for every ControlNet workflow.
  • rgthree-comfy — Quality-of-life: muting groups, fast bypass, seed reuse.
  • ComfyUI_IPAdapter_plus — Style and face transfer via IP-Adapter for SD and FLUX.
  • ComfyUI-Florence2 — On-graph image captioning for tagged datasets.

FLUX.2 in ComfyUI

FLUX.2 dev and FLUX.2 klein run natively. FLUX prompts respond best to natural-language descriptions, not weighted comma tags — write "A 1970s polaroid of a man holding a cassette tape, soft warm tones, slight motion blur" rather than tag stacks. FLUX usually wants CFG 3-5 and 25-30 steps with the euler sampler and simple scheduler. Negative prompts mostly don't help.

For higher quality, use the Two-Pass pattern: generate at 1024×1024, then run a second pass through Img2Img at 1.5× resolution with denoise 0.4-0.5 to upscale and refine.
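In node terms, the second pass is just a Latent Upscale feeding another KSampler at partial denoise. Continuing the illustrative API-format sketch from above (node names, IDs, and settings are again placeholders to check against your own export):

# Two-pass refinement: upscale the first pass's latent to ~1.5x, then
# re-sample with low denoise so structure is kept but detail is added.
two_pass = {
    "8": {"class_type": "LatentUpscale",
          "inputs": {"samples": ["5", 0], "upscale_method": "bilinear",
                     "width": 1536, "height": 1536, "crop": "disabled"}},
    "9": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["8", 0], "seed": 42, "steps": 20,
                     "cfg": 3.5, "sampler_name": "euler",
                     "scheduler": "simple", "denoise": 0.45}},
    "10": {"class_type": "VAEDecode",
           "inputs": {"samples": ["9", 0], "vae": ["1", 2]}},
    "11": {"class_type": "SaveImage",
           "inputs": {"images": ["10", 0], "filename_prefix": "refined"}},
}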

ControlNet Workflows

ControlNet lets you condition generation on structure: a pose, a depth map, a Canny edge, a sketch. The graph adds:

  1. Load Image → Preprocessor (e.g. DWPose, Depth Anything V2) → Save preview
  2. Apply ControlNet node taking the preprocessed image + the loaded ControlNet model + the conditioning
  3. Hand the modified conditioning to KSampler

This is how AI portrait studios reproduce a client's exact pose across many style variations. Pair with IP-Adapter for face consistency to get a full character pipeline.
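As a sketch of how the conditioning gets rewired, here is the ControlNet branch in the same illustrative API format (the exact Apply ControlNet inputs can differ between ComfyUI versions and the Advanced variant, and the preprocessor node is omitted for brevity):

# ControlNet branch: load a reference image, load a ControlNet model,
# and rewrite the positive conditioning before it reaches the KSampler.
controlnet_branch = {
    "20": {"class_type": "LoadImage",
           "inputs": {"image": "pose_reference.png"}},    # placeholder file
    "21": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "openpose.safetensors"}},
    "22": {"class_type": "ControlNetApply",
           "inputs": {"conditioning": ["2", 0],   # positive prompt conditioning
                      "control_net": ["21", 0],
                      "image": ["20", 0],
                      "strength": 0.8}},
}
# The KSampler's "positive" input then points at ["22", 0] instead of ["2", 0].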

Video in ComfyUI

The video models that landed in 2026 (LTX-Video, Wan-2, CogVideoX-5B-Lite, Mochi-1) all have ComfyUI custom nodes. Workflows follow the same pattern as image: load model → text encode → sampler → VAE decode → save as MP4. Hardware reality: 16GB+ VRAM minimum, 24GB+ for any meaningful quality.

API & Headless Use

ComfyUI exposes an HTTP API: POST a workflow JSON, poll for completion, and fetch the generated images. Production setups put ComfyUI behind a queue worker (Redis + a small FastAPI wrapper) and run multiple GPU workers. Companies running consumer image-gen products often have ComfyUI under the hood.
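A bare-bones headless client looks like the sketch below. It assumes a local server on the default port 8188; the /prompt and /history endpoints are part of stock ComfyUI, but error handling, timeouts, and websocket progress updates are omitted here.

import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8188"

def queue_workflow(graph: dict) -> str:
    # POST an API-format graph to ComfyUI and return its prompt_id.
    payload = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(f"{SERVER}/prompt", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

def wait_for_result(prompt_id: str, poll_seconds: float = 1.0) -> dict:
    # Poll /history until the job appears, then return its record.
    while True:
        with urllib.request.urlopen(f"{SERVER}/history/{prompt_id}") as resp:
            history = json.loads(resp.read())
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll_seconds)

# Usage: queue the base_graph sketch from earlier and read the output filenames.
# prompt_id = queue_workflow(base_graph)
# result = wait_for_result(prompt_id)
# print(result["outputs"])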

When Not to Use ComfyUI

  • You're new to diffusion — Start in Forge or Invoke; the node graph is overwhelming without context.
  • You don't need control — If you just want pretty images, Midjourney is faster and simpler.
  • You don't own a GPU — Cloud ComfyUI (RunPod, Mimic PC, ComfyDeploy) works, but at that point a hosted UI is usually easier.

If you've outgrown the simple UIs and want full control over your pipeline, ComfyUI is the answer. Plan to spend a weekend learning the node graph, then a lifetime mastering it.