Self-Hosting Stable Diffusion with ComfyUI: Local AI Image Generation
Cloud image generation services charge per image, impose content filters you can't control, and send every prompt to someone else's server. Self-hosting Stable Diffusion eliminates all three problems. You get unlimited generations, full control over what you create, and complete privacy -- all running on your own GPU.
ComfyUI is the best way to run Stable Diffusion locally. It's a node-based workflow editor that exposes the entire diffusion pipeline as a visual graph. Instead of hiding complexity behind a single "Generate" button, ComfyUI lets you wire together CLIP text encoding, KSampler nodes, VAE decoding, ControlNet conditioning, and LoRA loading exactly how you want. That sounds intimidating, but the default workflow works out of the box -- and the node graph means you can understand and modify every step of the generation process.

Why ComfyUI Over Automatic1111
AUTOMATIC1111's Web UI (A1111) was the default Stable Diffusion interface for years. It's still popular, but ComfyUI has pulled ahead for several reasons:
- Performance -- ComfyUI only re-executes nodes that changed. Edit your prompt and it skips VAE decoding from the previous run. A1111 re-runs the entire pipeline every time.
- Memory efficiency -- ComfyUI uses aggressive model offloading. It can run SDXL on 6 GB VRAM cards that choke under A1111.
- Workflow flexibility -- Node graphs let you build complex pipelines (img2img chains, ControlNet stacking, multi-LoRA blending) that would require extension hacks in A1111.
- Reproducibility -- Workflows are JSON files. Save them, share them, version-control them. Someone else can load your exact pipeline and get identical results.
- Active development -- ComfyUI supports new model architectures (Flux, SD3, Stable Cascade) faster than A1111.
The tradeoff: A1111 has a simpler interface for basic text-to-image. If you just want to type a prompt and click Generate, A1111 is more approachable. But the moment you want to do anything beyond basic generation, ComfyUI's node system is dramatically more powerful.
System Requirements
Image generation is GPU-bound. CPU inference exists but is impractically slow -- expect 10+ minutes per image instead of seconds.
Minimum (Functional)
- GPU: NVIDIA card with 6 GB VRAM (GTX 1660 Super, RTX 2060)
- RAM: 16 GB system memory
- Storage: 20 GB for ComfyUI + one model checkpoint
- OS: Linux recommended, Windows works, macOS via MPS (slower)
Recommended
- GPU: NVIDIA RTX 3060 12 GB or RTX 4060 Ti 16 GB
- RAM: 32 GB system memory
- Storage: 100+ GB SSD (checkpoints are 2-7 GB each, and you will collect them)
- Driver: NVIDIA 535+ with CUDA 12.1+
VRAM Guidelines
| Model | Resolution | VRAM Needed | Time/Image (RTX 3060) |
|---|---|---|---|
| SD 1.5 | 512x512 | 4 GB | ~3 sec |
| SDXL | 1024x1024 | 6-8 GB | ~8 sec |
| Flux Dev | 1024x1024 | 10-12 GB | ~15 sec |
| SD 1.5 + ControlNet | 512x512 | 6 GB | ~5 sec |
| SDXL + LoRA + ControlNet | 1024x1024 | 10 GB | ~12 sec |
AMD GPUs work via ROCm but expect rougher edges. Intel Arc has experimental support. Apple Silicon runs through MPS -- functional but 2-3x slower than equivalent NVIDIA hardware.
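Before installing anything else, it's worth confirming the host driver is actually visible. A minimal preflight sketch (the GPU_INFO variable name is just for illustration):

```shell
# Sketch of a preflight check: confirm the NVIDIA driver is installed
# and report the card, VRAM, and driver version before setting up containers.
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_INFO=$(nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader)
else
    GPU_INFO="no NVIDIA driver detected -- install driver 535+ first"
fi
echo "GPU: $GPU_INFO"
```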
Docker Deployment
The cleanest way to run ComfyUI is with Docker and NVIDIA Container Toolkit.
First, install the NVIDIA Container Toolkit and verify your GPU is visible from inside a container:

```shell
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
```yaml
# docker-compose.yml
services:
  comfyui:
    image: ghcr.io/ai-dock/comfyui:latest
    container_name: comfyui
    ports:
      - "8188:8188"
    volumes:
      - ./models:/opt/ComfyUI/models
      - ./output:/opt/ComfyUI/output
      - ./custom_nodes:/opt/ComfyUI/custom_nodes
    environment:
      - CLI_ARGS=--listen 0.0.0.0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```
Create the bind-mount directories, then start the container:

```shell
mkdir -p models/checkpoints models/loras models/controlnet models/vae output custom_nodes
docker compose up -d
```
Open http://your-server:8188 and you'll see the ComfyUI node editor. It ships with a default text-to-image workflow -- but you'll need to download a model checkpoint first.
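If you'd rather verify from the command line before opening a browser, ComfyUI exposes a /system_stats endpoint that returns JSON about the detected devices and VRAM. A hedged sketch (adjust COMFY_URL to your host; the COMFY_STATE variable is illustrative):

```shell
# Smoke-test the ComfyUI API; /system_stats reports device and VRAM info.
COMFY_URL="${COMFY_URL:-http://localhost:8188}"
if curl -sf "$COMFY_URL/system_stats" >/dev/null 2>&1; then
    COMFY_STATE="up"
    echo "ComfyUI is reachable at $COMFY_URL"
else
    COMFY_STATE="down"
    echo "Not reachable yet -- check 'docker compose logs comfyui'"
fi
```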
Model Management
Models are the core of image generation. You need at least one checkpoint to get started.
Downloading Your First Checkpoint
```shell
# SD 1.5 -- small, fast, huge ecosystem of LoRAs and embeddings
wget -P models/checkpoints/ \
  "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors"

# SDXL -- higher quality, higher VRAM usage
wget -P models/checkpoints/ \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"
```
Community fine-tunes on CivitAI and Hugging Face are where the real variety lives. Models like Realistic Vision (photorealism), DreamShaper (artistic), and Juggernaut XL (general purpose SDXL) are popular starting points.
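One gotcha worth checking for: a gated or moved model can leave wget with a small HTML error page saved under the checkpoint's filename. A rough sanity-check sketch (the function name and the 1 MB threshold are arbitrary -- real checkpoints are gigabytes):

```shell
# Flag any "checkpoint" that is implausibly small -- usually a failed or
# gated download that saved an error page instead of the model.
check_checkpoints() {
    dir="${1:-models/checkpoints}"
    for f in "$dir"/*.safetensors; do
        [ -e "$f" ] || { echo "no checkpoints found in $dir"; return 0; }
        size=$(wc -c < "$f")
        if [ "$size" -lt 1000000 ]; then
            echo "SUSPECT: $f is only $size bytes"
        else
            echo "OK: $f"
        fi
    done
}
check_checkpoints
```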
Model Directory Structure
ComfyUI expects models in specific subdirectories:
```
models/
├── checkpoints/      # Main model files (.safetensors)
├── loras/            # LoRA fine-tunes (style/subject adapters)
├── controlnet/       # ControlNet models (pose, depth, canny)
├── vae/              # VAE decoders (affects color/detail)
├── embeddings/       # Textual inversions
├── upscale_models/   # Upscaler models (RealESRGAN, etc.)
└── clip/             # CLIP text encoder models
```
Drop files into the right directory and refresh the ComfyUI browser page. No restart needed.
Workflow Basics
ComfyUI's node graph can look overwhelming at first. Here's what the default text-to-image workflow does:
- Load Checkpoint -- loads the model (UNet, CLIP, VAE) into memory
- CLIP Text Encode (Positive) -- converts your prompt into embeddings the model understands
- CLIP Text Encode (Negative) -- encodes things you don't want in the image
- KSampler -- the denoising loop that actually generates the image from noise
- VAE Decode -- converts the latent output into a visible image
- Save Image -- writes the result to disk
Each node has inputs and outputs you can rewire. Want to add a LoRA? Insert a "Load LoRA" node between the checkpoint and the CLIP encoder. Want ControlNet? Add a "Load ControlNet Model" and "Apply ControlNet" node before the KSampler. The graph makes the data flow explicit.
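That same default workflow, exported via "Save (API Format)", is plain JSON: each node gets an ID, a class_type, and inputs that reference other nodes by ID and output index. A trimmed sketch of roughly what it looks like (node IDs, prompt text, and the checkpoint filename are illustrative):

```json
{
  "4": {"class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
  "5": {"class_type": "EmptyLatentImage",
        "inputs": {"width": 512, "height": 512, "batch_size": 1}},
  "6": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a watercolor fox, forest light", "clip": ["4", 1]}},
  "7": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
  "3": {"class_type": "KSampler",
        "inputs": {"seed": 42, "steps": 20, "cfg": 7.0,
                   "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                   "model": ["4", 0], "positive": ["6", 0],
                   "negative": ["7", 0], "latent_image": ["5", 0]}},
  "8": {"class_type": "VAEDecode",
        "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
  "9": {"class_type": "SaveImage",
        "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}}
}
```

Entries like ["4", 1] mean "output 1 of node 4" -- that explicit wiring is what makes workflows diffable and version-controllable.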
Useful Workflow Patterns
- Hires Fix: Generate at base resolution, then upscale with a second KSampler pass. Dramatically improves detail.
- ControlNet Posing: Feed a reference image through a pose estimator, then condition generation on the skeleton. Consistent character poses without prompt gymnastics.
- LoRA Stacking: Chain multiple LoRAs to combine styles. A "cinematic lighting" LoRA plus a "watercolor" LoRA creates interesting hybrids.
- Batch Generation: Set the KSampler batch size to generate multiple variations in one pass.
Essential Custom Nodes
ComfyUI's plugin ecosystem lives in the custom_nodes directory. Install ComfyUI Manager first -- it adds a UI button for browsing and installing everything else:
```shell
(cd custom_nodes && git clone https://github.com/ltdrdata/ComfyUI-Manager.git)
docker compose restart comfyui
```
From there, the must-haves: Impact-Pack (face detection, regional prompting), IPAdapter_plus (style transfer from reference images), AnimateDiff-Evolved (prompt-to-animation), and UltimateSDUpscale (tile-based upscaling without VRAM limits).
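ComfyUI Manager handles Python dependencies for the nodes it installs; if you clone nodes by hand instead, something like this hypothetical helper covers the gap (the function name and default path are assumptions -- run it inside the container or whatever environment ComfyUI uses):

```shell
# Install Python dependencies declared by manually cloned custom nodes.
install_node_deps() {
    node_dir="${1:-./custom_nodes}"
    installed=0
    for d in "$node_dir"/*/; do
        if [ -f "${d}requirements.txt" ]; then
            pip install -r "${d}requirements.txt"
            installed=$((installed + 1))
        fi
    done
    echo "installed deps for $installed node(s)"
}
install_node_deps
```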
Performance Tips
- Enable FP16/FP8: Add --force-fp16 to CLI_ARGS for half-precision inference. Uses less VRAM with negligible quality loss.
- VAE tiling: For high-resolution images, enable VAE tiling to avoid out-of-memory errors during decode.
- Model caching: ComfyUI keeps the last-used model in VRAM. Switching models frequently thrashes memory. Stick to one checkpoint per session when possible.
- SSD storage: Model loading time is dominated by disk read speed. NVMe SSDs load a 6 GB checkpoint in under 2 seconds; spinning disks take 20+.
- Queue system: ComfyUI has a built-in queue. You can stack up multiple generations and walk away.
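The queue is also scriptable: POST an API-format workflow export to the /prompt endpoint and ComfyUI queues it just like a browser submission. A sketch, assuming you've exported a workflow via "Save (API Format)" to workflow_api.json (filenames and COMFY_URL are placeholders):

```shell
# Queue a saved workflow against a running ComfyUI instance.
COMFY_URL="${COMFY_URL:-http://localhost:8188}"
WORKFLOW_FILE="${WORKFLOW_FILE:-workflow_api.json}"
if [ -f "$WORKFLOW_FILE" ]; then
    # The /prompt endpoint expects the node graph wrapped in a "prompt" key.
    printf '{"prompt": %s}' "$(cat "$WORKFLOW_FILE")" > /tmp/queue_payload.json
    curl -s -X POST "$COMFY_URL/prompt" \
         -H "Content-Type: application/json" \
         -d @/tmp/queue_payload.json || echo "queue failed -- is ComfyUI running?"
else
    echo "export a workflow to $WORKFLOW_FILE first"
fi
```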
Securing Your Instance
ComfyUI has no built-in authentication. If you expose port 8188 to your network:
- Reverse proxy with auth: Put Caddy or Nginx in front with basic auth or SSO
- VPN/Tailscale: Only expose ComfyUI over your private network
- Cloudflare Tunnel: Zero-trust access without port forwarding
Never expose ComfyUI directly to the internet. It executes arbitrary Python through custom nodes -- an unauthenticated instance is a remote code execution vulnerability.
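As a sketch of the reverse-proxy option, a minimal Caddyfile with basic auth might look like the following. The domain and username are placeholders; generate the password hash with caddy hash-password, and note that Caddy releases before 2.8 spell the directive basicauth:

```
comfyui.example.com {
    basic_auth {
        # Replace with the bcrypt hash printed by: caddy hash-password
        admin <bcrypt-hash>
    }
    reverse_proxy 127.0.0.1:8188
}
```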
Verdict
ComfyUI is the power-user's choice for local image generation. The node-based interface has a steeper learning curve than A1111's form-based UI, but it pays dividends immediately: better performance, lower VRAM usage, reproducible workflows, and the ability to build generation pipelines that simply aren't possible in other interfaces. If you have an NVIDIA GPU with 8+ GB of VRAM, you can be generating images in under ten minutes with the Docker Compose setup above. The entire Stable Diffusion ecosystem -- thousands of checkpoints, LoRAs, ControlNets, and community workflows -- is available to you, running entirely on your own hardware, with no per-image costs and no content restrictions beyond what you choose.
