
Self-Hosting Stable Diffusion with ComfyUI: Local AI Image Generation

AI 2026-02-15 · 6 min read ai stable-diffusion comfyui docker gpu image-generation
By Selfhosted Guides Editorial Team. Self-hosting practitioners covering open source software, home lab infrastructure, and data sovereignty.

Cloud image generation services charge per image, impose content filters you can't control, and send every prompt to someone else's server. Self-hosting Stable Diffusion eliminates all three problems. You get unlimited generations, full control over what you create, and complete privacy -- all running on your own GPU.


ComfyUI is the best way to run Stable Diffusion locally. It's a node-based workflow editor that exposes the entire diffusion pipeline as a visual graph. Instead of hiding complexity behind a single "Generate" button, ComfyUI lets you wire together CLIP text encoding, KSampler nodes, VAE decoding, ControlNet conditioning, and LoRA loading exactly how you want. That sounds intimidating, but the default workflow works out of the box -- and the node graph means you can understand and modify every step of the generation process.


Why ComfyUI Over Automatic1111

AUTOMATIC1111's Web UI (A1111) was the default Stable Diffusion interface for years. It's still popular, but ComfyUI has pulled ahead for several reasons:

  - Lower VRAM usage -- ComfyUI loads only the pipeline components a workflow actually uses, so the same card handles larger models.
  - Reproducible workflows -- every generated image embeds its full workflow graph, and workflows export as JSON you can share and re-import.
  - Faster iteration -- between runs, only nodes whose inputs changed are re-executed.
  - New architectures first -- SDXL, Flux, and video models typically get ComfyUI support before other interfaces.

The tradeoff: A1111 has a simpler interface for basic text-to-image. If you just want to type a prompt and click Generate, A1111 is more approachable. But the moment you want to do anything beyond basic generation, ComfyUI's node system is dramatically more powerful.

System Requirements

Image generation is GPU-bound. CPU inference exists but is impractically slow -- expect 10+ minutes per image instead of seconds.

Minimum (Functional)

  - NVIDIA GPU with 4 GB VRAM (SD 1.5 at 512x512)
  - 16 GB system RAM
  - ~30 GB of disk for a few checkpoints

Recommended

  - NVIDIA GPU with 12 GB+ VRAM (SDXL, Flux, ControlNet stacks)
  - 32 GB system RAM
  - NVMe SSD -- checkpoints are 2-7 GB each, and load time matters when switching models

VRAM Guidelines

| Model                     | Resolution | VRAM Needed | Time/Image (RTX 3060) |
|---------------------------|------------|-------------|-----------------------|
| SD 1.5                    | 512x512    | 4 GB        | ~3 sec                |
| SDXL                      | 1024x1024  | 6-8 GB      | ~8 sec                |
| Flux Dev                  | 1024x1024  | 10-12 GB    | ~15 sec               |
| SD 1.5 + ControlNet       | 512x512    | 6 GB        | ~5 sec                |
| SDXL + LoRA + ControlNet  | 1024x1024  | 10 GB       | ~12 sec               |

AMD GPUs work via ROCm but expect rougher edges. Intel Arc has experimental support. Apple Silicon runs through MPS -- functional but 2-3x slower than equivalent NVIDIA hardware.

Docker Deployment

The cleanest way to run ComfyUI is with Docker and NVIDIA Container Toolkit.

First, install the NVIDIA Container Toolkit and verify your GPU is visible from inside a container:

docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

# docker-compose.yml
services:
  comfyui:
    image: ghcr.io/ai-dock/comfyui:latest
    container_name: comfyui
    ports:
      - "8188:8188"
    volumes:
      - ./models:/opt/ComfyUI/models
      - ./output:/opt/ComfyUI/output
      - ./custom_nodes:/opt/ComfyUI/custom_nodes
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    environment:
      - CLI_ARGS=--listen 0.0.0.0
    restart: unless-stopped

Everything is bind-mounted from the working directory, so no named volumes are needed. Create the model directories and start the stack:

mkdir -p models/checkpoints models/loras models/controlnet models/vae output custom_nodes
docker compose up -d
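Once the container is up, you can sanity-check it from the API. ComfyUI exposes a /system_stats endpoint reporting per-GPU VRAM; this short Python sketch summarizes it (endpoint path and field names are from recent ComfyUI builds -- verify against your version):

```python
import json
import urllib.request


def fetch_stats(host: str = "http://localhost:8188") -> dict:
    """Fetch ComfyUI's /system_stats JSON (system info plus per-GPU memory)."""
    with urllib.request.urlopen(f"{host}/system_stats") as resp:
        return json.load(resp)


def summarize_devices(stats: dict) -> list[str]:
    """Render one line per GPU with free/total VRAM in GiB."""
    lines = []
    for dev in stats.get("devices", []):
        total = dev.get("vram_total", 0) / 1024**3
        free = dev.get("vram_free", 0) / 1024**3
        lines.append(f"{dev.get('name', '?')}: {free:.1f} / {total:.1f} GiB VRAM free")
    return lines
```

Run it with `print("\n".join(summarize_devices(fetch_stats())))` -- if the numbers come back, the container sees your GPU.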

Open http://your-server:8188 and you'll see the ComfyUI node editor. It ships with a default text-to-image workflow -- but you'll need to download a model checkpoint first.


Model Management

Models are the core of image generation. You need at least one checkpoint to get started.

Downloading Your First Checkpoint

# SD 1.5 -- small, fast, huge ecosystem of LoRAs and embeddings
wget -P models/checkpoints/ \
  "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors"

# SDXL -- higher quality, higher VRAM usage
wget -P models/checkpoints/ \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"

Community fine-tunes on CivitAI and Hugging Face are where the real variety lives. Models like Realistic Vision (photorealism), DreamShaper (artistic), and Juggernaut XL (general purpose SDXL) are popular starting points.
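Before loading a multi-gigabyte download into ComfyUI, it's worth checking that the file is a well-formed safetensors file. The format is an 8-byte little-endian length prefix followed by a JSON header describing every tensor; a minimal header reader, no external dependencies assumed:

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header of a .safetensors file without loading weights.

    Layout: 8-byte little-endian length prefix, then that many bytes of JSON
    mapping tensor names to dtype/shape/offsets (plus optional __metadata__).
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))
```

A truncated or corrupted download typically fails here immediately, which beats discovering it mid-generation. You can also count tensors with `len(read_safetensors_header(path))` as a quick sanity check.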

Model Directory Structure

ComfyUI expects models in specific subdirectories:

models/
├── checkpoints/    # Main model files (.safetensors)
├── loras/          # LoRA fine-tunes (style/subject adapters)
├── controlnet/     # ControlNet models (pose, depth, canny)
├── vae/            # VAE decoders (affects color/detail)
├── embeddings/     # Textual inversions
├── upscale_models/ # Upscaler models (RealESRGAN, etc.)
└── clip/           # CLIP text encoder models

Drop files into the right directory and refresh the ComfyUI browser page. No restart needed.

Workflow Basics

ComfyUI's node graph can look overwhelming at first. Here's what the default text-to-image workflow does:

  1. Load Checkpoint -- loads the model (UNet, CLIP, VAE) into memory
  2. CLIP Text Encode (Positive) -- converts your prompt into embeddings the model understands
  3. CLIP Text Encode (Negative) -- encodes things you don't want in the image
  4. KSampler -- the denoising loop that actually generates the image from noise
  5. VAE Decode -- converts the latent output into a visible image
  6. Save Image -- writes the result to disk

Each node has inputs and outputs you can rewire. Want to add a LoRA? Insert a "Load LoRA" node between the checkpoint and the CLIP encoder. Want ControlNet? Add a "Load ControlNet Model" and "Apply ControlNet" node before the KSampler. The graph makes the data flow explicit.
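The same graph can also be driven programmatically: ComfyUI accepts API-format workflow JSON via POST /prompt. Here's a sketch of the six-node default workflow as an API payload -- the node class names are standard ComfyUI, but the checkpoint filename assumes the SD 1.5 download from earlier, and the sampler settings are illustrative defaults:

```python
import json
import urllib.request


def build_txt2img_graph(prompt: str, negative: str = "", seed: int = 42) -> dict:
    """Minimal API-format graph mirroring the default workflow's nodes.

    Links are [node_id, output_index]: CheckpointLoaderSimple outputs
    MODEL (0), CLIP (1), VAE (2).
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
        "2": {"class_type": "CLIPTextEncode",            # positive prompt
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",            # negative prompt
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 512, "height": 512, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "api"}},
    }


def queue_prompt(graph: dict, host: str = "http://localhost:8188") -> None:
    """Submit the graph to ComfyUI's generation queue."""
    req = urllib.request.Request(
        f"{host}/prompt",
        data=json.dumps({"prompt": graph}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

The easy way to get a real payload is to enable dev mode in ComfyUI's settings and use "Save (API Format)" on any workflow you've built in the editor, then script variations of it like this.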

Useful Workflow Patterns

  - Hires fix -- generate at the model's native resolution, upscale the latent, then run a second KSampler pass at low denoise (~0.4-0.5) to add detail without composition drift.
  - Img2img -- replace the Empty Latent Image node with Load Image feeding a VAE Encode node to transform an existing picture.
  - Inpainting -- add a mask via "VAE Encode (for Inpainting)" to regenerate only part of an image while leaving the rest untouched.

Essential Custom Nodes

ComfyUI's plugin ecosystem lives in the custom_nodes directory. Install ComfyUI Manager first -- it adds a UI button for browsing and installing everything else:

cd custom_nodes && git clone https://github.com/ltdrdata/ComfyUI-Manager.git
docker compose restart comfyui

From there, the must-haves: Impact-Pack (face detection, regional prompting), IPAdapter_plus (style transfer from reference images), AnimateDiff-Evolved (prompt-to-animation), and UltimateSDUpscale (tile-based upscaling without VRAM limits).

Performance Tips

  - Prefer .safetensors over legacy .ckpt files -- they load faster and can't embed arbitrary pickled code.
  - Add --lowvram to CLI_ARGS on 4-6 GB cards; ComfyUI offloads more aggressively at some speed cost.
  - Keep checkpoints on fast local storage -- model switching is dominated by disk reads.
  - Raise batch_size on the Empty Latent Image node to amortize per-run overhead across several images.

Securing Your Instance

ComfyUI has no built-in authentication. If you expose port 8188 beyond localhost:

  - Put it behind a reverse proxy (nginx, Caddy, Traefik) with basic auth or SSO.
  - Restrict access with firewall rules, or bind it to a VPN interface (WireGuard, Tailscale).
  - Install custom nodes only from sources you trust -- they run arbitrary Python on your server.

Never expose ComfyUI directly to the internet. It executes arbitrary Python through custom nodes -- an unauthenticated instance is a remote code execution vulnerability.
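One way to add that missing authentication layer is a reverse proxy. A minimal nginx sketch with HTTP basic auth -- the hostname, certificate paths, and htpasswd file are placeholders; the WebSocket upgrade headers are needed because the ComfyUI frontend streams progress over a websocket:

```nginx
server {
    listen 443 ssl;
    server_name comfyui.example.com;           # placeholder hostname

    ssl_certificate     /etc/ssl/comfyui.crt;  # placeholder cert paths
    ssl_certificate_key /etc/ssl/comfyui.key;

    auth_basic           "ComfyUI";
    auth_basic_user_file /etc/nginx/.htpasswd; # create with htpasswd -c

    location / {
        proxy_pass http://127.0.0.1:8188;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # websocket upgrade
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

With this in place, only the proxy's 443 needs to be reachable, and port 8188 can stay bound to the loopback or an internal interface.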

Verdict

ComfyUI is the power-user's choice for local image generation. The node-based interface has a steeper learning curve than A1111's form-based UI, but it pays dividends immediately: better performance, lower VRAM usage, reproducible workflows, and the ability to build generation pipelines that simply aren't possible in other interfaces. If you have an NVIDIA GPU with 8+ GB of VRAM, you can be generating images in under ten minutes with the Docker Compose setup above. The entire Stable Diffusion ecosystem -- thousands of checkpoints, LoRAs, ControlNets, and community workflows -- is available to you, running entirely on your own hardware, with no per-image costs and no content restrictions beyond what you choose.
