
Dify: Self-Hosted LLM Application Platform for RAG and AI Workflows

AI & Machine Learning 2026-02-15 · 9 min read dify ai llm rag chatbot self-hosted open-source
By Selfhosted Guides Editorial Team. Self-hosting practitioners covering open source software, home lab infrastructure, and data sovereignty.

Building applications with Large Language Models (LLMs) used to require writing code, managing prompts, handling embeddings, and orchestrating API calls. Every developer reimplemented the same patterns: retrieval-augmented generation (RAG), prompt chaining, tool calling, memory management.


Dify is an open-source platform that gives you a visual interface for building LLM applications. Connect to OpenAI, Anthropic, Ollama, or any other model provider. Build chatbots, RAG pipelines, content generators, and autonomous agents without writing backend code. Deploy production APIs with authentication, rate limiting, and monitoring.


Why Self-Host an LLM Application Platform

You can build LLM apps directly with OpenAI's API and custom code, but you end up rebuilding the same plumbing for every project: retrieval, prompt management, authentication, rate limiting, monitoring.

Dify gives you those pieces out of the box, behind a visual interface, so you spend your time on the application instead of the scaffolding.

Dify vs. Flowise vs. Langflow

Several open-source projects tackle LLM application development:

| Feature | Dify | Flowise | Langflow |
|---|---|---|---|
| Interface | Visual workflow builder | Node-based flow editor | Drag-and-drop components |
| Technology | Python + PostgreSQL + Redis | TypeScript + Node.js | Python + SQLite |
| Resource usage | Moderate (~500 MB-1 GB) | Low (~200-400 MB) | Moderate (~400-600 MB) |
| RAG support | Built-in (upload docs → auto-embed) | Manual (configure embeddings) | Manual (configure chains) |
| Agent tools | Built-in (web search, APIs, code) | Plugin-based | Built-in + custom |
| Prompt management | Built-in prompt editor + versions | Manual in flows | Manual in components |
| Multi-tenancy | Yes (API keys per app) | No | No |
| Authentication | Built-in (API keys, OAuth) | Basic | Basic |
| Deployment | Docker Compose | Docker or local | Docker or local |
| Production readiness | High (API, logging, rate limits) | Moderate | Moderate |
| UI polish | Professional | Functional | Functional |
| Philosophy | No-code platform | Developer tool | Hybrid (no-code + code) |

Choose Dify if you want a production-ready platform with built-in RAG, multi-tenancy, prompt versioning, and polished API management.

Choose Flowise if you want a lightweight, developer-oriented tool with the smallest resource footprint.

Choose Langflow if you want a hybrid approach that mixes no-code building with custom code in your components.

For most teams building customer-facing LLM apps, Dify is the most production-ready option.

Docker Compose Setup

Dify requires PostgreSQL for data, Redis for caching, and a vector database (Weaviate or Qdrant) for RAG:

services:
  dify-api:
    image: langgenius/dify-api:latest
    restart: unless-stopped
    environment:
      MODE: api
      SECRET_KEY: ${SECRET_KEY}
      DB_USERNAME: postgres
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: dify-db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: dify-redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD}@dify-redis:6379/1
      WEAVIATE_ENDPOINT: http://dify-weaviate:8080
      WEAVIATE_API_KEY: ${WEAVIATE_API_KEY}
    depends_on:
      - dify-db
      - dify-redis
      - dify-weaviate
    volumes:
      - dify_storage:/app/api/storage

  dify-worker:
    image: langgenius/dify-api:latest
    restart: unless-stopped
    environment:
      MODE: worker
      SECRET_KEY: ${SECRET_KEY}
      DB_USERNAME: postgres
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: dify-db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: dify-redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD}@dify-redis:6379/1
      WEAVIATE_ENDPOINT: http://dify-weaviate:8080
      WEAVIATE_API_KEY: ${WEAVIATE_API_KEY}
    depends_on:
      - dify-db
      - dify-redis
      - dify-weaviate
    volumes:
      - dify_storage:/app/api/storage

  dify-web:
    image: langgenius/dify-web:latest
    restart: unless-stopped
    environment:
      # These URLs are fetched by the user's browser, so in production set
      # them to your public domain rather than the internal service name.
      CONSOLE_API_URL: http://dify-api:5001
      APP_API_URL: http://dify-api:5001
    ports:
      - "3000:3000"

  dify-db:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: dify
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - dify_db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  dify-redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - dify_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  dify-weaviate:
    image: semitechnologies/weaviate:latest
    restart: unless-stopped
    environment:
      AUTHENTICATION_APIKEY_ENABLED: "true"
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_API_KEY}
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      QUERY_DEFAULTS_LIMIT: 25
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: weaviate
    volumes:
      - dify_weaviate:/var/lib/weaviate

volumes:
  dify_db:
  dify_redis:
  dify_weaviate:
  dify_storage:

Create a .env file. Note that Compose does not run shell command substitution inside .env files, so generate the random values first (openssl rand -hex 32 for SECRET_KEY, openssl rand -hex 16 for WEAVIATE_API_KEY) and paste them in:

SECRET_KEY=paste-output-of-openssl-rand-hex-32
DB_PASSWORD=your-secure-db-password
REDIS_PASSWORD=your-redis-password
WEAVIATE_API_KEY=paste-output-of-openssl-rand-hex-16

Start the stack:

docker compose up -d

Visit http://your-server:3000 and create an admin account.

First-Time Setup

After logging in:

  1. Add model providers — Settings → Model Providers → Add (OpenAI, Anthropic, Ollama, etc.)
  2. Create your first app — Studio → Create App
  3. Configure knowledge base — Upload documents for RAG
  4. Publish and test — Deploy your app and get an API endpoint


Core Concepts

Apps

An app is a complete LLM application: a model configuration, prompts, optional knowledge bases, and conversation settings, bundled together.

Apps are deployed as APIs with authentication and rate limiting.

Knowledge Base (RAG)

Upload documents (PDFs, Word, Markdown, TXT) and Dify automatically:

  1. Chunks the text (configurable chunk size and overlap)
  2. Generates embeddings (via OpenAI, Cohere, or local models)
  3. Stores vectors in Weaviate/Qdrant
  4. Retrieves relevant chunks when users ask questions

Connect a knowledge base to any app to ground responses in your documents.
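The chunking step above is the part most worth understanding. As a minimal sketch of the sliding-window idea (Dify's real splitter works on tokens and respects sentence boundaries; this character-based version is only illustrative):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows.

    Character-based for simplicity; real chunkers count tokens. The overlap
    keeps sentences that straddle a boundary present in both chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# chunk_text("abcdefghij", chunk_size=4, overlap=2)
# → ["abcd", "cdef", "efgh", "ghij"] — each chunk shares 2 chars with its neighbor
```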

Workflows

Workflows chain multiple processing steps (model calls, knowledge retrieval, HTTP requests, code execution) into a pipeline. For example:

  1. User asks a question
  2. Search knowledge base for context
  3. Send context + question to LLM
  4. Format response as JSON
  5. Store in database via API
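The pipeline above is plain function composition. A sketch of the orchestration, with stub callables standing in for the actual Dify workflow nodes (the step names here are illustrative, not Dify APIs):

```python
def run_workflow(question: str, search_kb, call_llm, store_result) -> dict:
    """Wire the five steps; each callable stands in for one workflow node."""
    context = search_kb(question)                      # step 2: knowledge base search
    answer = call_llm(context, question)               # step 3: context + question to LLM
    record = {"question": question, "answer": answer}  # step 4: structured JSON output
    store_result(record)                               # step 5: persist via API/database
    return record

# Usage with stub nodes:
saved = []
result = run_workflow(
    "What is Dify?",
    search_kb=lambda q: ["Dify is an open-source LLM platform."],
    call_llm=lambda ctx, q: f"Based on {len(ctx)} chunk(s): an LLM app platform.",
    store_result=saved.append,
)
```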

Agents

Agents are apps that can reason about a task, pick from the tools you give them, and call those tools iteratively until the task is done.

Tools

Dify includes built-in tools such as web search, HTTP requests, and a code interpreter.

You can also add custom tools via OpenAPI spec or code.

Building a RAG Chatbot

Here's how to build a document Q&A chatbot:

1. Create a knowledge base

  1. Knowledge → Create Knowledge
  2. Upload documents (PDFs, Markdown, etc.)
  3. Configure chunking:
    • Chunk size: 500-1000 tokens (shorter for precise retrieval, longer for context)
    • Chunk overlap: 50-100 tokens (prevents context loss at chunk boundaries)
  4. Choose embedding model (OpenAI, Cohere, or local)
  5. Process documents (Dify chunks, embeds, and indexes automatically)

2. Create a chatbot app

  1. Studio → Create App → Chatbot
  2. Select your knowledge base
  3. Configure retrieval:
    • Top K: How many chunks to retrieve (3-5 is typical)
    • Score threshold: Minimum similarity score (0.7+ filters irrelevant results)
  4. Write a system prompt:
You are a helpful assistant that answers questions based on the provided context.

Context:
{{#context#}}

User question:
{{#query#}}

Answer the question using only the information in the context. If the context doesn't contain the answer, say "I don't have enough information to answer that."
  5. Test in the preview panel
  6. Publish the app
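Top K and the score threshold work together: the threshold drops irrelevant chunks, Top K caps how many survivors reach the prompt. A sketch of that filtering (the scores and texts are illustrative, not Dify internals):

```python
def select_chunks(hits: list[tuple[str, float]],
                  top_k: int = 3,
                  threshold: float = 0.7) -> list[str]:
    """Keep chunks scoring at or above the threshold, best-first, at most top_k."""
    passing = [(text, score) for text, score in hits if score >= threshold]
    passing.sort(key=lambda h: h[1], reverse=True)
    return [text for text, _ in passing[:top_k]]

# Two relevant chunks pass; the 0.42 outlier never reaches the prompt.
hits = [("refund policy...", 0.91), ("shipping times...", 0.74), ("careers page...", 0.42)]
selected = select_chunks(hits)
```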

3. Deploy as API

  1. Publish → API Access
  2. Generate an API key
  3. Call the API:
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the refund policy?",
    "user": "user-123"
  }'

The API returns the LLM response plus citation metadata (which chunks were retrieved).
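The same call from Python, using only the standard library. The endpoint and field names follow the curl example above; the base URL is your own instance, and the `answer` response key in the comment is an assumption to verify against your Dify version:

```python
import json
import urllib.request

DIFY_URL = "https://dify.yourdomain.com/v1/chat-messages"  # your instance

def build_chat_request(api_key: str, query: str, user: str) -> urllib.request.Request:
    """Build the POST request matching the curl example above."""
    payload = {"query": query, "user": user}
    return urllib.request.Request(
        DIFY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it requires a reachable Dify instance:
# with urllib.request.urlopen(build_chat_request("YOUR_API_KEY", "What is the refund policy?", "user-123")) as resp:
#     body = json.load(resp)  # answer plus citation metadata
```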

Building an Agent

Agents can use tools to complete tasks autonomously:

1. Create an agent app

  1. Studio → Create App → Agent
  2. Select a model (GPT-4 or Claude work best for agents)
  3. Add tools:
    • Web search (for real-time information)
    • HTTP API (to call external services)
    • Code interpreter (to run calculations)

2. Configure tools

Example: Add a weather API tool:

  1. Add Tool → HTTP Request

  2. Configure:

    • URL: https://api.weather.com/v1/current?city={{city}}
    • Method: GET
    • Headers: Authorization: Bearer YOUR_API_KEY
    • Parameters: city (extracted from user input)
  3. Write a tool description:

Use this tool to get current weather information for a city.
Input: city name (e.g., "Seattle")
Output: JSON with temperature, conditions, humidity

3. Write agent instructions

You are a helpful assistant that can:
- Answer questions using web search
- Get weather information for any city
- Perform calculations

When the user asks about weather, use the weather API tool.
When asked to search for information, use web search.
When asked to calculate, use the code interpreter.

4. Test and deploy

Test the agent in preview mode and confirm it picks the right tool for each kind of request.

Deploy as an API for integration with your app, Discord bot, Slack, etc.

Model Providers

Dify supports multiple LLM providers, including OpenAI, Anthropic, and local models served through Ollama.

Configure providers

  1. Settings → Model Providers
  2. Add each provider and its API key

Switch models per app

Each app can use a different model, so you can pair a strong model with agent apps and a cheaper or local model with simple chatbots.

Using Ollama for local models

Run Ollama on the same server or network:

services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama_data:

Pull models:

docker exec ollama ollama pull llama3.2

In Dify, add Ollama as a provider (Settings → Model Providers → Ollama) and set the base URL to http://ollama:11434, or wherever Ollama is reachable from the Dify containers.

Now you can use local models without API costs.

Prompt Management

Dify includes a prompt editor with variable placeholders and version history.

Example prompt for a customer support chatbot:

You are a support agent for {{company_name}}.

Current date: {{date}}
User tier: {{user_tier}}

Knowledge base context:
{{#context#}}

User message:
{{#query#}}

Respond professionally and helpfully. If the knowledge base doesn't contain the answer, offer to escalate to a human agent.

Variables are filled automatically at runtime.
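Dify does this substitution server-side; conceptually it works like the sketch below. The regex and function are illustrative, and the `{{#context#}}`-style reserved slots are deliberately left untouched, since Dify itself fills those:

```python
import re

def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace {{name}} placeholders with values; unknown names stay as-is.

    The pattern only matches word characters, so reserved slots like
    {{#context#}} pass through unchanged for the platform to fill.
    """
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

prompt = render_prompt(
    "You are a support agent for {{company_name}}. User tier: {{user_tier}}",
    {"company_name": "Acme", "user_tier": "pro"},
)
```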

API and Integration

Every Dify app exposes a REST API:

Chat completion

curl -X POST https://dify.yourdomain.com/v1/chat-messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I reset my password?",
    "user": "user-123",
    "conversation_id": "abc123"
  }'

The response includes the model's answer, a conversation ID for follow-up turns, and metadata such as retrieved citations and token usage.

Streaming responses

For real-time chatbot UI, use streaming:

curl -X POST https://dify.yourdomain.com/v1/chat-messages \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Explain quantum computing",
    "user": "user-123",
    "stream": true
  }'

Returns Server-Sent Events (SSE) with incremental chunks.
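On the client you consume the stream line by line and concatenate the increments. A minimal SSE parser; the sample event shape below is illustrative, not Dify's exact schema:

```python
import json

def iter_sse_json(lines):
    """Yield the parsed JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":  # some APIs end with a sentinel
                yield json.loads(payload)

# Illustrative events; check your Dify version for the actual field names.
sample = [
    'data: {"event": "message", "answer": "Quantum "}',
    'data: {"event": "message", "answer": "computing..."}',
    "",
]
text = "".join(chunk.get("answer", "") for chunk in iter_sse_json(sample))
```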

Embeddings

Use Dify's embedding API for similarity search:

curl -X POST https://dify.yourdomain.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "What is machine learning?"
  }'
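Once you have embedding vectors back, similarity search is just vector math. A cosine-similarity sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.1, 0.9, 0.2]    # toy embedding of the query
doc_vec = [0.12, 0.85, 0.25]   # toy embedding of a stored chunk
score = cosine_similarity(query_vec, doc_vec)  # close to 1.0 for similar texts
```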

Resource Requirements

| Component | RAM | CPU | Storage |
|---|---|---|---|
| Dify API | 200-400 MB | Low (0.5 cores) | 100 MB |
| Dify Worker | 200-400 MB | Low (0.5 cores) | — |
| Dify Web | 50-100 MB | Minimal | 50 MB |
| PostgreSQL | 100-200 MB | Low | 200-500 MB |
| Redis | 50-100 MB | Minimal | 50 MB |
| Weaviate | 200-500 MB | Low | 500 MB-5 GB (depends on docs) |
| Total | ~800 MB-1.7 GB | ~1-2 cores | ~1-6 GB |

Storage for Weaviate depends mostly on how many document chunks you embed and the dimensionality of your embedding model.
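A back-of-the-envelope estimate, assuming 1536-dimensional float32 embeddings and ~20 chunks per document (both numbers are assumptions to adjust for your corpus and model):

```python
# Rough Weaviate storage estimate for raw vectors (indexes add overhead on top).
docs = 1_000
chunks_per_doc = 20          # assumption: e.g. ~10k-token docs / 500-token chunks
dims = 1536                  # embedding dimensions, model-dependent
bytes_per_vector = dims * 4  # float32

total_mb = docs * chunks_per_doc * bytes_per_vector / 1_000_000
# ≈ 123 MB of raw vectors for 1,000 documents; chunk text, metadata, and the
# vector index add meaningfully more in practice.
```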

Production Considerations

Rate limiting

Configure per-app rate limits:

  1. App Settings → API Access → Rate Limiting
  2. Set requests per minute/hour
  3. Enforce at API gateway or in Dify

Monitoring

Dify logs all API calls along with latency and token usage.

Export logs to Prometheus, Grafana, or Loki for monitoring.

Cost tracking

Dify tracks token usage per app, so you can see which apps and models are driving your provider costs.

Backup

  1. PostgreSQL: pg_dump the database
  2. Weaviate: Back up the volume (dify_weaviate)
  3. Storage: Back up uploaded documents (dify_storage)

Honest Limitations

For simple use cases (basic chatbot, single-prompt API), Dify is overkill. For production LLM apps with RAG, tools, and team collaboration, it's a time-saver.

The Bottom Line

Dify is the most production-ready open-source platform for building LLM applications. It has the polish of a commercial product (clean UI, good documentation, active development) while being fully self-hosted and free.

If you're building customer-facing chatbots, internal knowledge bases, or autonomous agents, Dify eliminates weeks of boilerplate code. If you're experimenting with LLM workflows and want maximum flexibility, Flowise or Langflow might fit better.

For teams that want to move fast, iterate with non-technical stakeholders, and deploy production APIs without managing infrastructure, Dify is the best option in the self-hosted space.

Resources

  • Dify on GitHub: https://github.com/langgenius/dify
  • Dify documentation: https://docs.dify.ai

Get free weekly tips in your inbox. Subscribe to Self-Hosted Weekly