Dify: Self-Hosted LLM Application Platform for RAG and AI Workflows
Building applications with Large Language Models (LLMs) used to require writing code, managing prompts, handling embeddings, and orchestrating API calls. Every developer reimplemented the same patterns: retrieval-augmented generation (RAG), prompt chaining, tool calling, memory management.
Dify is an open-source platform that gives you a visual interface for building LLM applications. Connect to OpenAI, Anthropic, Ollama, or any other model provider. Build chatbots, RAG pipelines, content generators, and autonomous agents without writing backend code. Deploy production APIs with authentication, rate limiting, and monitoring.
Why Self-Host an LLM Application Platform
You can build LLM apps directly with OpenAI's API and custom code. But:
- Every project reimplements the same boilerplate: Prompt management, context handling, embedding generation, vector search
- Iteration is slow: Code → deploy → test cycles take minutes
- Non-technical users can't contribute: Prompt engineering and workflow design require developer involvement
- You pay twice: SaaS platforms like Relevance AI charge markup on top of model API costs
Dify gives you:
- Visual workflow builder — design LLM pipelines without code
- Built-in RAG — upload documents, automatic chunking, embedding, and vector retrieval
- Multi-model support — switch between OpenAI, Claude, Llama, Gemini without code changes
- Production-ready APIs — deploy apps with authentication, logging, and rate limiting
- Self-hosted and free — run on your infrastructure, pay only for model API usage
Dify vs. Flowise vs. Langflow
Several open-source projects tackle LLM application development:
| Feature | Dify | Flowise | Langflow |
|---|---|---|---|
| Interface | Visual workflow builder | Node-based flow editor | Drag-and-drop components |
| Technology | Python + PostgreSQL + Redis | TypeScript + Node.js | Python + SQLite |
| Resource usage | Moderate (~500 MB-1 GB) | Low (~200-400 MB) | Moderate (~400-600 MB) |
| RAG support | Built-in (upload docs → auto-embed) | Manual (configure embeddings) | Manual (configure chains) |
| Agent tools | Built-in (web search, APIs, code) | Plugin-based | Built-in + custom |
| Prompt management | Built-in prompt editor + versions | Manual in flows | Manual in components |
| Multi-tenancy | Yes (API keys per app) | No | No |
| Authentication | Built-in (API keys, OAuth) | Basic | Basic |
| Deployment | Docker Compose | Docker or local | Docker or local |
| Production readiness | High (API, logging, rate limits) | Moderate | Moderate |
| UI polish | Professional | Functional | Functional |
| Philosophy | No-code platform | Developer tool | Hybrid (no-code + code) |
Choose Dify if you:
- Want to build production-ready LLM apps with minimal code
- Need RAG with minimal setup (upload docs and go)
- Want non-technical users to iterate on prompts and workflows
- Need multi-tenancy (deploy apps for multiple customers)
- Value a polished UI and enterprise-ready features
Choose Flowise if you:
- Prefer Node.js over Python
- Want maximum flexibility in flow design
- Need lightweight resource usage
- Are comfortable with manual configuration
Choose Langflow if you:
- Want a middle ground between no-code and code
- Need to integrate custom Python code into workflows
- Prefer drag-and-drop over structured forms
- Want to iterate quickly on experimental pipelines
For most teams building customer-facing LLM apps, Dify is the most production-ready option.
Docker Compose Setup
Dify requires PostgreSQL for data, Redis for caching, and a vector database (Weaviate or Qdrant) for RAG:
services:
  dify-api:
    image: langgenius/dify-api:latest
    restart: unless-stopped
    environment:
      MODE: api
      SECRET_KEY: ${SECRET_KEY}
      DB_USERNAME: postgres
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: dify-db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: dify-redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD}@dify-redis:6379/1
      WEAVIATE_ENDPOINT: http://dify-weaviate:8080
      WEAVIATE_API_KEY: ${WEAVIATE_API_KEY}
    ports:
      - "5001:5001"  # the console and published apps call the API from the browser
    depends_on:
      - dify-db
      - dify-redis
      - dify-weaviate
    volumes:
      - dify_storage:/app/api/storage

  dify-worker:
    image: langgenius/dify-api:latest
    restart: unless-stopped
    environment:
      MODE: worker
      SECRET_KEY: ${SECRET_KEY}
      DB_USERNAME: postgres
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: dify-db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: dify-redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD}@dify-redis:6379/1
      WEAVIATE_ENDPOINT: http://dify-weaviate:8080
      WEAVIATE_API_KEY: ${WEAVIATE_API_KEY}
    depends_on:
      - dify-db
      - dify-redis
      - dify-weaviate
    volumes:
      - dify_storage:/app/api/storage

  dify-web:
    image: langgenius/dify-web:latest
    restart: unless-stopped
    environment:
      # These URLs are fetched by the user's browser, so they must be
      # reachable from the client, not just inside the Docker network.
      CONSOLE_API_URL: http://your-server:5001
      APP_API_URL: http://your-server:5001
    ports:
      - "3000:3000"

  dify-db:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: dify
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - dify_db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  dify-redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - dify_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  dify-weaviate:
    image: semitechnologies/weaviate:latest
    restart: unless-stopped
    environment:
      AUTHENTICATION_APIKEY_ENABLED: "true"
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_API_KEY}
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      QUERY_DEFAULTS_LIMIT: 25
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: weaviate
    volumes:
      - dify_weaviate:/var/lib/weaviate

volumes:
  dify_db:
  dify_redis:
  dify_weaviate:
  dify_storage:
Create a .env file. Generate the secrets with openssl rand -hex 32 and paste the literal values — Compose does not perform command substitution inside .env files:

SECRET_KEY=paste-output-of-openssl-rand-hex-32
DB_PASSWORD=your-secure-db-password
REDIS_PASSWORD=your-redis-password
WEAVIATE_API_KEY=paste-output-of-openssl-rand-hex-16
Start the stack:
docker compose up -d
Visit http://your-server:3000 and create an admin account.
First-Time Setup
After logging in:
- Add model providers — Settings → Model Providers → Add (OpenAI, Anthropic, Ollama, etc.)
- Create your first app — Studio → Create App
- Configure knowledge base — Upload documents for RAG
- Publish and test — Deploy your app and get an API endpoint
Core Concepts
Apps
An app is a complete LLM application with:
- Type: Chatbot, text generator, or agent
- Model: Which LLM to use (GPT-4, Claude, Llama, etc.)
- Prompt: System instructions and user message template
- Knowledge: Optional RAG documents
- Tools: External APIs, code execution, web search
- API: Production endpoint for integrations
Apps are deployed as APIs with authentication and rate limiting.
Knowledge Base (RAG)
Upload documents (PDFs, Word, Markdown, TXT) and Dify automatically:
- Chunks the text (configurable chunk size and overlap)
- Generates embeddings (via OpenAI, Cohere, or local models)
- Stores vectors in Weaviate/Qdrant
- Retrieves relevant chunks when users ask questions
Connect a knowledge base to any app to ground responses in your documents.
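The chunking step above can be sketched in a few lines. This is a simplified illustration, not Dify's actual implementation — Dify chunks by tokens and handles document structure, while this toy version splits on whitespace words:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; neighbors share `overlap` words."""
    words = text.split()
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the tail of the document
    return chunks
```

The overlap is what prevents an answer that straddles a chunk boundary from being lost: the end of each chunk is repeated at the start of the next.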
Workflows
Workflows chain multiple steps:
- LLM calls with different prompts and models
- Conditional logic (if/else branching)
- Tool calls (APIs, database queries, code execution)
- Data transformation (extract, filter, format)
- Loops (iterate over lists)
Build multi-step pipelines like:
- User asks a question
- Search knowledge base for context
- Send context + question to LLM
- Format response as JSON
- Store in database via API
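The five steps above map naturally onto a chain of small functions. A hand-rolled sketch of the same flow (the retrieval and LLM steps are stubbed out; in Dify each one is a visual node):

```python
import json

def search_knowledge_base(question: str) -> list[str]:
    """Stub for step 2 — in Dify this is a knowledge-retrieval node."""
    return ["Refunds are accepted within 30 days of purchase."]

def ask_llm(question: str, context: list[str]) -> str:
    """Stub for step 3 — in Dify this is an LLM node with a prompt template."""
    return f"Based on the docs: {context[0]}"

def run_pipeline(question: str) -> str:
    context = search_knowledge_base(question)  # step 2: retrieve context
    answer = ask_llm(question, context)        # step 3: call the LLM
    # step 4: format the response as JSON (step 5, storing it via API, is omitted)
    return json.dumps({"question": question, "answer": answer})
```

The point of the visual builder is that each of these functions becomes a node you can rewire without redeploying code.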
Agents
Agents are apps that can:
- Use tools autonomously (web search, calculator, API calls)
- Break tasks into sub-tasks
- Decide which tools to use based on context
Example agent workflow:
- User: "What's the weather in Seattle and how should I dress?"
- Agent: Calls weather API → analyzes temperature → suggests clothing
Tools
Dify includes built-in tools:
- Web search (Google, DuckDuckGo)
- HTTP requests (call external APIs)
- Code execution (run Python/JavaScript in sandbox)
- Database queries (PostgreSQL, MySQL)
- File operations (read/write files)
You can also add custom tools via OpenAPI spec or code.
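For the OpenAPI route, Dify imports a spec and turns each operation into a callable tool. A minimal sketch for a hypothetical weather endpoint — the URL, path, and schema here are placeholders, not a real service:

```yaml
openapi: 3.0.0
info:
  title: Weather Tool
  version: "1.0"
servers:
  - url: https://api.example.com
paths:
  /v1/current:
    get:
      operationId: getCurrentWeather
      summary: Get current weather for a city
      parameters:
        - name: city
          in: query
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Current conditions as JSON
```

The operationId and summary matter: the agent uses them to decide when to call the tool.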
Building a RAG Chatbot
Here's how to build a document Q&A chatbot:
1. Create a knowledge base
- Knowledge → Create Knowledge
- Upload documents (PDFs, Markdown, etc.)
- Configure chunking:
- Chunk size: 500-1000 tokens (shorter for precise retrieval, longer for context)
- Chunk overlap: 50-100 tokens (prevents context loss at chunk boundaries)
- Choose embedding model (OpenAI, Cohere, or local)
- Process documents (Dify chunks, embeds, and indexes automatically)
2. Create a chatbot app
- Studio → Create App → Chatbot
- Select your knowledge base
- Configure retrieval:
- Top K: How many chunks to retrieve (3-5 is typical)
- Score threshold: Minimum similarity score (0.7+ filters irrelevant results)
- Write a system prompt:
You are a helpful assistant that answers questions based on the provided context.
Context:
{{#context#}}
User question:
{{#query#}}
Answer the question using only the information in the context. If the context doesn't contain the answer, say "I don't have enough information to answer that."
- Test in the preview panel
- Publish the app
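Top K and the score threshold can be illustrated with plain cosine similarity. This is a toy sketch with hand-made vectors, not Dify's Weaviate-backed retrieval:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, top_k=3, threshold=0.7):
    """Return up to top_k (score, text) pairs whose score clears the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(reverse=True)
    return [(s, t) for s, t in scored[:top_k] if s >= threshold]
```

Top K caps how much context you stuff into the prompt; the threshold drops chunks that matched only weakly, which is usually what turns a confident wrong answer into "I don't have enough information."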
3. Deploy as API
- Publish → API Access
- Generate an API key
- Call the API:
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "What is the refund policy?",
    "user": "user-123"
  }'
The API returns the LLM response plus citation metadata (which chunks were retrieved).
Building an Agent
Agents can use tools to complete tasks autonomously:
1. Create an agent app
- Studio → Create App → Agent
- Select a model (GPT-4 or Claude work best for agents)
- Add tools:
- Web search (for real-time information)
- HTTP API (to call external services)
- Code interpreter (to run calculations)
2. Configure tools
Example: Add a weather API tool:
Add Tool → HTTP Request
Configure:
- URL: https://api.weather.com/v1/current?city={{city}}
- Method: GET
- Headers: Authorization: Bearer YOUR_API_KEY
- Parameters: city (extracted from user input)
Write a tool description:
Use this tool to get current weather information for a city.
Input: city name (e.g., "Seattle")
Output: JSON with temperature, conditions, humidity
3. Write agent instructions
You are a helpful assistant that can:
- Answer questions using web search
- Get weather information for any city
- Perform calculations
When the user asks about weather, use the weather API tool.
When asked to search for information, use web search.
When asked to calculate, use the code interpreter.
4. Test and deploy
Test the agent in preview mode:
- "What's the weather in Seattle?"
- Agent calls weather API → parses JSON → responds with current conditions
Deploy as an API for integration with your app, Discord bot, Slack, etc.
Model Providers
Dify supports multiple LLM providers:
Configure providers
- Settings → Model Providers
- Add providers and API keys:
- OpenAI: GPT-4, GPT-3.5-turbo
- Anthropic: Claude 3.5 Sonnet, Claude Opus
- Ollama: Self-hosted Llama, Mistral, etc.
- Google: Gemini Pro
- Cohere: Command models
- Local models: Llama.cpp, vLLM
Switch models per app
Each app can use a different model:
- Customer support chatbot: GPT-3.5-turbo (fast, cheap)
- Technical documentation Q&A: Claude Opus (deep reasoning)
- Internal tools: Self-hosted Llama via Ollama (privacy, no API costs)
Using Ollama for local models
Run Ollama on the same server or network:
services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama_data:
Pull models:
docker exec ollama ollama pull llama3.2
In Dify, add Ollama as a provider:
- Endpoint: http://ollama:11434
- Model: llama3.2
Now you can use local models without API costs.
Prompt Management
Dify includes a prompt editor with:
- Variables: Inject user input, context, timestamps
- Templates: Reusable prompt structures
- Versioning: Track changes and roll back
Example prompt for a customer support chatbot:
You are a support agent for {{company_name}}.
Current date: {{date}}
User tier: {{user_tier}}
Knowledge base context:
{{#context#}}
User message:
{{#query#}}
Respond professionally and helpfully. If the knowledge base doesn't contain the answer, offer to escalate to a human agent.
Variables are filled automatically at runtime.
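Dify performs this substitution server-side; conceptually it is simple string templating. An illustrative sketch — note that Dify's special forms like {{#context#}} carry extra behavior (retrieval injection), which this toy version treats as ordinary variables:

```python
import re

def fill_template(template: str, variables: dict[str, str]) -> str:
    """Replace {{name}} and {{#name#}} placeholders with supplied values."""
    def substitute(match: re.Match) -> str:
        key = match.group(1).strip("#")
        # Leave unknown placeholders intact rather than erasing them.
        return variables.get(key, match.group(0))
    return re.sub(r"\{\{(#?\w+#?)\}\}", substitute, template)
```

At runtime, user input fills {{#query#}}, retrieved chunks fill {{#context#}}, and app-level variables like {{company_name}} come from the app configuration.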
API and Integration
Every Dify app exposes a REST API:
Chat completion
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "How do I reset my password?",
    "user": "user-123",
    "conversation_id": "abc123"
  }'
Response includes:
- answer: The LLM response
- metadata: Retrieved documents, tool calls, etc.
- conversation_id: For multi-turn conversations
Streaming responses
For real-time chatbot UI, use streaming:
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "Explain quantum computing",
    "user": "user-123",
    "response_mode": "streaming"
  }'
Returns Server-Sent Events (SSE) with incremental chunks.
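Client-side, each event arrives as a `data:` line carrying a JSON payload. A sketch of accumulating the streamed answer — the `event`/`answer` field names follow Dify's documented event shape but should be verified against your version, and the fake list below stands in for a real HTTP response:

```python
import json

def accumulate_sse(lines) -> str:
    """Collect incremental 'answer' fragments from SSE 'data:' lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        event = json.loads(line[len("data:"):].strip())
        if event.get("event") == "message":
            parts.append(event.get("answer", ""))
    return "".join(parts)

# Simulated stream in place of a real HTTP response:
fake = [
    'data: {"event": "message", "answer": "Quantum "}',
    'data: {"event": "message", "answer": "computing..."}',
    'data: {"event": "message_end"}',
]
```

In a real client you would iterate over the response body line by line and flush each fragment to the UI as it arrives.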
Embeddings
Use Dify's embedding API for similarity search:
curl -X POST https://dify.yourdomain.com/v1/embeddings \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "What is machine learning?"
}'
Resource Requirements
| Component | RAM | CPU | Storage |
|---|---|---|---|
| Dify API | 200-400 MB | Low (0.5 cores) | 100 MB |
| Dify Worker | 200-400 MB | Low (0.5 cores) | — |
| Dify Web | 50-100 MB | Minimal | 50 MB |
| PostgreSQL | 100-200 MB | Low | 200-500 MB |
| Redis | 50-100 MB | Minimal | 50 MB |
| Weaviate | 200-500 MB | Low | 500 MB-5 GB (depends on docs) |
| Total | ~800 MB-1.7 GB | ~1-2 cores | ~1-6 GB |
Storage for Weaviate depends on document volume:
- 1,000 documents (~500 KB each): ~500 MB
- 10,000 documents: ~2-3 GB
- 100,000 documents: ~10-20 GB
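These figures are rough; you can back-of-envelope the vector portion yourself. Assuming float32 embeddings (4 bytes per dimension) — the chunks-per-document and dimension defaults below are illustrative assumptions:

```python
def vector_storage_bytes(num_docs: int, chunks_per_doc: int = 10,
                         dims: int = 1536, bytes_per_dim: int = 4) -> int:
    """Raw embedding storage only; index structures and stored chunk
    text typically multiply real Weaviate usage several-fold."""
    return num_docs * chunks_per_doc * dims * bytes_per_dim

# e.g. 10,000 docs at ~10 chunks each with 1536-dim float32 vectors:
gb = vector_storage_bytes(10_000) / 1e9
```

This is why the table's ranges are wide: chunk count per document, embedding dimension, and index overhead all vary by corpus and configuration.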
Production Considerations
Rate limiting
Configure per-app rate limits:
- App Settings → API Access → Rate Limiting
- Set requests per minute/hour
- Enforce at API gateway or in Dify
Monitoring
Dify logs all API calls:
- Logs → API Logs — view requests, latency, costs
- Logs → Conversations — replay chatbot interactions
- Logs → Annotations — flag and improve bad responses
Export logs to Prometheus, Grafana, or Loki for monitoring.
Cost tracking
Dify tracks token usage per app:
- Dashboard → Usage — view tokens, costs per provider
- Set budgets and alerts
Backup
- PostgreSQL: pg_dump the dify database
- Weaviate: back up the dify_weaviate volume
- Storage: back up uploaded documents in the dify_storage volume
Honest Limitations
- Resource usage is higher than minimal tools: Dify runs multiple containers and needs 1-2 GB RAM
- Vector database required: You must run Weaviate or Qdrant (no lightweight SQLite option)
- Less flexible than code: Visual workflows are easier but can't express every LLM pattern
- Agent reliability varies: Agents work well with GPT-4/Claude but struggle with weaker models
- Learning curve: Dify has many features; onboarding takes time
For simple use cases (basic chatbot, single-prompt API), Dify is overkill. For production LLM apps with RAG, tools, and team collaboration, it's a time-saver.
The Bottom Line
Dify is the most production-ready open-source platform for building LLM applications. It has the polish of a commercial product (clean UI, good documentation, active development) while being fully self-hosted and free.
If you're building customer-facing chatbots, internal knowledge bases, or autonomous agents, Dify eliminates weeks of boilerplate code. If you're experimenting with LLM workflows and want maximum flexibility, Flowise or Langflow might fit better.
For teams that want to move fast, iterate with non-technical stakeholders, and deploy production APIs without managing infrastructure, Dify is the best option in the self-hosted space.
Resources
- Dify documentation
- Dify GitHub
- Community Discord
- Dify Cloud — try the hosted version (or deploy self-hosted for privacy)
