Dify: Self-Hosted LLM Application Platform for RAG and AI Workflows
Building applications with Large Language Models (LLMs) used to require writing code, managing prompts, handling embeddings, and orchestrating API calls. Every developer reimplemented the same patterns: retrieval-augmented generation (RAG), prompt chaining, tool calling, memory management.
Dify is an open-source platform that gives you a visual interface for building LLM applications. Connect to OpenAI, Anthropic, Ollama, or any other model provider. Build chatbots, RAG pipelines, content generators, and autonomous agents without writing backend code. Deploy production APIs with authentication, rate limiting, and monitoring.
Why Self-Host an LLM Application Platform
You can build LLM apps directly with OpenAI's API and custom code. But:
- Every project reimplements the same boilerplate: Prompt management, context handling, embedding generation, vector search
- Iteration is slow: Code → deploy → test cycles take minutes
- Non-technical users can't contribute: Prompt engineering and workflow design require developer involvement
- You pay twice: SaaS platforms like Relevance AI charge markup on top of model API costs
Dify gives you:
- Visual workflow builder — design LLM pipelines without code
- Built-in RAG — upload documents, automatic chunking, embedding, and vector retrieval
- Multi-model support — switch between OpenAI, Claude, Llama, Gemini without code changes
- Production-ready APIs — deploy apps with authentication, logging, and rate limiting
- Self-hosted and free — run on your infrastructure, pay only for model API usage
Dify vs. Flowise vs. Langflow
Several open-source projects tackle LLM application development:
| Feature | Dify | Flowise | Langflow |
|---|---|---|---|
| Interface | Visual workflow builder | Node-based flow editor | Drag-and-drop components |
| Technology | Python + PostgreSQL + Redis | TypeScript + Node.js | Python + SQLite |
| Resource usage | Moderate (~500 MB-1 GB) | Low (~200-400 MB) | Moderate (~400-600 MB) |
| RAG support | Built-in (upload docs → auto-embed) | Manual (configure embeddings) | Manual (configure chains) |
| Agent tools | Built-in (web search, APIs, code) | Plugin-based | Built-in + custom |
| Prompt management | Built-in prompt editor + versions | Manual in flows | Manual in components |
| Multi-tenancy | Yes (API keys per app) | No | No |
| Authentication | Built-in (API keys, OAuth) | Basic | Basic |
| Deployment | Docker Compose | Docker or local | Docker or local |
| Production readiness | High (API, logging, rate limits) | Moderate | Moderate |
| UI polish | Professional | Functional | Functional |
| Philosophy | No-code platform | Developer tool | Hybrid (no-code + code) |
Choose Dify if you:
- Want to build production-ready LLM apps with minimal code
- Need RAG with minimal setup (upload docs and go)
- Want non-technical users to iterate on prompts and workflows
- Need multi-tenancy (deploy apps for multiple customers)
- Value a polished UI and enterprise-ready features
Choose Flowise if you:
- Prefer Node.js over Python
- Want maximum flexibility in flow design
- Need lightweight resource usage
- Are comfortable with manual configuration
Choose Langflow if you:
- Want a middle ground between no-code and code
- Need to integrate custom Python code into workflows
- Prefer drag-and-drop over structured forms
- Want to iterate quickly on experimental pipelines
For most teams building customer-facing LLM apps, Dify is the most production-ready option.
Docker Compose Setup
Dify requires PostgreSQL for data, Redis for caching, and a vector database (Weaviate or Qdrant) for RAG:
services:
  dify-api:
    image: langgenius/dify-api:latest
    restart: unless-stopped
    environment:
      MODE: api
      SECRET_KEY: ${SECRET_KEY}
      DB_USERNAME: postgres
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: dify-db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: dify-redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD}@dify-redis:6379/1
      WEAVIATE_ENDPOINT: http://dify-weaviate:8080
      WEAVIATE_API_KEY: ${WEAVIATE_API_KEY}
    ports:
      - "5001:5001"  # the console and published apps call the API from the browser
    depends_on:
      - dify-db
      - dify-redis
      - dify-weaviate
    volumes:
      - dify_storage:/app/api/storage

  dify-worker:
    image: langgenius/dify-api:latest
    restart: unless-stopped
    environment:
      MODE: worker
      SECRET_KEY: ${SECRET_KEY}
      DB_USERNAME: postgres
      DB_PASSWORD: ${DB_PASSWORD}
      DB_HOST: dify-db
      DB_PORT: 5432
      DB_DATABASE: dify
      REDIS_HOST: dify-redis
      REDIS_PORT: 6379
      REDIS_PASSWORD: ${REDIS_PASSWORD}
      CELERY_BROKER_URL: redis://:${REDIS_PASSWORD}@dify-redis:6379/1
      WEAVIATE_ENDPOINT: http://dify-weaviate:8080
      WEAVIATE_API_KEY: ${WEAVIATE_API_KEY}
    depends_on:
      - dify-db
      - dify-redis
      - dify-weaviate
    volumes:
      - dify_storage:/app/api/storage

  dify-web:
    image: langgenius/dify-web:latest
    restart: unless-stopped
    environment:
      # These URLs are fetched by the user's browser, so they must be
      # reachable from the client, not just inside the Docker network.
      CONSOLE_API_URL: http://your-server:5001
      APP_API_URL: http://your-server:5001
    ports:
      - "3000:3000"

  dify-db:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: dify
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - dify_db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  dify-redis:
    image: redis:7-alpine
    restart: unless-stopped
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - dify_redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  dify-weaviate:
    image: semitechnologies/weaviate:latest
    restart: unless-stopped
    environment:
      AUTHENTICATION_APIKEY_ENABLED: "true"
      AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_API_KEY}
      PERSISTENCE_DATA_PATH: /var/lib/weaviate
      QUERY_DEFAULTS_LIMIT: 25
      DEFAULT_VECTORIZER_MODULE: none
      CLUSTER_HOSTNAME: weaviate
    volumes:
      - dify_weaviate:/var/lib/weaviate

volumes:
  dify_db:
  dify_redis:
  dify_weaviate:
  dify_storage:
Create a .env file. Generate the secrets with openssl rand -hex 32 and paste the literal values — Compose does not perform command substitution inside .env files:

SECRET_KEY=paste-output-of-openssl-rand-hex-32
DB_PASSWORD=your-secure-db-password
REDIS_PASSWORD=your-redis-password
WEAVIATE_API_KEY=paste-output-of-openssl-rand-hex-16
Start the stack:
docker compose up -d
Visit http://your-server:3000 and create an admin account.
First-Time Setup
After logging in:
- Add model providers — Settings → Model Providers → Add (OpenAI, Anthropic, Ollama, etc.)
- Create your first app — Studio → Create App
- Configure knowledge base — Upload documents for RAG
- Publish and test — Deploy your app and get an API endpoint
Core Concepts
Apps
An app is a complete LLM application with:
- Type: Chatbot, text generator, or agent
- Model: Which LLM to use (GPT-4, Claude, Llama, etc.)
- Prompt: System instructions and user message template
- Knowledge: Optional RAG documents
- Tools: External APIs, code execution, web search
- API: Production endpoint for integrations
Apps are deployed as APIs with authentication and rate limiting.
Knowledge Base (RAG)
Upload documents (PDFs, Word, Markdown, TXT) and Dify automatically:
- Chunks the text (configurable chunk size and overlap)
- Generates embeddings (via OpenAI, Cohere, or local models)
- Stores vectors in Weaviate/Qdrant
- Retrieves relevant chunks when users ask questions
Connect a knowledge base to any app to ground responses in your documents.
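The chunking step above can be sketched in a few lines. This is a simplified illustration, not Dify's actual implementation — Dify chunks by tokens and handles document structure, while this toy version splits on whitespace words:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; neighbors share `overlap` words."""
    words = text.split()
    step = max(chunk_size - overlap, 1)
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already covers the tail of the document
    return chunks
```

The overlap is what prevents an answer that straddles a chunk boundary from being lost: the end of each chunk is repeated at the start of the next.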
Workflows
Workflows chain multiple steps:
- LLM calls with different prompts and models
- Conditional logic (if/else branching)
- Tool calls (APIs, database queries, code execution)
- Data transformation (extract, filter, format)
- Loops (iterate over lists)
Build multi-step pipelines like:
- User asks a question
- Search knowledge base for context
- Send context + question to LLM
- Format response as JSON
- Store in database via API
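The five steps above map naturally onto a chain of small functions. A hand-rolled sketch of the same flow (the retrieval and LLM steps are stubbed out; in Dify each one is a visual node):

```python
import json

def search_knowledge_base(question: str) -> list[str]:
    """Stub for step 2 — in Dify this is a knowledge-retrieval node."""
    return ["Refunds are accepted within 30 days of purchase."]

def ask_llm(question: str, context: list[str]) -> str:
    """Stub for step 3 — in Dify this is an LLM node with a prompt template."""
    return f"Based on the docs: {context[0]}"

def run_pipeline(question: str) -> str:
    context = search_knowledge_base(question)  # step 2: retrieve context
    answer = ask_llm(question, context)        # step 3: call the LLM
    # step 4: format the response as JSON (step 5, storing it via API, is omitted)
    return json.dumps({"question": question, "answer": answer})
```

The point of the visual builder is that each of these functions becomes a node you can rewire without redeploying code.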
Agents
Agents are apps that can:
- Use tools autonomously (web search, calculator, API calls)
- Break tasks into sub-tasks
- Decide which tools to use based on context
Example agent workflow:
- User: "What's the weather in Seattle and how should I dress?"
- Agent: Calls weather API → analyzes temperature → suggests clothing
Tools
Dify includes built-in tools:
- Web search (Google, DuckDuckGo)
- HTTP requests (call external APIs)
- Code execution (run Python/JavaScript in sandbox)
- Database queries (PostgreSQL, MySQL)
- File operations (read/write files)
You can also add custom tools via OpenAPI spec or code.
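For the OpenAPI route, Dify imports a spec and turns each operation into a callable tool. A minimal sketch for a hypothetical weather endpoint — the URL, path, and schema here are placeholders, not a real service:

```yaml
openapi: 3.0.0
info:
  title: Weather Tool
  version: "1.0"
servers:
  - url: https://api.example.com
paths:
  /v1/current:
    get:
      operationId: getCurrentWeather
      summary: Get current weather for a city
      parameters:
        - name: city
          in: query
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Current conditions as JSON
```

The operationId and summary matter: the agent uses them to decide when to call the tool.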
Building a RAG Chatbot
Here's how to build a document Q&A chatbot:
1. Create a knowledge base
- Knowledge → Create Knowledge
- Upload documents (PDFs, Markdown, etc.)
- Configure chunking:
- Chunk size: 500-1000 tokens (shorter for precise retrieval, longer for context)
- Chunk overlap: 50-100 tokens (prevents context loss at chunk boundaries)
- Choose embedding model (OpenAI, Cohere, or local)
- Process documents (Dify chunks, embeds, and indexes automatically)
2. Create a chatbot app
- Studio → Create App → Chatbot
- Select your knowledge base
- Configure retrieval:
- Top K: How many chunks to retrieve (3-5 is typical)
- Score threshold: Minimum similarity score (0.7+ filters irrelevant results)
- Write a system prompt:
You are a helpful assistant that answers questions based on the provided context.
Context:
{{#context#}}
User question:
{{#query#}}
Answer the question using only the information in the context. If the context doesn't contain the answer, say "I don't have enough information to answer that."
- Test in the preview panel
- Publish the app
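Top K and the score threshold can be illustrated with plain cosine similarity. This is a toy sketch with hand-made vectors, not Dify's Weaviate-backed retrieval:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, chunks, top_k=3, threshold=0.7):
    """Return up to top_k (score, text) pairs whose score clears the threshold."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(reverse=True)
    return [(s, t) for s, t in scored[:top_k] if s >= threshold]
```

Top K caps how much context you stuff into the prompt; the threshold drops chunks that matched only weakly, which is usually what turns a confident wrong answer into "I don't have enough information."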
3. Deploy as API
- Publish → API Access
- Generate an API key
- Call the API:
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "What is the refund policy?",
    "user": "user-123"
  }'
The API returns the LLM response plus citation metadata (which chunks were retrieved).
Building an Agent
Agents can use tools to complete tasks autonomously:
1. Create an agent app
- Studio → Create App → Agent
- Select a model (GPT-4 or Claude work best for agents)
- Add tools:
- Web search (for real-time information)
- HTTP API (to call external services)
- Code interpreter (to run calculations)
2. Configure tools
Example: Add a weather API tool:
Add Tool → HTTP Request
Configure:
- URL: https://api.weather.com/v1/current?city={{city}}
- Method: GET
- Headers: Authorization: Bearer YOUR_API_KEY
- Parameters: city (extracted from user input)
Write a tool description:
Use this tool to get current weather information for a city.
Input: city name (e.g., "Seattle")
Output: JSON with temperature, conditions, humidity
3. Write agent instructions
You are a helpful assistant that can:
- Answer questions using web search
- Get weather information for any city
- Perform calculations
When the user asks about weather, use the weather API tool.
When asked to search for information, use web search.
When asked to calculate, use the code interpreter.
4. Test and deploy
Test the agent in preview mode:
- "What's the weather in Seattle?"
- Agent calls weather API → parses JSON → responds with current conditions
Deploy as an API for integration with your app, Discord bot, Slack, etc.
Model Providers
Dify supports multiple LLM providers:
Configure providers
- Settings → Model Providers
- Add providers and API keys:
- OpenAI: GPT-4, GPT-3.5-turbo
- Anthropic: Claude 3.5 Sonnet, Claude Opus
- Ollama: Self-hosted Llama, Mistral, etc.
- Google: Gemini Pro
- Cohere: Command models
- Local models: Llama.cpp, vLLM
Switch models per app
Each app can use a different model:
- Customer support chatbot: GPT-3.5-turbo (fast, cheap)
- Technical documentation Q&A: Claude Opus (deep reasoning)
- Internal tools: Self-hosted Llama via Ollama (privacy, no API costs)
Using Ollama for local models
Run Ollama on the same server or network:
services:
  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"

volumes:
  ollama_data:
Pull models:
docker exec ollama ollama pull llama3.2
In Dify, add Ollama as a provider:
- Endpoint: http://ollama:11434
- Model: llama3.2
Now you can use local models without API costs.
Prompt Management
Dify includes a prompt editor with:
- Variables: Inject user input, context, timestamps
- Templates: Reusable prompt structures
- Versioning: Track changes and roll back
Example prompt for a customer support chatbot:
You are a support agent for {{company_name}}.
Current date: {{date}}
User tier: {{user_tier}}
Knowledge base context:
{{#context#}}
User message:
{{#query#}}
Respond professionally and helpfully. If the knowledge base doesn't contain the answer, offer to escalate to a human agent.
Variables are filled automatically at runtime.
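Dify performs this substitution server-side; conceptually it is simple string templating. An illustrative sketch — note that Dify's special forms like {{#context#}} carry extra behavior (retrieval injection), which this toy version treats as ordinary variables:

```python
import re

def fill_template(template: str, variables: dict[str, str]) -> str:
    """Replace {{name}} and {{#name#}} placeholders with supplied values."""
    def substitute(match: re.Match) -> str:
        key = match.group(1).strip("#")
        # Leave unknown placeholders intact rather than erasing them.
        return variables.get(key, match.group(0))
    return re.sub(r"\{\{(#?\w+#?)\}\}", substitute, template)
```

At runtime, user input fills {{#query#}}, retrieved chunks fill {{#context#}}, and app-level variables like {{company_name}} come from the app configuration.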
API and Integration
Every Dify app exposes a REST API:
Chat completion
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "How do I reset my password?",
    "user": "user-123",
    "conversation_id": "abc123"
  }'
Response includes:
- answer: The LLM response
- metadata: Retrieved documents, tool calls, etc.
- conversation_id: For multi-turn conversations
Streaming responses
For real-time chatbot UI, use streaming:
curl -X POST https://dify.yourdomain.com/v1/chat-messages \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
  -d '{
    "inputs": {},
    "query": "Explain quantum computing",
    "user": "user-123",
    "response_mode": "streaming"
  }'
Returns Server-Sent Events (SSE) with incremental chunks.
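Client-side, each event arrives as a `data:` line carrying a JSON payload. A sketch of accumulating the streamed answer — the `event`/`answer` field names follow Dify's documented event shape but should be verified against your version, and the fake list below stands in for a real HTTP response:

```python
import json

def accumulate_sse(lines) -> str:
    """Collect incremental 'answer' fragments from SSE 'data:' lines."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        event = json.loads(line[len("data:"):].strip())
        if event.get("event") == "message":
            parts.append(event.get("answer", ""))
    return "".join(parts)

# Simulated stream in place of a real HTTP response:
fake = [
    'data: {"event": "message", "answer": "Quantum "}',
    'data: {"event": "message", "answer": "computing..."}',
    'data: {"event": "message_end"}',
]
```

In a real client you would iterate over the response body line by line and flush each fragment to the UI as it arrives.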
Embeddings
Use Dify's embedding API for similarity search:
curl -X POST https://dify.yourdomain.com/v1/embeddings \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "What is machine learning?"
}'
Resource Requirements
| Component | RAM | CPU | Storage |
|---|---|---|---|
| Dify API | 200-400 MB | Low (0.5 cores) | 100 MB |
| Dify Worker | 200-400 MB | Low (0.5 cores) | — |
| Dify Web | 50-100 MB | Minimal | 50 MB |
| PostgreSQL | 100-200 MB | Low | 200-500 MB |
| Redis | 50-100 MB | Minimal | 50 MB |
| Weaviate | 200-500 MB | Low | 500 MB-5 GB (depends on docs) |
| Total | ~800 MB-1.7 GB | ~1-2 cores | ~1-6 GB |
Storage for Weaviate depends on document volume:
- 1,000 documents (~500 KB each): ~500 MB
- 10,000 documents: ~2-3 GB
- 100,000 documents: ~10-20 GB
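These figures are rough; you can back-of-envelope the vector portion yourself. Assuming float32 embeddings (4 bytes per dimension) — the chunks-per-document and dimension defaults below are illustrative assumptions:

```python
def vector_storage_bytes(num_docs: int, chunks_per_doc: int = 10,
                         dims: int = 1536, bytes_per_dim: int = 4) -> int:
    """Raw embedding storage only; index structures and stored chunk
    text typically multiply real Weaviate usage several-fold."""
    return num_docs * chunks_per_doc * dims * bytes_per_dim

# e.g. 10,000 docs at ~10 chunks each with 1536-dim float32 vectors:
gb = vector_storage_bytes(10_000) / 1e9
```

This is why the table's ranges are wide: chunk count per document, embedding dimension, and index overhead all vary by corpus and configuration.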
Production Considerations
Rate limiting
Configure per-app rate limits:
- App Settings → API Access → Rate Limiting
- Set requests per minute/hour
- Enforce at API gateway or in Dify
Monitoring
Dify logs all API calls:
- Logs → API Logs — view requests, latency, costs
- Logs → Conversations — replay chatbot interactions
- Logs → Annotations — flag and improve bad responses
Export logs to Prometheus, Grafana, or Loki for monitoring.
Cost tracking
Dify tracks token usage per app:
- Dashboard → Usage — view tokens, costs per provider
- Set budgets and alerts
Backup
- PostgreSQL: pg_dump the dify database
- Weaviate: back up the dify_weaviate volume
- Storage: back up uploaded documents in the dify_storage volume
Honest Limitations
- Resource usage is higher than minimal tools: Dify runs multiple containers and needs 1-2 GB RAM
- Vector database required: You must run Weaviate or Qdrant (no lightweight SQLite option)
- Less flexible than code: Visual workflows are easier but can't express every LLM pattern
- Agent reliability varies: Agents work well with GPT-4/Claude but struggle with weaker models
- Learning curve: Dify has many features; onboarding takes time
For simple use cases (basic chatbot, single-prompt API), Dify is overkill. For production LLM apps with RAG, tools, and team collaboration, it's a time-saver.
The Bottom Line
Dify is the most production-ready open-source platform for building LLM applications. It has the polish of a commercial product (clean UI, good documentation, active development) while being fully self-hosted and free.
If you're building customer-facing chatbots, internal knowledge bases, or autonomous agents, Dify eliminates weeks of boilerplate code. If you're experimenting with LLM workflows and want maximum flexibility, Flowise or Langflow might fit better.
For teams that want to move fast, iterate with non-technical stakeholders, and deploy production APIs without managing infrastructure, Dify is the best option in the self-hosted space.
Resources
- Dify documentation
- Dify GitHub
- Community Discord
- Dify Cloud — try the hosted version (or deploy self-hosted for privacy)
