Open WebUI: The Self-Hosted AI Interface That Does More Than Chat
If you have followed our guide on running Ollama locally, you already know Open WebUI as a chat interface for local models. But reducing it to "a ChatGPT skin for Ollama" misses most of what it does. Open WebUI has grown into a full AI platform -- a self-hosted gateway that connects to multiple backends, runs retrieval-augmented generation pipelines, supports tool calling, and provides granular user management. It is the closest thing to a self-hosted ChatGPT Team deployment that actually works.

Beyond Ollama: Multi-Backend Support
The most underappreciated feature of Open WebUI is that it is not tied to Ollama. You can connect it to any OpenAI-compatible API endpoint, which means you can run a single Open WebUI instance that gives your team access to:
- Local Ollama models for privacy-sensitive work
- OpenAI GPT-4o or o1 for tasks that need frontier intelligence
- Anthropic Claude via an OpenAI-compatible proxy
- vLLM or TGI for high-throughput production serving
- Groq, Together AI, or OpenRouter for fast cloud inference
All of these show up in the same model dropdown. Your users pick the best model for their task without needing separate accounts or interfaces.
Configuring Multiple Backends
In the Admin panel under Settings > Connections, add each backend:
```
# Ollama (local)
URL: http://ollama:11434

# OpenAI
URL: https://api.openai.com/v1
API Key: sk-...

# Self-hosted vLLM
URL: http://vllm-server:8000/v1
API Key: token-if-needed
```
Each connection can have its own API key, and models from all backends appear in the unified model list.
RAG: Chat With Your Documents
Open WebUI's RAG (Retrieval-Augmented Generation) pipeline lets users upload documents and ask questions about them. This is not a toy demo -- it uses proper chunking, vector embeddings, and retrieval to ground model responses in your actual data.
How It Works
- Upload a PDF, markdown file, or text document to a conversation or a shared knowledge collection
- Open WebUI chunks the document and generates embeddings using a configurable embedding model
- When you ask a question, it retrieves relevant chunks and includes them in the model's context
- The model answers based on the retrieved content, with citations
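The retrieval step can be sketched in a few lines. This toy version uses bag-of-words counts and cosine similarity in place of a real embedding model (a production deployment would use dense vectors from a model like nomic-embed-text), but the ranking logic mirrors what the `RAG_TOP_K` setting controls:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG uses a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    # Rank chunks by similarity to the query and keep the top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Backups run nightly at 02:00 via restic.",
    "The VPN uses WireGuard on port 51820.",
    "Deploys are triggered by pushing a git tag.",
]
print(retrieve("when do backups run", chunks, top_k=1))
# → ['Backups run nightly at 02:00 via restic.']
```

The retrieved chunks are then prepended to the model's context so the answer is grounded in your documents rather than the model's training data.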
Docker Compose With RAG Dependencies
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      # RAG configuration
      RAG_EMBEDDING_ENGINE: "ollama"  # use Ollama for embeddings
      RAG_EMBEDDING_MODEL: "nomic-embed-text:latest"
      CHUNK_SIZE: "1000"
      CHUNK_OVERLAP: "200"
      RAG_TOP_K: "5"
      # Optional: use OpenAI embeddings instead
      # RAG_EMBEDDING_ENGINE: openai
      # RAG_OPENAI_API_BASE_URL: https://api.openai.com/v1
      # RAG_OPENAI_API_KEY: sk-...
      # RAG_EMBEDDING_MODEL: text-embedding-3-small
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:
```
Pull the embedding model after startup:
```
docker exec ollama ollama pull nomic-embed-text:latest
```
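The CHUNK_SIZE and CHUNK_OVERLAP settings above control how documents are split before embedding. Open WebUI's actual splitter is token-aware, but a character-based sketch makes the overlap semantics concrete -- each chunk repeats the tail of the previous one so context at a boundary is never lost:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    # Slide a window of chunk_size characters, advancing by
    # chunk_size - chunk_overlap so consecutive chunks share an overlap.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

doc = "x" * 2500
print([len(c) for c in chunk_text(doc)])
# → [1000, 1000, 900]
```

Larger overlap improves answers that span chunk boundaries at the cost of more embeddings to compute and store.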
Knowledge Collections
Beyond per-conversation uploads, you can create persistent Knowledge collections in the workspace. These are shared document sets that any conversation can reference. This is useful for team knowledge bases -- upload your internal docs, runbooks, or code documentation once, and every team member can query them.
User Management and RBAC
Open WebUI has a proper multi-user system with role-based access control:
- Admin: Full control, manages users, models, settings, and connections
- User: Standard access, can chat and use allowed models
- Pending: New sign-ups wait for admin approval (configurable)
Key Admin Settings
```
# Environment variables for user management
ENABLE_SIGNUP: "true"         # Allow new registrations
DEFAULT_USER_ROLE: "pending"  # Require admin approval
ENABLE_LOGIN_FORM: "true"     # Show email/password login
WEBUI_AUTH: "true"            # Require authentication
```
You can also configure OAuth/OIDC for SSO integration with Authentik, Keycloak, or any other identity provider:
```
ENABLE_OAUTH_SIGNUP: "true"
OAUTH_CLIENT_ID: "open-webui"
OAUTH_CLIENT_SECRET: "your-secret"
OAUTH_PROVIDER_NAME: "Authentik"
OPENID_PROVIDER_URL: "https://auth.example.com/application/o/open-webui/.well-known/openid-configuration"
```
This is what makes Open WebUI viable for teams. Each user gets their own conversation history, and admins control which models are available.
Pipelines and Functions
Open WebUI's pipeline system lets you extend its behavior with custom Python functions. Pipelines sit between the user's message and the model, allowing you to:
- Filter content before it reaches the model (PII redaction, content moderation)
- Transform responses after the model generates them (formatting, citations)
- Add tools the model can call (web search, database queries, API calls)
- Build custom workflows that chain multiple models or processing steps
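To make the filter case concrete, here is a minimal, framework-independent sketch of an inlet-style PII filter. The `Filter`/`inlet` shape follows Open WebUI's filter-function convention, but the redaction patterns and message handling here are illustrative, not the project's exact API:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Replace emails and SSN-shaped strings before the model sees them.
    return SSN.sub("[REDACTED-SSN]", EMAIL.sub("[REDACTED-EMAIL]", text))

class Filter:
    def inlet(self, body: dict) -> dict:
        # Runs on the request before it is forwarded to the model.
        for message in body.get("messages", []):
            if isinstance(message.get("content"), str):
                message["content"] = redact(message["content"])
        return body

body = {"messages": [{"role": "user", "content": "Mail me at alice@example.com"}]}
print(Filter().inlet(body)["messages"][0]["content"])
# → Mail me at [REDACTED-EMAIL]
```

Because the filter mutates the request body before forwarding it, the upstream model never receives the raw PII -- useful when local users chat with cloud backends.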
Example: Web Search Pipeline
Install the web search function from the community hub (accessible in the Admin panel), or write your own:
```python
class Pipeline:
    def __init__(self):
        self.name = "Web Search"

    async def pipe(self, body, __user__):
        # Extract search queries from user message
        # Call a search API
        # Inject results into context
        # Return augmented prompt to model
        pass
```
The pipeline system is Open WebUI's most powerful feature and what separates it from simple chat wrappers.
Model Customization
Beyond selecting models, Open WebUI lets you create custom model profiles called Modelfiles. These combine a base model with:
- A system prompt that defines the model's persona or behavior
- Temperature and sampling parameters
- Custom knowledge documents attached by default
- Specific tool/function access
This lets you create purpose-built assistants -- a "Code Reviewer" that uses a coding model with strict formatting instructions, a "Research Assistant" that always searches the web, or a "Company FAQ Bot" that draws from your knowledge base.
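In Open WebUI these profiles are assembled in the workspace UI, but if you prefer to bake the persona into Ollama itself, an Ollama Modelfile expresses the same idea. The base model, prompt, and parameter values below are illustrative:

```
FROM llama3.1:8b
SYSTEM """You are a strict code reviewer. Comment only on correctness,
security, and readability. Always respond with bullet points."""
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```

Create it with `ollama create code-reviewer -f Modelfile`, and the resulting model appears in Open WebUI's model list like any other.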
Practical Deployment Tips
Reverse Proxy Configuration
Behind Nginx or Caddy, make sure WebSocket connections work. Open WebUI uses them for streaming responses:
```
# Caddy example
ai.example.com {
    reverse_proxy open-webui:8080
}
```
Caddy handles WebSocket upgrades automatically. For Nginx, add the standard WebSocket proxy headers.
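For Nginx, a location block with the standard upgrade headers looks like this (server name, upstream, and TLS setup are placeholders for your deployment):

```
server {
    listen 443 ssl;
    server_name ai.example.com;

    location / {
        proxy_pass http://open-webui:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Without the `Upgrade` and `Connection` headers, chats will appear to work but responses will not stream.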
Persistent Storage
The /app/backend/data volume contains everything: user accounts, conversation history, uploaded documents, and vector embeddings. Back this up regularly. A corrupted or lost data volume means losing all conversations and RAG knowledge.
Resource Considerations
Open WebUI itself is lightweight -- the heavy lifting happens in Ollama or whatever backend you connect. The main resource consumers are:
- Embedding generation: When uploading large document sets for RAG, embedding computation can spike CPU/GPU usage temporarily
- Vector storage: Large knowledge bases increase disk usage in the data volume
- Concurrent users: Each active conversation maintains a WebSocket connection, but the actual compute bottleneck is the LLM backend
Updates
Open WebUI ships new features frequently. Update with:
```
docker compose pull open-webui
docker compose up -d open-webui
```
Check the changelog before major updates -- database migrations sometimes require attention.
When to Use Open WebUI vs. Alternatives
Open WebUI is best when you want a unified interface for multiple AI backends with team features. It is the right choice for small teams that want a private ChatGPT-like experience.
LibreChat is a strong alternative if you need more advanced conversation branching and preset management. It is also open source and supports multiple backends.
text-generation-webui (oobabooga) is better if you need deep control over model loading, quantization, and inference parameters. It is more of a power-user tool than a team platform.
AnythingLLM focuses more on the RAG and workspace angle, with built-in document management and agent capabilities.
The Bottom Line
Open WebUI has evolved from a simple Ollama frontend into the most capable self-hosted AI interface available. The combination of multi-backend support, RAG pipelines, user management, and custom functions makes it a legitimate platform for teams that want control over their AI tools. If you are already running Ollama, upgrading to a full Open WebUI deployment with RAG and multi-backend support takes about ten minutes and dramatically expands what you can do.
