Open WebUI: The Self-Hosted AI Interface That Does More Than Chat
If you have followed our guide on running Ollama locally, you already know Open WebUI as a chat interface for local models. But reducing it to "a ChatGPT skin for Ollama" misses most of what it does. Open WebUI has grown into a full AI platform -- a self-hosted gateway that connects to multiple backends, runs retrieval-augmented generation pipelines, supports tool calling, and provides granular user management. It is the closest thing to a self-hosted ChatGPT Team deployment that actually works.

Beyond Ollama: Multi-Backend Support
The most underappreciated feature of Open WebUI is that it is not tied to Ollama. You can connect it to any OpenAI-compatible API endpoint, which means you can run a single Open WebUI instance that gives your team access to:
- Local Ollama models for privacy-sensitive work
- OpenAI GPT-4o or o1 for tasks that need frontier intelligence
- Anthropic Claude via an OpenAI-compatible proxy
- vLLM or TGI for high-throughput production serving
- Groq, Together AI, or OpenRouter for fast cloud inference
All of these show up in the same model dropdown. Your users pick the best model for their task without needing separate accounts or interfaces.
Configuring Multiple Backends
In the Admin panel under Settings > Connections, add each backend:
```
# Ollama (local)
URL: http://ollama:11434

# OpenAI
URL: https://api.openai.com/v1
API Key: sk-...

# Self-hosted vLLM
URL: http://vllm-server:8000/v1
API Key: token-if-needed
```
Each connection can have its own API key, and models from all backends appear in the unified model list.
RAG: Chat With Your Documents
Open WebUI's RAG (Retrieval-Augmented Generation) pipeline lets users upload documents and ask questions about them. This is not a toy demo -- it uses proper chunking, vector embeddings, and retrieval to ground model responses in your actual data.
How It Works
- Upload a PDF, markdown file, or text document to a conversation or a shared knowledge collection
- Open WebUI chunks the document and generates embeddings using a configurable embedding model
- When you ask a question, it retrieves relevant chunks and includes them in the model's context
- The model answers based on the retrieved content, with citations
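The retrieval step can be sketched in a few lines. This toy version uses bag-of-words counts and cosine similarity in place of a real embedding model (a production deployment would use dense vectors from a model like nomic-embed-text), but the ranking logic mirrors what the `RAG_TOP_K` setting controls:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG uses a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    # Rank chunks by similarity to the query and keep the top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

chunks = [
    "Backups run nightly at 02:00 via restic.",
    "The VPN uses WireGuard on port 51820.",
    "Deploys are triggered by pushing a git tag.",
]
print(retrieve("when do backups run", chunks, top_k=1))
# → ['Backups run nightly at 02:00 via restic.']
```

The retrieved chunks are then prepended to the model's context so the answer is grounded in your documents rather than the model's training data.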
Docker Compose With RAG Dependencies
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    ports:
      - "11434:11434"
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      # RAG configuration
      RAG_EMBEDDING_ENGINE: "ollama"  # use Ollama for embeddings
      RAG_EMBEDDING_MODEL: "nomic-embed-text:latest"
      CHUNK_SIZE: "1000"
      CHUNK_OVERLAP: "200"
      RAG_TOP_K: "5"
      # Optional: use OpenAI embeddings instead
      # RAG_EMBEDDING_ENGINE: openai
      # RAG_OPENAI_API_BASE_URL: https://api.openai.com/v1
      # RAG_OPENAI_API_KEY: sk-...
      # RAG_EMBEDDING_MODEL: text-embedding-3-small
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:
```
Pull the embedding model after startup:
```
docker exec ollama ollama pull nomic-embed-text:latest
```
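The CHUNK_SIZE and CHUNK_OVERLAP settings above control how documents are split before embedding. Open WebUI's actual splitter is token-aware, but a character-based sketch makes the overlap semantics concrete -- each chunk repeats the tail of the previous one so context at a boundary is never lost:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    # Slide a window of chunk_size characters, advancing by
    # chunk_size - chunk_overlap so consecutive chunks share an overlap.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - chunk_overlap, 1), step)]

doc = "x" * 2500
print([len(c) for c in chunk_text(doc)])
# → [1000, 1000, 900]
```

Larger overlap improves answers that span chunk boundaries at the cost of more embeddings to compute and store.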
Knowledge Collections
Beyond per-conversation uploads, you can create persistent Knowledge collections in the workspace. These are shared document sets that any conversation can reference. This is useful for team knowledge bases -- upload your internal docs, runbooks, or code documentation once, and every team member can query them.
User Management and RBAC
Open WebUI has a proper multi-user system with role-based access control:
- Admin: Full control, manages users, models, settings, and connections
- User: Standard access, can chat and use allowed models
- Pending: New sign-ups wait for admin approval (configurable)
Key Admin Settings
```
# Environment variables for user management
ENABLE_SIGNUP: "true"         # Allow new registrations
DEFAULT_USER_ROLE: "pending"  # Require admin approval
ENABLE_LOGIN_FORM: "true"     # Show email/password login
WEBUI_AUTH: "true"            # Require authentication
```
You can also configure OAuth/OIDC for SSO integration with Authentik, Keycloak, or any other identity provider:
```
ENABLE_OAUTH_SIGNUP: "true"
OAUTH_CLIENT_ID: "open-webui"
OAUTH_CLIENT_SECRET: "your-secret"
OAUTH_PROVIDER_NAME: "Authentik"
OPENID_PROVIDER_URL: "https://auth.example.com/application/o/open-webui/.well-known/openid-configuration"
```
This is what makes Open WebUI viable for teams. Each user gets their own conversation history, and admins control which models are available.
Pipelines and Functions
Open WebUI's pipeline system lets you extend its behavior with custom Python functions. Pipelines sit between the user's message and the model, allowing you to:
- Filter content before it reaches the model (PII redaction, content moderation)
- Transform responses after the model generates them (formatting, citations)
- Add tools the model can call (web search, database queries, API calls)
- Build custom workflows that chain multiple models or processing steps
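To make the filter case concrete, here is a minimal, framework-independent sketch of an inlet-style PII filter. The `Filter`/`inlet` shape follows Open WebUI's filter-function convention, but the redaction patterns and message handling here are illustrative, not the project's exact API:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Replace emails and SSN-shaped strings before the model sees them.
    return SSN.sub("[REDACTED-SSN]", EMAIL.sub("[REDACTED-EMAIL]", text))

class Filter:
    def inlet(self, body: dict) -> dict:
        # Runs on the request before it is forwarded to the model.
        for message in body.get("messages", []):
            if isinstance(message.get("content"), str):
                message["content"] = redact(message["content"])
        return body

body = {"messages": [{"role": "user", "content": "Mail me at alice@example.com"}]}
print(Filter().inlet(body)["messages"][0]["content"])
# → Mail me at [REDACTED-EMAIL]
```

Because the filter mutates the request body before forwarding it, the upstream model never receives the raw PII -- useful when local users chat with cloud backends.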
Example: Web Search Pipeline
Install the web search function from the community hub (accessible in the Admin panel), or write your own:
```python
class Pipeline:
    def __init__(self):
        self.name = "Web Search"

    async def pipe(self, body, __user__):
        # Extract search queries from user message
        # Call a search API
        # Inject results into context
        # Return augmented prompt to model
        pass
```
The pipeline system is Open WebUI's most powerful feature and what separates it from simple chat wrappers.
Model Customization
Beyond selecting models, Open WebUI lets you create custom model profiles called Modelfiles. These combine a base model with:
- A system prompt that defines the model's persona or behavior
- Temperature and sampling parameters
- Custom knowledge documents attached by default
- Specific tool/function access
This lets you create purpose-built assistants -- a "Code Reviewer" that uses a coding model with strict formatting instructions, a "Research Assistant" that always searches the web, or a "Company FAQ Bot" that draws from your knowledge base.
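In Open WebUI these profiles are assembled in the workspace UI, but if you prefer to bake the persona into Ollama itself, an Ollama Modelfile expresses the same idea. The base model, prompt, and parameter values below are illustrative:

```
FROM llama3.1:8b
SYSTEM """You are a strict code reviewer. Comment only on correctness,
security, and readability. Always respond with bullet points."""
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
```

Create it with `ollama create code-reviewer -f Modelfile`, and the resulting model appears in Open WebUI's model list like any other.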
Practical Deployment Tips
Reverse Proxy Configuration
Behind Nginx or Caddy, make sure WebSocket connections work. Open WebUI uses them for streaming responses:
```
# Caddy example
ai.example.com {
    reverse_proxy open-webui:8080
}
```
Caddy handles WebSocket upgrades automatically. For Nginx, add the standard WebSocket proxy headers.
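For Nginx, a location block with the standard upgrade headers looks like this (server name, upstream, and TLS setup are placeholders for your deployment):

```
server {
    listen 443 ssl;
    server_name ai.example.com;

    location / {
        proxy_pass http://open-webui:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Without the `Upgrade` and `Connection` headers, chats will appear to work but responses will not stream.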
Persistent Storage
The /app/backend/data volume contains everything: user accounts, conversation history, uploaded documents, and vector embeddings. Back this up regularly. A corrupted or lost data volume means losing all conversations and RAG knowledge.
Resource Considerations
Open WebUI itself is lightweight -- the heavy lifting happens in Ollama or whatever backend you connect. The main resource consumers are:
- Embedding generation: When uploading large document sets for RAG, embedding computation can spike CPU/GPU usage temporarily
- Vector storage: Large knowledge bases increase disk usage in the data volume
- Concurrent users: Each active conversation maintains a WebSocket connection, but the actual compute bottleneck is the LLM backend
Updates
Open WebUI ships new features frequently. Update with:
```
docker compose pull open-webui
docker compose up -d open-webui
```
Check the changelog before major updates -- database migrations sometimes require attention.
When to Use Open WebUI vs. Alternatives
Open WebUI is best when you want a unified interface for multiple AI backends with team features. It is the right choice for small teams that want a private ChatGPT-like experience.
LibreChat is a strong alternative if you need more advanced conversation branching and preset management. It is also open source and supports multiple backends.
text-generation-webui (oobabooga) is better if you need deep control over model loading, quantization, and inference parameters. It is more of a power-user tool than a team platform.
AnythingLLM focuses more on the RAG and workspace angle, with built-in document management and agent capabilities.
The Bottom Line
Open WebUI has evolved from a simple Ollama frontend into the most capable self-hosted AI interface available. The combination of multi-backend support, RAG pipelines, user management, and custom functions makes it a legitimate platform for teams that want control over their AI tools. If you are already running Ollama, upgrading to a full Open WebUI deployment with RAG and multi-backend support takes about ten minutes and dramatically expands what you can do.
