Open WebUI vs Text Generation WebUI: Self-Hosted AI Interfaces Compared
Running large language models locally is no longer a niche hobby. With consumer GPUs packing 16-24 GB of VRAM and quantized models fitting in 4-8 GB, anyone with a halfway decent machine can run AI models that rival cloud services. But the model is only half the story — you also need an interface to interact with it.
Two projects have emerged as the dominant self-hosted AI interfaces: Open WebUI (formerly Ollama WebUI) and Text Generation WebUI (commonly called oobabooga, after its creator's username). They both let you chat with local LLMs through a web browser, but they are designed for very different users with very different goals.

The Core Difference
Open WebUI is built for people who want a ChatGPT-like experience with local models. Clean interface, conversation management, document upload, web search integration, multi-model support. If you want to replace your ChatGPT subscription with something running on your own hardware, this is your tool.
Text Generation WebUI is built for people who want fine-grained control over model loading, inference parameters, and generation behavior. Multiple backend loaders, detailed parameter tuning, training/fine-tuning tools, and extension support. If you care about the difference between top_p=0.9 and top_p=0.95, or you need to load models in specific quantization formats, this is your tool.
Think of Open WebUI as the iPhone of local AI interfaces — polished, opinionated, just works. Text Generation WebUI is the Android — configurable, flexible, sometimes messy.
Feature Comparison
| Feature | Open WebUI | Text Generation WebUI |
|---|---|---|
| Primary focus | Chat experience | Model control & generation |
| Default backend | Ollama / OpenAI API | Multiple (llama.cpp, ExLlamaV2, Transformers, etc.) |
| Chat interface | Excellent (ChatGPT-style) | Good (functional) |
| Model switching | Seamless dropdown | Requires reload |
| Document upload (RAG) | Built-in | Via extensions |
| Web search | Built-in | Via extensions |
| Image generation | Via AUTOMATIC1111/ComfyUI integration | Via extensions |
| Voice input/output | Built-in (STT/TTS) | Via extensions |
| Multi-user support | Yes (admin panel) | Basic auth only |
| Conversation history | Full with search | Basic |
| Parameter control | Basic (temperature, top_p) | Extensive (50+ parameters) |
| Model loading options | Ollama handles it | GPTQ, AWQ, GGUF, EXL2, HQQ, etc. |
| Training/LoRA | No | Yes |
| API compatibility | OpenAI-compatible | OpenAI-compatible |
| Extensions/plugins | Community pipelines | Rich extension ecosystem |
| Docker deployment | Simple | Moderate |
| GPU requirements | Depends on backend | Depends on model/loader |
| Python dependency | Light (Python backend + SvelteKit frontend; Ollama itself is a Go binary) | Heavy (PyTorch + transformers) |
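Since both tools expose OpenAI-compatible endpoints, any OpenAI client can talk to either one. Here is a minimal sketch of building a chat-completions request with only the standard library; the base URL and model name are assumptions for a local Ollama-backed setup, not fixed values:

```python
import json
from urllib import request

# Assumed local endpoint: Ollama's OpenAI-compatible route on its default port.
# Text Generation WebUI serves a similar route on port 5000 when --api is set.
BASE_URL = "http://localhost:11434/v1"

payload = {
    "model": "llama3.2",  # assumed model name; use whatever you pulled
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "temperature": 0.7,
}
body = json.dumps(payload).encode()

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # uncomment with a running backend
```

Because the request shape is identical for both UIs, swapping backends is usually just a matter of changing `BASE_URL`.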
Deploying Open WebUI
Open WebUI is designed to work with Ollama as its backend, though it also supports any OpenAI-compatible API.
With Ollama Backend
```yaml
# docker-compose.yml
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    volumes:
      - ollama_data:/root/.ollama
    # For NVIDIA GPU:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      OLLAMA_BASE_URL: http://ollama:11434
      WEBUI_SECRET_KEY: your-secret-key-here
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:
```

```bash
docker compose up -d

# Pull a model
docker exec ollama ollama pull llama3.2
docker exec ollama ollama pull mistral
```
Navigate to http://your-server:3000, create an admin account, and start chatting. The first user to register becomes the administrator.
Without Ollama (OpenAI API Compatible)
If you are running vLLM, LocalAI, or any other OpenAI-compatible backend:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      OPENAI_API_BASE_URL: http://your-backend:8000/v1
      OPENAI_API_KEY: your-api-key
      WEBUI_SECRET_KEY: your-secret-key-here
    volumes:
      - open_webui_data:/app/backend/data

volumes:
  open_webui_data:
```
Deploying Text Generation WebUI
Text Generation WebUI has a more involved setup because it bundles its own model loading infrastructure:
```yaml
# docker-compose.yml
version: "3.8"
services:
  text-gen-webui:
    image: atinoda/text-generation-webui:default-nightly
    container_name: text-gen-webui
    restart: unless-stopped
    ports:
      - "7860:7860"   # Web UI
      - "5000:5000"   # API
      - "5005:5005"   # Streaming API
    environment:
      - EXTRA_LAUNCH_ARGS=--listen --api
    volumes:
      - ./characters:/app/characters
      - ./loras:/app/loras
      - ./models:/app/models
      - ./presets:/app/presets
      - ./prompts:/app/prompts
      - ./training:/app/training
      - ./extensions:/app/extensions
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

```bash
mkdir -p characters loras models presets prompts training extensions
docker compose up -d
```
Downloading Models
Text Generation WebUI has a built-in model downloader in the UI, or you can download models manually:
```bash
# Using the built-in downloader (via UI):
# go to the Model tab -> Download -> paste a HuggingFace model name

# Or download manually, e.g. a GGUF quantized model:
cd models
wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
```
The Chat Experience
Open WebUI
Open WebUI feels immediately familiar to anyone who has used ChatGPT. The left sidebar shows your conversation history with search. The main panel is a clean chat interface with markdown rendering, code syntax highlighting, and file attachments.
Key UX features:
- Model selector: Switch between any loaded Ollama model with a dropdown. No reloading, no waiting.
- System prompts: Create and save custom system prompts (personas) that you can apply to any conversation.
- Document RAG: Upload PDFs, text files, or web pages. Open WebUI chunks and embeds them, then uses retrieval-augmented generation to answer questions about the content.
- Web search: Toggle web search for conversations that need current information. Integrates with SearXNG, Google, Brave, and others.
- Artifacts: Code blocks get a "run" button for HTML/JS, and mathematical expressions render with LaTeX.
- Collaborative features: Share conversations with other users on the same instance.
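Under the hood, the document RAG feature boils down to: split the upload into overlapping chunks, embed each chunk, and retrieve the closest ones at question time. The chunking step looks roughly like this generic sketch (not Open WebUI's actual code; the sizes are illustrative):

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap, so sentences
    straddling a chunk boundary still appear whole in one chunk."""
    chunks, start = [], 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

Each chunk is then embedded and stored in a vector index; Open WebUI handles all of this automatically when you upload a file.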
Text Generation WebUI
The interface is more utilitarian. It is built with Gradio (a Python ML UI framework), which gives it a distinctive "research tool" look. There are multiple tabs:
- Chat: Conversation interface with character support
- Default: Raw text completion (no chat formatting)
- Notebook: Extended text editing and generation
- Parameters: The real power — dozens of generation parameters you can tune in real time
The chat mode supports character cards (like SillyTavern) with system prompts, example dialogues, and persona definitions. The Default and Notebook modes give you raw access to the model's completion capabilities without chat formatting, which is essential for creative writing, code generation, and other non-conversational tasks.
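A character card is just structured metadata that the UI folds into the prompt before each generation. The fields below follow the common card convention, but treat the exact names as illustrative rather than a fixed schema:

```python
# Illustrative character card; real SillyTavern-style cards use similar
# fields, but exact names vary between format versions.
character = {
    "name": "Archivist",
    "description": "A meticulous librarian who answers in precise, sourced statements.",
    "first_mes": "Welcome to the stacks. What are we researching today?",
    "mes_example": "{{user}}: Who wrote this ledger?\n{{char}}: The hand matches the 1902 clerk's entries.",
}

def build_system_prompt(card: dict) -> str:
    # The UI assembles something along these lines (simplified here).
    return f"You are {card['name']}. {card['description']}"
```

The `{{user}}` and `{{char}}` placeholders are substituted at generation time, which is how one card works across different usernames.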
Model Loading and Backends
This is where Text Generation WebUI significantly outpaces Open WebUI.
Open WebUI (via Ollama)
Ollama handles model management transparently. You pull models with ollama pull, and they just work. Ollama uses llama.cpp under the hood, which means it supports GGUF quantized models. GPU offloading is automatic.
This simplicity is both the strength and limitation. You cannot:
- Load models in other formats (GPTQ, AWQ, EXL2)
- Control layer-by-layer GPU offloading
- Use custom quantization settings
- Mix different inference backends
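What you *can* adjust goes through Ollama's Modelfile, which bakes parameters and a system prompt into a named model (the values here are illustrative):

```
FROM llama3.2
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
SYSTEM "You are a terse technical assistant."
```

Register it with `ollama create terse-llama -f Modelfile` (the name `terse-llama` is made up), and it appears in Open WebUI's model dropdown like any pulled model.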
Text Generation WebUI
Text Generation WebUI supports multiple loaders, each with different strengths:
| Loader | Format | Speed | Memory | Best For |
|---|---|---|---|---|
| llama.cpp | GGUF | Good | Excellent | CPU + partial GPU |
| ExLlamaV2 | EXL2, GPTQ | Excellent | Good | Full GPU inference |
| Transformers | FP16, BF16 | Moderate | High | Maximum compatibility |
| AutoGPTQ | GPTQ | Good | Good | GPTQ models |
| AutoAWQ | AWQ | Good | Good | AWQ models |
| HQQ | HQQ | Good | Good | New quantization |
The practical impact: if you have a 12 GB GPU and want to run a 13B parameter model, Text Generation WebUI lets you choose between a GGUF 4-bit quantization (runs on CPU+GPU), an EXL2 4-bit quantization (runs entirely on GPU, faster), or a GPTQ quantization (GPU, good compatibility). Each produces different quality/speed tradeoffs that you can evaluate for your specific use case.
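A back-of-envelope way to compare those options: weight memory is roughly parameter count times bits per weight, padded for the KV cache and activations. A rough Python sketch (the 20% overhead factor is an assumption, not a measured constant):

```python
def est_vram_gb(n_params: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes padded by an overhead factor
    for KV cache and activations. A ballpark, not a guarantee."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# 13B model at 4-bit (GGUF Q4, EXL2 4.0bpw, and GPTQ 4-bit all land near this)
print(round(est_vram_gb(13e9, 4), 1))  # → 7.3
```

That ~7.3 GB figure is why a 12 GB card handles a 13B model comfortably at 4-bit but not at FP16, where the same formula gives around 29 GB.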
Parameter Control
Open WebUI
Basic but sufficient for most users:
- Temperature
- Top P
- Top K
- Max tokens
- Repeat penalty
These are accessible from the chat settings and cover the parameters that actually matter for day-to-day use.
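For intuition, top_p (nucleus) sampling keeps only the smallest set of high-probability tokens whose probabilities sum to at least p, then renormalizes over that set. A toy sketch in pure Python (not any backend's actual implementation):

```python
def top_p_filter(probs: list[float], p: float) -> list[float]:
    """Zero out tokens outside the smallest set summing to >= p; renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Four-token vocabulary: p=0.8 keeps the top two tokens and drops the tail.
print(top_p_filter([0.5, 0.3, 0.15, 0.05], 0.8))
```

This also shows why top_p=0.9 vs 0.95 matters: a higher p admits more of the low-probability tail, trading coherence for variety.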
Text Generation WebUI
Extensive parameter control for those who need it:
```text
Temperature, Top P, Top K, Typical P, Min P,
Repetition Penalty, Frequency Penalty, Presence Penalty,
Repetition Penalty Range, Encoder Repetition Penalty,
No Repeat N-gram Size, Mirostat (mode, tau, eta),
DRY (multiplier, base, allowed length, sequence breakers),
Top A, Epsilon Cutoff, Eta Cutoff, Smoothing Factor,
Temperature Last, Dynamic Temperature (low, high, exponent),
Seed, Context Length, Max New Tokens, Truncation Length,
Ban EOS Token, Add BOS Token, Skip Special Tokens,
Grammar (GBNF), Guidance Scale, Negative Prompt
```
If you know what these do, Text Generation WebUI is indispensable. If you do not, Open WebUI's defaults are fine and you are not missing anything for normal chat use.
Resource Requirements
| Requirement | Open WebUI + Ollama | Text Generation WebUI |
|---|---|---|
| RAM (UI only) | ~500 MB | ~2-4 GB |
| VRAM (7B model) | 4-6 GB | 4-6 GB |
| VRAM (13B model) | 8-10 GB | 6-10 GB (format dependent) |
| Disk (UI) | ~1 GB | ~5-10 GB |
| CPU inference | Supported (Ollama) | Supported (llama.cpp) |
| Docker image size | ~1 GB + ~500 MB (Ollama) | ~5-15 GB |
Open WebUI is significantly lighter on the UI side because it delegates model management to Ollama (a Go binary) rather than bundling the entire PyTorch ecosystem.
Multi-User and Security
Open WebUI has proper multi-user support:
- User registration with admin approval
- Role-based access (admin, user)
- Per-user conversation history
- Model access controls
- SSO support (OAuth, OIDC)
Text Generation WebUI has minimal user management:
- Optional basic authentication (`--gradio-auth user:password`)
- No per-user conversation separation
- No role-based access
- Designed as a single-user tool
If you are deploying for a household or small team, Open WebUI is the only real option.
Who Should Use What
Choose Open WebUI if:
- You want a ChatGPT replacement for daily use
- Multiple people will use the instance
- You value a polished, intuitive interface
- Document upload and web search are important
- You want the simplest deployment and maintenance
- Ollama's model support covers your needs
Choose Text Generation WebUI if:
- You experiment with different model formats and quantizations
- You need fine-grained control over generation parameters
- You are doing model evaluation or benchmarking
- You want to fine-tune or train LoRA adapters
- You use character cards or roleplay scenarios
- You need raw text completion (not just chat)
- You are a researcher or developer working on LLM applications
Use both if:
- You run Open WebUI for daily chat and Text Generation WebUI for experimentation. They can share the same GPU (not simultaneously) and coexist on the same server.
Running Both on One Server
If you want both tools available, stagger their GPU usage:
```text
# Open WebUI + Ollama on port 3000 (daily use)
# Text Generation WebUI on port 7860 (experimentation)
# Share the same GPU, but only run one inference at a time
```
In practice, Ollama is good at releasing GPU memory when idle, so you can chat with Open WebUI, then switch to Text Generation WebUI for parameter tuning, and they will coexist without conflict — as long as you are not generating with both simultaneously.
Final Thoughts
The local AI interface space is maturing rapidly. Open WebUI has become the de facto standard for anyone who wants "ChatGPT but local" — it is polished, feature-rich, and absurdly easy to deploy. Text Generation WebUI remains essential for power users who need the control and flexibility that a streamlined chat interface deliberately hides.
Both projects are actively developed with frequent releases. Both have large, helpful communities. And both are free, open-source software that keeps your AI conversations entirely on your own hardware.
For most self-hosters, start with Open WebUI and Ollama. If you find yourself wanting to experiment with different model formats, tweak generation parameters, or do any kind of model development, add Text Generation WebUI alongside it. The two tools complement each other perfectly.
