Self-Hosting Paperless-ngx: Digitize and Organize Every Document You Own
Tax receipts. Insurance documents. Warranty cards. Medical records. That one letter from 2019 you might need someday.
If your approach to document management is "throw it in a folder and pray you can find it later," Paperless-ngx is the solution. It's a self-hosted document management system that ingests, OCRs, and automatically organizes your documents so you can search them in seconds.
What Paperless-ngx Actually Does
Drop a PDF, photo, or scanned document into Paperless-ngx, and it will:
- OCR the content — Even scanned images become full-text searchable
- Auto-classify — Machine learning suggests tags, correspondents, and document types
- Store and index — Every document gets archived with metadata you can search later
- Track dates — Detects document dates automatically
- Deduplicate — Won't import the same document twice
The result: every document you've ever received becomes searchable by content, date, sender, or tag. Finding your 2024 property tax statement takes 5 seconds instead of 20 minutes.
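The deduplication step can be pictured as a checksum lookup. The sketch below is illustrative only: `Archive` and `file_checksum` are hypothetical names, and hashing the raw file bytes with MD5 is an assumption about the general approach, not a copy of Paperless-ngx's code.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Return a hex digest that identifies the exact bytes of a file."""
    return hashlib.md5(data).hexdigest()


class Archive:
    """Toy archive that refuses byte-identical duplicates (hypothetical)."""

    def __init__(self) -> None:
        # Maps checksum -> name of the first document stored with those bytes.
        self._by_checksum: dict[str, str] = {}

    def add(self, name: str, data: bytes) -> bool:
        """Store the document; return False if the same bytes were seen before."""
        digest = file_checksum(data)
        if digest in self._by_checksum:
            return False
        self._by_checksum[digest] = name
        return True
```

Under this assumption, only byte-identical files count as duplicates. Scanning the same sheet of paper twice produces two different files, so a check like this would treat the second scan as a new document.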
Why Not Just Use Google Drive?
You could throw everything into Google Drive or Dropbox. Here's why Paperless-ngx is better for document management:
| Feature | Google Drive / Dropbox | Paperless-ngx |
|---|---|---|
| Full-text search of scanned PDFs | Limited | Excellent (Tesseract OCR) |
| Automatic tagging | No | Yes (ML-powered) |
| Correspondent tracking | No | Built-in |
| Document type classification | No | Built-in |
| Date detection | No | Automatic |
| Duplicate detection | No | Built-in |
| Data ownership | Cloud provider | Your server |
| Monthly cost | $2-15/month | Free software (you supply the server) |
Google Drive is a general-purpose file store. Paperless-ngx is purpose-built for documents, and that specialization makes a huge difference when you have hundreds or thousands of files.
When cloud storage is fine
- You have fewer than 50 documents total
- You don't need to search inside scanned documents
- You share documents frequently with others (Drive's sharing is excellent)
- You don't want to maintain any infrastructure
Self-Hosting Paperless-ngx: Setup
Server requirements
Paperless-ngx is lightweight for what it does:
- Minimum: 2 GB RAM, 1 vCPU
- Recommended: 4 GB RAM, 2 vCPU (OCR is CPU-intensive during import)
- Storage: Depends on your document volume. 10-20 GB covers most households.
Docker Compose setup
Paperless-ngx provides an official compose file. Here's a streamlined version:
    services:
      paperless-broker:
        image: redis:7
        restart: unless-stopped
      paperless-db:
        image: postgres:16
        restart: unless-stopped
        environment:
          POSTGRES_DB: paperless
          POSTGRES_USER: paperless
          POSTGRES_PASSWORD: changeme  # replace with a strong password
        volumes:
          - pgdata:/var/lib/postgresql/data
      paperless:
        image: ghcr.io/paperless-ngx/paperless-ngx:latest
        restart: unless-stopped
        depends_on:
          - paperless-db
          - paperless-broker
        ports:
          - "8000:8000"
        environment:
          PAPERLESS_REDIS: redis://paperless-broker:6379
          PAPERLESS_DBHOST: paperless-db
          PAPERLESS_DBPASS: changeme  # must match POSTGRES_PASSWORD
          PAPERLESS_OCR_LANGUAGE: eng
          PAPERLESS_TIME_ZONE: America/Los_Angeles
        volumes:
          - ./data:/usr/src/paperless/data
          - ./media:/usr/src/paperless/media
          - ./consume:/usr/src/paperless/consume
    volumes:
      pgdata:
Start the stack, then create an admin account:
    docker compose up -d
    docker compose exec paperless python3 manage.py createsuperuser
Open http://your-server:8000 and log in with the superuser account you just created.
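The compose file uses changeme as a placeholder password in two places. One way to produce a replacement is Python's standard `secrets` module. This is a minimal sketch, and `generate_password` is a hypothetical helper name.

```python
import secrets


def generate_password(num_bytes: int = 32) -> str:
    """Return a random URL-safe string suitable for a database password."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the same value into POSTGRES_PASSWORD and PAPERLESS_DBPASS.
    print(generate_password())
```

Both variables must hold the same value, otherwise the Paperless-ngx container cannot log in to PostgreSQL.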
The Intake Workflow
Scan-and-drop
The consume directory (./consume on the host, /usr/src/paperless/consume inside the container) is where the magic happens. Any file placed in this folder is automatically:
- OCR'd (if it's an image or scanned PDF)
- Text-extracted (if it's a digital PDF)
- Auto-tagged based on content
- Date-detected
- Archived and indexed
You can get documents into Paperless-ngx in several ways:
- Network scanner — Configure your scanner to save to a network share that maps to the consume directory
- Email — Paperless-ngx can monitor an email inbox and import attachments
- Mobile app — Scan with your phone and upload directly
- Manual upload — Drag and drop through the web interface
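If a script of your own copies files into the consume folder, it can help to make each file appear there in one step, so the consumer never sees a half-written copy. Below is a minimal sketch of that idea. The function name and the separate staging directory are assumptions for illustration, and whether the precaution is strictly necessary depends on how your instance watches the folder.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(source: Path, staging_dir: Path, consume_dir: Path) -> Path:
    """Copy `source` into `consume_dir` so that it appears there in one step.

    The file is first copied to `staging_dir` (assumed to be on the same
    filesystem as `consume_dir`) and then renamed into place.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    target = consume_dir / source.name
    if target.exists():
        raise FileExistsError(f"{target} is already waiting to be consumed")
    staged = staging_dir / source.name
    shutil.copy2(source, staged)  # the slow copy happens outside consume_dir
    os.replace(staged, target)    # a rename is atomic on the same filesystem
    return target
```

A call might look like `drop_into_consume(Path("scan.pdf"), Path("./staging"), Path("./consume"))`, where ./staging is a hypothetical folder next to ./consume.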
Setting up automatic classification
Paperless-ngx uses a machine learning model that improves over time:
- Start by manually tagging the first 20-30 documents — create tags like "tax," "medical," "insurance," "receipt"
- Create correspondents for frequent senders — your bank, insurance company, employer
- Define document types — invoice, letter, contract, receipt, statement
- After enough training data, Paperless-ngx suggests these automatically for new documents
The model retrains periodically, so accuracy improves the more you use it.
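To build intuition for why manual tagging comes first, here is a deliberately tiny word-frequency suggester. It is a toy under stated assumptions, not the classifier Paperless-ngx ships: `TagSuggester` and `tokenize` are hypothetical names, and the scoring rule is the simplest one that shows suggestions emerging from labeled examples.

```python
import re
from collections import Counter, defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class TagSuggester:
    """Toy word-frequency model (hypothetical, not Paperless-ngx's classifier)."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)

    def train(self, text: str, tag: str) -> None:
        """Record the words of a manually tagged document."""
        self.word_counts[tag].update(tokenize(text))

    def suggest(self, text: str) -> Optional[str]:
        """Return the tag whose training words best match, or None."""
        words = tokenize(text)
        best_tag, best_score = None, 0.0
        for tag, counts in self.word_counts.items():
            total = sum(counts.values())
            if total == 0:
                continue
            # Matches between the new text and this tag's training words,
            # normalized by how many training words the tag has.
            score = sum(counts[w] for w in words) / total
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag
```

With no training data the suggester returns nothing at all, which mirrors why the first 20-30 documents have to be tagged by hand.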
Organizing Your Archives
Recommended tag structure
Keep it simple. Most households need:
- By type: receipt, invoice, contract, statement, letter, tax-document, warranty, medical
- By year: 2024, 2025, 2026
- By status: action-required, archive
Don't over-tag. The full-text search is so good that you'll usually find documents by searching their content rather than browsing tags.
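The same full-text search is also reachable from scripts. The sketch below reflects my understanding of the Paperless-ngx REST API (an /api/documents/ endpoint with a `query` parameter and token authentication); treat those details as assumptions and check the API documentation for your version. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def search_url(base_url: str, query: str) -> str:
    """Build the full-text search URL for the documents endpoint."""
    params = urllib.parse.urlencode({"query": query})
    return f"{base_url.rstrip('/')}/api/documents/?{params}"


def search(base_url: str, token: str, query: str) -> list[dict]:
    """Return matching documents as dicts (requires a reachable server)."""
    request = urllib.request.Request(
        search_url(base_url, query),
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]
```

For example, `search("http://your-server:8000", token, "property tax 2024")` would return the matching documents, where `token` is an API token created for your user.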
Correspondent examples
- Your bank
- Insurance provider
- Employer
- Utility companies
- Government agencies (IRS, state tax authority)
Retention and cleanup
Paperless-ngx doesn't auto-delete anything. Once a year, review old documents:
- Tax documents: Keep 7 years
- Medical records: Keep indefinitely
- Warranties: Delete when expired
- Receipts for everyday purchases: Delete after 90 days unless needed for returns
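The yearly review can be reduced to a date comparison. This is a minimal sketch using the retention periods listed above; the type names match the suggested tags, while `RETENTION` and `past_retention` are hypothetical helpers that only flag candidates for a manual decision and delete nothing.

```python
from datetime import date, timedelta
from typing import Optional

# Retention periods from the list above; None means "keep indefinitely".
# Seven years is approximated as 7 * 365 days (leap days are ignored).
RETENTION: dict[str, Optional[timedelta]] = {
    "tax-document": timedelta(days=7 * 365),
    "medical": None,
    "receipt": timedelta(days=90),
}


def past_retention(doc_type: str, created: date, today: date) -> bool:
    """Return True if a document is older than its retention period.

    Unknown types and indefinite types are never flagged. Warranties are
    left out because they depend on an expiry date, not the creation date.
    """
    period = RETENTION.get(doc_type)
    if period is None:
        return False
    return today - created > period
```

Leaving unknown types unflagged is the conservative choice: a document is only suggested for deletion when a rule explicitly covers it.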
Backup Strategy
Your Paperless-ngx instance contains important documents. Back it up:
- Database: docker compose exec paperless-db pg_dump -U paperless paperless > backup.sql
- Media files: The /media directory contains all original and archived documents
- Configuration: The docker-compose file and any custom scripts
Use a tool like restic or borgbackup to automate nightly backups to an offsite location.
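A nightly job could tie these pieces together roughly as follows. This is a sketch under assumptions: the service and database names come from the compose file, while the compose filename (docker-compose.yml), the dated dump filename, and the restic repository location are placeholders. It also assumes the restic repository has already been initialized and that its password is supplied through the environment.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import Optional


def pg_dump_command() -> list[str]:
    """pg_dump inside the database container; -T disables the pseudo-TTY."""
    return ["docker", "compose", "exec", "-T", "paperless-db",
            "pg_dump", "-U", "paperless", "paperless"]


def restic_command(repo: str, paths: list[Path]) -> list[str]:
    """restic invocation that snapshots the given paths into `repo`."""
    return ["restic", "-r", repo, "backup", *(str(p) for p in paths)]


def run_backup(project_dir: Path, repo: str, today: Optional[date] = None) -> Path:
    """Dump the database, then back up the dump, media, and compose file."""
    today = today or date.today()
    dump = project_dir / f"backup-{today.isoformat()}.sql"
    with dump.open("wb") as out:
        # Run from the project directory so docker compose finds its file.
        subprocess.run(pg_dump_command(), cwd=project_dir, stdout=out, check=True)
    paths = [dump, project_dir / "media", project_dir / "docker-compose.yml"]
    subprocess.run(restic_command(repo, paths), check=True)
    return dump
```

Because both calls use `check=True`, a failed dump raises an exception before restic runs, so a broken dump is never backed up as if it were good.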
The Honest Trade-offs
Paperless-ngx is great if:
- You have a growing pile of documents you can never find
- You want full-text search across scanned documents
- You value data ownership for sensitive personal documents
- You're willing to spend an afternoon on initial setup and scanning
Paperless-ngx is not ideal if:
- You have very few documents (under 50)
- You need real-time collaborative editing (it's an archive, not Google Docs)
- You want mobile-first scanning without any server setup (use a scanning app instead)
Bottom line: If you've ever lost track of an important document (a tax form, insurance card, or warranty), Paperless-ngx earns back its setup time almost immediately. The initial effort of scanning your existing documents is worth it. After that, the automated intake workflow keeps everything organized with very little ongoing effort beyond backups and the yearly cleanup.