Self-Hosting Paperless-ngx: Digitize and Organize Every Document You Own

Productivity 2026-02-08 paperless-ngx document-management ocr organization

Tax receipts. Insurance documents. Warranty cards. Medical records. That one letter from 2019 you might need someday.

If your approach to document management is "throw it in a folder and pray you can find it later," Paperless-ngx is the solution. It's a self-hosted document management system that ingests, OCRs, and automatically organizes your documents so you can search them in seconds.

What Paperless-ngx Actually Does

Drop a PDF, photo, or scanned document into Paperless-ngx, and it will:

OCR the content — Even scanned images become full-text searchable
Auto-classify — Machine learning suggests tags, correspondents, and document types
Store and index — Every document gets archived with metadata you can search later
Track dates — Detects document dates automatically
Deduplicate — Won't import the exact same file twice
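A note on that last point: duplicate detection compares file contents, not meaning. As far as I know, Paperless-ngx stores a checksum (MD5) of each original file and rejects a new file whose checksum it has already seen. The same PDF dropped in twice is caught, but two separate scans of the same sheet of paper produce different bytes, so both get imported. Here is a minimal Python sketch of that checksum idea. The function names are illustrative, not Paperless-ngx code:

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the MD5 hex digest of a file's raw bytes, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: list[Path]) -> list[tuple[Path, Path]]:
    """Return (duplicate, first_seen) pairs for files with byte-identical content."""
    seen: dict[str, Path] = {}
    duplicates = []
    for path in paths:
        checksum = file_checksum(path)
        if checksum in seen:
            duplicates.append((path, seen[checksum]))
        else:
            seen[checksum] = path
    return duplicates
```

Changing even one byte (a rescan, a re-export from a phone app) produces a different checksum. That is why this check catches accidental re-imports but not near-duplicates.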

The result: every document you've ever received becomes searchable by content, date, sender, or tag. Finding your 2024 property tax statement takes 5 seconds instead of 20 minutes.

Why Not Just Use Google Drive?

You could throw everything into Google Drive or Dropbox. Here's why Paperless-ngx is better for document management:

Feature	Google Drive / Dropbox	Paperless-ngx
Full-text search of scanned PDFs	Limited	Excellent (Tesseract OCR)
Automatic tagging	No	Yes (ML-powered)
Correspondent tracking	No	Built-in
Document type classification	No	Built-in
Date detection	No	Automatic
Duplicate detection	No	Built-in
Data ownership	Cloud provider	Your server
Monthly cost	$2-15/month	No license fee (you provide the server)

Google Drive is a general-purpose file store. Paperless-ngx is purpose-built for documents, and that specialization makes a huge difference when you have hundreds or thousands of files.

When cloud storage is fine

You have fewer than 50 documents total
You don't need to search inside scanned documents
You share documents frequently with others (Drive's sharing is excellent)
You don't want to maintain any infrastructure

Self-Hosting Paperless-ngx: Setup

Server requirements

Paperless-ngx is lightweight for what it does:

Minimum: 2 GB RAM, 1 vCPU
Recommended: 4 GB RAM, 2 vCPU (OCR is CPU-intensive during import)
Storage: Depends on your document volume. 10-20 GB covers most households.

Docker Compose setup

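Paperless-ngx provides official compose files. Here's a streamlined version using PostgreSQL and Redis (the official variants can also add Apache Tika and Gotenberg for Office documents and email files):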

services:
  paperless-broker:
    image: redis:7
    restart: unless-stopped

  paperless-db:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: changeme  # replace with a strong password before the first start
    volumes:
      - pgdata:/var/lib/postgresql/data

  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - paperless-db
      - paperless-broker
    ports:
      - "8000:8000"
    environment:
      PAPERLESS_REDIS: redis://paperless-broker:6379
      PAPERLESS_DBHOST: paperless-db
      PAPERLESS_DBPASS: changeme  # must match POSTGRES_PASSWORD above
      PAPERLESS_SECRET_KEY: change-me-to-a-long-random-string
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_TIME_ZONE: America/Los_Angeles
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./consume:/usr/src/paperless/consume

volumes:
  pgdata:
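Before the first start, replace the changeme placeholders with real secrets. A small Python helper can generate them with the standard library's secrets module. It also produces a value for PAPERLESS_SECRET_KEY, the Paperless-ngx setting used for session tokens, which you should set to a random string if the instance is reachable from outside your network. The KEY=value output format is just for copy-pasting:

```python
import secrets


def make_secrets() -> dict[str, str]:
    """Generate random values for the compose file's password and key settings."""
    # token_urlsafe uses only letters, digits, '-' and '_'
    db_password = secrets.token_urlsafe(24)
    return {
        "POSTGRES_PASSWORD": db_password,
        "PAPERLESS_DBPASS": db_password,  # must equal POSTGRES_PASSWORD
        "PAPERLESS_SECRET_KEY": secrets.token_urlsafe(50),
    }


if __name__ == "__main__":
    for name, value in make_secrets().items():
        print(f"{name}={value}")
```

Do this before the first docker compose up. The postgres image applies POSTGRES_PASSWORD only when it initializes an empty data volume, so changing the value later does not change the password already stored in pgdata.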

docker compose up -d
docker compose exec paperless python3 manage.py createsuperuser

Open http://your-server:8000 and log in with the superuser account you just created.

The Intake Workflow

Scan-and-drop

The consume directory (./consume on the host, mounted at /usr/src/paperless/consume in the container) is where the magic happens. Any file placed in this folder gets automatically:

OCR'd (if it's an image or scanned PDF)
Text-extracted (if it's a digital PDF)
Auto-tagged based on content
Date-detected
Archived and indexed
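Date detection is essentially pattern matching over the extracted text. Paperless-ngx's own parser handles many more formats (month names, several languages). It reads ambiguous numeric dates according to its PAPERLESS_DATE_ORDER setting, which defaults to day-month-year, so it's worth setting explicitly if your documents use US-style dates. Below is a simplified Python sketch of the idea that covers only numeric dates. The two-digit-year rule and the skip-future-dates rule are simplifying assumptions here:

```python
import re
from datetime import date
from typing import Optional

# Numeric dates such as 15.04.2024, 04/15/2024, 15-04-24 or 2024-04-15.
DATE_RE = re.compile(r"\b(\d{1,4})[./-](\d{1,2})[./-](\d{1,4})\b")


def first_date(text: str, order: str = "DMY", today: Optional[date] = None) -> Optional[date]:
    """Return the first plausible date in text, reading ambiguous dates by `order`."""
    today = today or date.today()
    for match in DATE_RE.finditer(text):
        a, b, c = (int(g) for g in match.groups())
        if a > 31:                 # a leading year means ISO-style Y-M-D
            year, month, day = a, b, c
        elif order == "DMY":
            day, month, year = a, b, c
        elif order == "MDY":
            month, day, year = a, b, c
        else:
            raise ValueError(f"unsupported order: {order}")
        if year < 100:
            year += 2000           # assumption: two-digit years mean 20xx
        try:
            found = date(year, month, day)
        except ValueError:
            continue               # e.g. month 15: not a real date, keep scanning
        if found <= today:         # skip future dates such as payment due dates
            return found
    return None
```

The same string "04/15/2024" yields April 15 under MDY and nothing under DMY (there is no month 15). This is exactly the ambiguity PAPERLESS_DATE_ORDER resolves.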

You can feed documents into Paperless-ngx through several routes, all of which end up in the same processing pipeline:

Network scanner — Configure your scanner to save to a network share that maps to the consume directory
Email — Paperless-ngx can monitor an email inbox and import attachments
Mobile app — Community-built apps can scan with your phone's camera and upload directly
Manual upload — Drag and drop through the web interface
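If you script the hand-off into the consume folder yourself (for example, from a scanner's upload share or a sync job), make each file appear there in one step rather than writing it in place. That way the folder watcher never sees a half-copied PDF. Here is a minimal Python sketch. It assumes a staging directory on the same filesystem as ./consume, because os.replace is only atomic within one filesystem. The function and directory names are hypothetical:

```python
import os
import shutil
import tempfile
from pathlib import Path


def drop_into_consume(src: Path, consume_dir: Path, staging_dir: Path) -> Path:
    """Copy src into consume_dir so the complete file appears there in one step."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    # 1. Write the full copy under a temporary name outside the watched folder.
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir, suffix=src.suffix)
    os.close(fd)
    shutil.copyfile(src, tmp_name)
    # 2. Rename it into place; atomic when both directories share a filesystem.
    target = consume_dir / src.name
    os.replace(tmp_name, target)  # overwrites a same-named file still waiting there
    return target
```

For example, drop_into_consume(Path("scans/property-tax.pdf"), Path("consume"), Path("consume-staging")) leaves the source file untouched and places a finished copy in the consume folder.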

Setting up automatic classification

Paperless-ngx uses a machine learning model that improves over time:

Start by manually tagging the first 20-30 documents — create tags like "tax," "medical," "insurance," "receipt"
Create correspondents for frequent senders — your bank, insurance company, employer
Define document types — invoice, letter, contract, receipt, statement
Once there is enough training data, Paperless-ngx suggests these automatically for new documents

The model retrains periodically, so accuracy improves the more you use it.
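To see why a couple dozen hand-labeled documents make such a difference, here is a toy classifier in Python. It is deliberately not what Paperless-ngx runs: as far as I know, its automatic matching trains a small scikit-learn neural network on your documents' text. This sketch is a plain multinomial Naive Bayes instead, chosen because it fits in a few lines and also learns from word frequencies. The training texts are made up:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes; NOT the model Paperless-ngx uses."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)  # label -> word counts
        self.doc_counts: Counter = Counter()                         # label -> training docs
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text: str):
        if not self.doc_counts:
            return None  # nothing learned yet
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # log prior + log likelihood of each word, with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = TinyNaiveBayes()
clf.train("Form 1099 interest income tax year 2024", "tax")
clf.train("W-2 wage and tax statement employer", "tax")
clf.train("Explanation of benefits claim patient visit", "medical")
clf.train("Lab results patient blood test clinic", "medical")
```

With just these four examples, clf.predict("Patient claim for clinic visit") returns "medical" and clf.predict("Tax statement from employer") returns "tax". Each extra labeled document refines the word statistics, which is the intuition behind suggestions improving as you correct them.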

Organizing Your Archives

Recommended tag structure

Keep it simple. Most households need:

By type: receipt, invoice, contract, statement, letter, tax-document, warranty, medical
By year: 2024, 2025, 2026 (optional, since every document already has a created date you can filter on)
By status: action-required, archive

Don't over-tag. The full-text search is so good that you'll usually find documents by searching their content rather than browsing tags.
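The same full-text search is available over the REST API, which is handy for scripts. Here is a minimal Python sketch using only the standard library. It assumes you have created an API token for your user in the web interface; your-server stands in for your host, as above. The page_size parameter is the standard pagination option, but treat it as an assumption worth checking against your version:

```python
import json
import urllib.request
from urllib.parse import urlencode


def search_url(base_url: str, query: str, page_size: int = 25) -> str:
    """Build a full-text search URL for the /api/documents/ endpoint."""
    params = urlencode({"query": query, "page_size": page_size})
    return f"{base_url.rstrip('/')}/api/documents/?{params}"


def search(base_url: str, token: str, query: str) -> list[tuple[int, str]]:
    """Return (id, title) for each document on the first page of results."""
    request = urllib.request.Request(
        search_url(base_url, query),
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [(doc["id"], doc["title"]) for doc in payload["results"]]
```

A call like search("http://your-server:8000", "<your token>", "property tax 2024") runs the same content search as the web interface's search box.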

Correspondent examples

Your bank
Insurance provider
Employer
Utility companies
Government agencies (IRS, state tax authority)

Retention and cleanup

Paperless-ngx doesn't auto-delete anything. Once a year, review old documents:

Tax documents: Keep 7 years
Medical records: Keep indefinitely
Warranties: Delete when expired
Receipts for everyday purchases: Delete after 90 days unless needed for returns

Backup Strategy

Your Paperless-ngx instance contains important documents. Back it up:

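Database: docker compose exec -T paperless-db pg_dump -U paperless paperless > backup.sql (-T stops Compose from allocating a TTY, which can insert carriage returns into the redirected dump)
Media files: The ./media directory contains all original and archived documents
Data: The ./data directory holds the search index and the trained classifier; both can be rebuilt, but backing them up makes restores faster
Configuration: The docker-compose file and any custom scripts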

Use a tool like restic or borgbackup to automate nightly backups to an offsite location.
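A nightly job can wrap the database dump so restic or borgbackup always has a fresh file to pick up. This Python sketch assumes it runs on the Docker host and is pointed at the directory containing your compose file. The file naming and the keep-seven rotation are illustrative choices. It passes -T so Compose doesn't allocate a TTY for the redirected output:

```python
import subprocess
from datetime import datetime
from pathlib import Path

# pg_dump inside the database container; -T disables TTY allocation for clean output.
DUMP_CMD = ["docker", "compose", "exec", "-T", "paperless-db",
            "pg_dump", "-U", "paperless", "paperless"]


def backup_path(backup_dir: Path, now: datetime) -> Path:
    """Timestamped name; zero-padded fields make names sort chronologically."""
    return backup_dir / f"paperless-{now:%Y%m%d-%H%M%S}.sql"


def prune(backup_dir: Path, keep: int = 7) -> list[Path]:
    """Delete all but the newest `keep` dumps and return the deleted paths."""
    dumps = sorted(backup_dir.glob("paperless-*.sql"))
    old = dumps[:-keep] if keep > 0 else dumps
    for path in old:
        path.unlink()
    return old


def run_backup(compose_dir: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Dump the database into backup_dir, then rotate old dumps."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_path(backup_dir, datetime.now())
    try:
        with target.open("wb") as out:
            subprocess.run(DUMP_CMD, cwd=compose_dir, stdout=out, check=True)
    except (subprocess.CalledProcessError, OSError):
        target.unlink(missing_ok=True)  # don't leave a truncated dump behind
        raise
    prune(backup_dir, keep)
    return target
```

Schedule run_backup with cron or a systemd timer, then point restic or borg at the backup directory together with ./media so both the metadata and the documents go offsite.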

The Honest Trade-offs

Paperless-ngx is great if:

You have a growing pile of documents you can never find
You want full-text search across scanned documents
You value data ownership for sensitive personal documents
You're willing to spend an afternoon on initial setup and scanning

Paperless-ngx is not ideal if:

You have very few documents (under 50)
You need real-time collaborative editing (it's an archive, not Google Docs)
You want mobile-first scanning without any server setup (use a scanning app instead)

Bottom line: If you've ever lost track of an important document (a tax form, an insurance card, a warranty), Paperless-ngx earns back its setup time almost immediately. The initial effort of scanning your existing documents is worth it. After that, the automated intake workflow keeps everything organized with very little ongoing effort.