Stirling-PDF Advanced Features: Beyond Basic PDF Tools

Productivity 2026-03-04 · 4 min read stirling-pdf pdf ocr self-hosted docker document management open-source
By Selfhosted Guides Editorial Team — Self-hosting practitioners covering open source software, home lab infrastructure, and data sovereignty.

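Most people discover Stirling-PDF for basic operations: merge, split, compress, convert. These work well, but Stirling-PDF goes much further. This guide covers the less obvious features worth knowing: OCR, PDF/A archiving, form flattening, redaction, advanced splitting and compression, the REST API, security, and integration with Paperless-NGX.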


Recap: What Stirling-PDF Is

Stirling-PDF is a self-hosted web application providing 50+ PDF operations. It runs entirely locally — no files leave your server. Operations include: merge, split, compress, rotate, convert (Office docs, images, HTML), OCR, watermark, crop, extract pages, reorder, annotate, sign, and more.

Docker Setup

If you haven't deployed it yet:

services:
  stirling-pdf:
    image: stirlingtools/stirling-pdf:latest  # previously published as frooodle/s-pdf
    container_name: stirling-pdf
    restart: unless-stopped
    ports:
      - 8080:8080
    volumes:
      - ./config:/configs
      - ./training-data:/usr/share/tessdata  # OCR language data
      - ./customFiles:/customFiles
    environment:
      DOCKER_ENABLE_SECURITY: "false"
      INSTALL_BOOK_AND_ADVANCED_HTML_CONVERSION: "false"

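Set DOCKER_ENABLE_SECURITY to "true" if the instance is reachable over the network. This makes the login features available; to actually require a login, also set SECURITY_ENABLELOGIN, as shown under Security Configuration below.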

OCR: Making PDFs Searchable

OCR (optical character recognition) turns scanned PDFs, which are essentially photos of pages, into documents with searchable, selectable text.

Navigate to: Convert → PDF to PDF/OCR (or Operations → OCR)

Options:

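Language: Select the document's language (requires Tesseract data for that language; see below)
OCR Type: "Skip text" OCRs only pages that are images and leaves pages with existing text alone; "Force OCR" re-runs OCR on every page and discards the original text; "Normal" fails if the PDF already contains text
Deskew: Straighten pages that were scanned at a slight angle
Clean: Remove background noise before OCR so specks are less likely to be misread as text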

Installing additional languages:

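OCR uses Tesseract, and the Docker image includes English by default. Other languages can be installed as system packages inside the container, but the package names depend on the image's base distribution (recent Stirling-PDF images are Alpine-based and use apk; Debian-based systems use apt-get), and packages installed in a running container are generally lost when it is recreated:

# Open a root shell first: docker exec -it -u root stirling-pdf sh
# Alpine-based images (German, French, Spanish)
apk add --no-cache tesseract-ocr-data-deu tesseract-ocr-data-fra tesseract-ocr-data-spa
# Debian/Ubuntu-based images (German, French, Spanish)
apt-get update && apt-get install -y tesseract-ocr-deu tesseract-ocr-fra tesseract-ocr-spa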

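Alternatively, and more durably, download the language files into the directory mounted at /usr/share/tessdata:

# Download language files from the tesseract-ocr/tessdata repository on GitHub
# straight into the mounted ./training-data directory
wget -P ./training-data https://github.com/tesseract-ocr/tessdata/raw/main/deu.traineddata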
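For several languages at once, the download can be wrapped in a small loop. This is a minimal sketch: fetch_tessdata is a hypothetical helper, and it assumes the ./training-data mount from the compose file above plus the tesseract-ocr/tessdata repository layout (one <code>.traineddata file per Tesseract language code, such as deu or fra, on the main branch). It uses curl with --fail so that a bad language code stops the loop instead of saving an HTML error page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: download Tesseract language files into a tessdata directory.
# Assumes the tesseract-ocr/tessdata layout: <code>.traineddata on the main branch.
TESSDATA_BASE="https://github.com/tesseract-ocr/tessdata/raw/main"

# Usage: fetch_tessdata <dir> <lang-code>...
fetch_tessdata() {
  local dir="$1" lang
  shift
  mkdir -p "$dir" || return 1
  for lang in "$@"; do
    # Skip languages that are already present and non-empty
    if [ -s "$dir/$lang.traineddata" ]; then
      echo "skip $lang (already present)"
      continue
    fi
    # -f: treat HTTP errors as failures; -L: follow GitHub's redirect to the raw file
    if ! curl -fsSL -o "$dir/$lang.traineddata" "$TESSDATA_BASE/$lang.traineddata"; then
      echo "failed to download $lang" >&2
      rm -f -- "$dir/$lang.traineddata"   # don't leave a partial file behind
      return 1
    fi
    echo "fetched $lang"
  done
}
```

Usage: fetch_tessdata ./training-data deu fra spa. If the new languages don't appear in the OCR form, restarting the container is a reasonable first step.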


PDF/A Conversion for Long-Term Archiving

PDF/A is an ISO standard (ISO 19005) for long-term document preservation. It requires all fonts to be embedded and prohibits JavaScript, encryption, and external references, so the document renders the same way in any conforming viewer.

Navigate to: Convert → PDF to PDF/A

Use cases: archiving contracts, legal documents, medical records, government forms — anything you need to be readable in 20-30 years.

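PDF/A-2B is a common target; the "B" (basic) conformance level guarantees faithful visual reproduction rather than a tagged, accessible structure. PDF/A-3B is newer and allows arbitrary embedded file attachments (PDF/A-2 permits only PDF/A files as attachments).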
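The conversion can also be scripted. The sketch below assumes the endpoint POST /api/v1/convert/pdf/pdfa with the form fields fileInput and outputFormat, where outputFormat=pdfa is assumed to select PDF/A-2b; confirm the endpoint and the accepted values in your instance's Swagger UI before relying on it. STIRLING_URL and to_pdfa are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: convert a PDF to PDF/A via the Stirling-PDF REST API.
# Assumed: POST /api/v1/convert/pdf/pdfa with form fields fileInput and outputFormat.
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Usage: to_pdfa <input.pdf> <output.pdf> [outputFormat]
to_pdfa() {
  local in="$1" out="$2" format="${3:-pdfa}"   # "pdfa" assumed to mean PDF/A-2b
  # --fail: exit non-zero on HTTP errors instead of saving the error body as a PDF
  curl -sS --fail -X POST \
    -F "fileInput=@${in}" \
    -F "outputFormat=${format}" \
    "${STIRLING_URL}/api/v1/convert/pdf/pdfa" \
    --output "$out"
}
```

Usage: to_pdfa contract.pdf contract-pdfa.pdf.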

Flatten Forms and Annotations

PDFs with form fields can have their fields flattened — converting interactive fields to static text. This creates a non-editable version that looks like a filled form.

Navigate to: Operations → Flatten (Annotations/Form Fields)

Useful for: submitting completed forms via email where you want them non-editable, archiving filled forms, sending finalized documents.
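Flattening can be scripted as well. A minimal sketch, assuming the endpoint POST /api/v1/misc/flatten with the fields fileInput and flattenOnlyForms (check the Swagger UI for your version). A full flatten (flattenOnlyForms=false) is assumed to render each page as an image, which also makes the text unselectable, so this helper defaults to flattening form fields only. flatten_pdf and STIRLING_URL are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: flatten a PDF through the Stirling-PDF REST API.
# Assumed: POST /api/v1/misc/flatten with form fields fileInput and flattenOnlyForms.
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Usage: flatten_pdf <input.pdf> <output.pdf> [true|false]
flatten_pdf() {
  local in="$1" out="$2" forms_only="${3:-true}"
  # true:  flatten form fields only, keeping page text selectable
  # false: full flatten (assumed to rasterize pages)
  curl -sS --fail -X POST \
    -F "fileInput=@${in}" \
    -F "flattenOnlyForms=${forms_only}" \
    "${STIRLING_URL}/api/v1/misc/flatten" \
    --output "$out"
}
```

Usage: flatten_pdf filled-form.pdf filled-form-final.pdf.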

Redact (Remove) Sensitive Content

Stirling-PDF can redact specific text from a PDF. Drawing a black box over text in a PDF viewer only hides it: the text is still in the file and can be copied or extracted. Proper redaction removes the underlying text data.

Navigate to: Operations → Remove Content (Redact)

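Enter the text strings (or regular expressions) to redact; every match in the document is covered. Keep the option that converts the output to an image-based PDF enabled, since that step is what removes the text hidden behind the redaction boxes. This matters when removing SSNs, account numbers, or other sensitive data before sharing documents, so check the result by searching for or copying the redacted text before you send it.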
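Redaction is worth checking before a document leaves your hands. The sketch below has two parts: redact_pdf calls what is assumed to be the auto-redact endpoint (POST /api/v1/security/auto-redact with listOfText as a newline-separated list and convertPDFToImage=true; confirm the names in the Swagger UI), and verify_redacted uses pdftotext from poppler-utils to confirm that none of the strings can still be extracted. Both function names and STIRLING_URL are hypothetical. curl's --form-string option sends the terms literally rather than treating a leading @ or < as a file reference.

```bash
#!/usr/bin/env bash
# Sketch: redact strings via Stirling-PDF, then check nothing is still extractable.
# Assumed API: POST /api/v1/security/auto-redact with fileInput, listOfText, convertPDFToImage.
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Usage: redact_pdf <in.pdf> <out.pdf> <newline-separated terms>
redact_pdf() {
  local in="$1" out="$2" terms="$3"
  # --form-string sends the terms literally (no @file or <file interpretation)
  curl -sS --fail -X POST \
    -F "fileInput=@${in}" \
    --form-string "listOfText=${terms}" \
    -F "convertPDFToImage=true" \
    "${STIRLING_URL}/api/v1/security/auto-redact" \
    --output "$out"
}

# Usage: verify_redacted <pdf> <term>...
# Returns 0 if no term is found in the extractable text, 1 if any is, 2 if unreadable.
verify_redacted() {
  local pdf="$1" text term leaked=0
  shift
  text="$(pdftotext "$pdf" - 2>/dev/null)" || { echo "cannot read $pdf" >&2; return 2; }
  for term in "$@"; do
    # -F: fixed-string match, so numbers with dots or dashes match literally
    if printf '%s\n' "$text" | grep -qF -- "$term"; then
      echo "LEAK: '$term' still present in $pdf"
      leaked=1
    fi
  done
  return "$leaked"
}
```

Usage: redact_pdf statement.pdf statement-redacted.pdf $'123-45-6789\nACCT-0042' && verify_redacted statement-redacted.pdf 123-45-6789 ACCT-0042. A clean result only covers the extractable text layer and matches within a single line; strings broken across lines, or text visible in images, still need a visual check.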

Split by Content

Beyond basic page splitting, Stirling-PDF can split a PDF:

By page number ranges: Pages 1-5, 6-10, etc.
By each page: Every page becomes a separate PDF
By size: Split when accumulated pages exceed a file size
By chapter/bookmark: Uses the document's bookmark structure

Navigate to: Organize → Split by Pages or Split PDF
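Splitting into fixed-size chunks (pages 1-5, 6-10, and so on) can be scripted by computing the split points. This sketch assumes the Split tool's endpoint is POST /api/v1/general/split-pages, that pageNumbers lists the pages after which to split (so 5,10 yields pages 1-5, 6-10, and 11 onward), and that the response is a ZIP of the parts; check the Swagger UI and try it on a throwaway file first. split_points, split_every, and STIRLING_URL are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: split a PDF into fixed-size chunks via the Stirling-PDF REST API.
# Assumed: POST /api/v1/general/split-pages, splitting after each page in pageNumbers.
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Split points for chunks of <size> pages, e.g. 23 pages, size 5 -> 5,10,15,20
split_points() {
  local total="$1" size="$2" p points=""
  [ "$size" -gt 0 ] 2>/dev/null || return 1   # guard against an endless loop
  for (( p = size; p < total; p += size )); do
    points="${points:+$points,}$p"
  done
  printf '%s\n' "$points"
}

# Usage: split_every <in.pdf> <out.zip> <total-pages> <chunk-size>
split_every() {
  local in="$1" out_zip="$2" total="$3" size="$4" points
  points=$(split_points "$total" "$size")
  if [ -z "$points" ]; then
    echo "nothing to split: $total pages, chunk size $size" >&2
    return 1
  fi
  curl -sS --fail -X POST \
    -F "fileInput=@${in}" \
    -F "pageNumbers=${points}" \
    "${STIRLING_URL}/api/v1/general/split-pages" \
    --output "$out_zip"
}
```

For example, to cut book.pdf into 5-page parts: split_every book.pdf parts.zip "$(pdfinfo book.pdf | awk '/^Pages:/ {print $2}')" 5, where pdfinfo (also from poppler-utils) supplies the page count.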

Compress with Quality Control

PDF compression has several modes:

Navigate to: Transform → Compress PDF

Options:

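Compression level (Low/Medium/High; optimizeLevel in the API): Trades file size against quality; higher levels recompress embedded images more aggressively
DPI reduction: Downsample embedded images to a target resolution
Chaining: Run compression as one step in a longer workflow (for example, after OCR) with Stirling-PDF's pipeline feature or the API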

For scanned PDFs with high-resolution images, downsampling to 150 DPI is usually sufficient for on-screen reading and can shrink files dramatically: halving the resolution of a 300 DPI scan leaves only a quarter of the pixels.
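A common automation rule is to compress only files above a size limit and pass smaller ones through untouched. A minimal sketch, assuming the endpoint POST /api/v1/misc/compress-pdf with the fields fileInput and optimizeLevel (the default level 3 simply mirrors the API example in the next section; it is not a recommendation). compress_if_over and STIRLING_URL are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: compress a PDF only when it is larger than a byte threshold.
# Assumed API: POST /api/v1/misc/compress-pdf with fileInput and optimizeLevel.
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Usage: compress_if_over <in.pdf> <out.pdf> <max-bytes> [optimizeLevel]
compress_if_over() {
  local in="$1" out="$2" max_bytes="$3" level="${4:-3}" size
  [ -f "$in" ] || { echo "no such file: $in" >&2; return 1; }
  size=$(wc -c < "$in" | tr -d '[:space:]')   # tr strips the padding some wc versions add
  if [ "$size" -le "$max_bytes" ]; then
    cp -- "$in" "$out" || return 1             # small enough: pass through unchanged
    echo "kept $in ($size bytes)"
    return 0
  fi
  curl -sS --fail -X POST \
    -F "fileInput=@${in}" \
    -F "optimizeLevel=${level}" \
    "${STIRLING_URL}/api/v1/misc/compress-pdf" \
    --output "$out" || return 1
  echo "compressed $in ($size -> $(wc -c < "$out" | tr -d '[:space:]') bytes)"
}
```

Usage: compress_if_over scan.pdf scan-small.pdf $((10 * 1024 * 1024)) compresses anything over 10 MiB. The size comparison happens locally, so files under the limit never touch the server.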

API Access for Automation

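Stirling-PDF exposes its operations through a REST API, which makes it easy to automate. Endpoints take a multipart form upload, with the input file in the fileInput field: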

# Merge multiple PDFs via API (file names are placeholders)
# --fail makes curl exit non-zero on HTTP errors instead of saving the error body
curl -X POST --fail \
  -F 'fileInput=@file1.pdf' \
  -F 'fileInput=@file2.pdf' \
  http://your-server:8080/api/v1/general/merge-pdfs \
  --output merged.pdf

# Compress a PDF
curl -X POST --fail \
  -F 'fileInput=@large.pdf' \
  -F 'optimizeLevel=3' \
  http://your-server:8080/api/v1/misc/compress-pdf \
  --output compressed.pdf

# Run OCR on a scanned PDF (skip-text leaves pages that already have text alone)
curl -X POST --fail \
  -F 'fileInput=@scanned.pdf' \
  -F 'languages=eng' \
  -F 'ocrType=skip-text' \
  http://your-server:8080/api/v1/misc/ocr-pdf \
  --output searchable.pdf

The Swagger documentation for all API endpoints is available at http://your-server:8080/swagger-ui/index.html.

Automation with n8n or shell scripts: Chain Stirling-PDF API calls to build document-processing pipelines that, for example, OCR incoming PDFs automatically, compress files above a size threshold, or batch-convert Office documents.
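Chaining is mostly a matter of feeding one call's output into the next. The sketch below runs OCR and then compression on a single file, assuming the endpoints POST /api/v1/misc/ocr-pdf (fields fileInput, languages, ocrType) and POST /api/v1/misc/compress-pdf (fields fileInput, optimizeLevel); treat ocrType=skip-text as an assumption to confirm in the Swagger UI. stirling_post, ocr_then_compress, and STIRLING_URL are hypothetical names, and the intermediate file lives in a temporary directory that is removed afterwards.

```bash
#!/usr/bin/env bash
# Sketch: chain two Stirling-PDF API calls (OCR, then compression) for one file.
# Assumed endpoints/fields: /api/v1/misc/ocr-pdf (fileInput, languages, ocrType)
# and /api/v1/misc/compress-pdf (fileInput, optimizeLevel).
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Hypothetical wrapper: stirling_post <endpoint> <output> [curl form args...]
stirling_post() {
  local endpoint="$1" out="$2"
  shift 2
  curl -sS --fail -X POST "$@" "${STIRLING_URL}${endpoint}" --output "$out"
}

# Usage: ocr_then_compress <in.pdf> <out.pdf>
ocr_then_compress() {
  local in="$1" out="$2" work rc
  work=$(mktemp -d) || return 1
  # Step 1: add a text layer; pages that already contain text are skipped
  if ! stirling_post /api/v1/misc/ocr-pdf "$work/ocr.pdf" \
      -F "fileInput=@${in}" -F "languages=eng" -F "ocrType=skip-text"; then
    rm -rf -- "$work"
    return 1
  fi
  # Step 2: compress the OCR output into the final file
  stirling_post /api/v1/misc/compress-pdf "$out" \
    -F "fileInput=@${work}/ocr.pdf" -F "optimizeLevel=3"
  rc=$?
  rm -rf -- "$work"   # remove the intermediate file either way
  return "$rc"
}
```

Usage: for f in inbox/*.pdf; do ocr_then_compress "$f" "processed/$(basename "$f")"; done. Keeping each step a separate HTTP call means a failure points at exactly one operation.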

Security Configuration

If your Stirling-PDF instance is reachable from anywhere other than localhost:

environment:
  DOCKER_ENABLE_SECURITY: "true"
  SECURITY_ENABLELOGIN: "true"
  SECURITY_INITIALLOGIN_USERNAME: admin
  SECURITY_INITIALLOGIN_PASSWORD: change-this

With security enabled, users must log in. Admin accounts can create additional users.
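Scripts need credentials too once login is on. Stirling-PDF gives each user an API key (shown on the account page) that is sent with each request; the header name X-API-KEY used here is an assumption to confirm against your version's documentation. A minimal wrapper that reads the key from a hypothetical STIRLING_API_KEY environment variable, so the key stays out of the scripts themselves:

```bash
#!/usr/bin/env bash
# Sketch: curl wrapper that adds a Stirling-PDF API key to every request.
# Assumptions: the header name X-API-KEY and the STIRLING_API_KEY variable name.
stirling_curl() {
  if [ -z "${STIRLING_API_KEY:-}" ]; then
    echo "STIRLING_API_KEY is not set" >&2
    return 1
  fi
  # --fail: a rejected key (HTTP 401/403) becomes a non-zero exit code
  curl -sS --fail -H "X-API-KEY: ${STIRLING_API_KEY}" "$@"
}
```

For example: stirling_curl -F 'fileInput=@large.pdf' -F 'optimizeLevel=3' http://your-server:8080/api/v1/misc/compress-pdf --output compressed.pdf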

For team use, restrict access by IP range at the reverse proxy level rather than relying solely on application-level auth.

Integrating with Paperless-NGX

If you use Paperless-NGX for document management, Stirling-PDF can preprocess documents before ingestion:

Use Stirling-PDF to OCR scanned PDFs (Paperless will use the text layer for search)
Compress large PDFs before adding to Paperless to reduce storage
Merge related documents before they're filed

You can script this with Stirling-PDF's API and Paperless-NGX's consumption directory (the folder Paperless-NGX watches for new documents to import).
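One detail is worth handling when writing into the consumption directory (the folder Paperless-NGX watches): avoid any chance of Paperless-NGX picking up a half-written file. A minimal sketch that OCRs a file into a staging directory first and then moves it into the consumption directory with mv, which is an atomic rename as long as both directories are on the same filesystem. It assumes the endpoint POST /api/v1/misc/ocr-pdf with the fields fileInput, languages, and ocrType; to_paperless, STIRLING_URL, and the stirling-staging directory name are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: OCR a PDF with Stirling-PDF and hand it to Paperless-NGX's consumption directory.
# Assumed API: POST /api/v1/misc/ocr-pdf with fileInput, languages, ocrType.
STIRLING_URL="${STIRLING_URL:-http://your-server:8080}"   # hypothetical variable

# Usage: to_paperless <in.pdf> <consume-dir> [staging-dir]
to_paperless() {
  local in="$1" consume_dir="$2" staging="${3:-$2/../stirling-staging}" name
  name=$(basename "$in")
  mkdir -p "$staging" || return 1
  # Write the OCR result outside the consumption directory first...
  if ! curl -sS --fail -X POST \
      -F "fileInput=@${in}" -F "languages=eng" -F "ocrType=skip-text" \
      "${STIRLING_URL}/api/v1/misc/ocr-pdf" \
      --output "$staging/$name"; then
    rm -f -- "$staging/$name"
    return 1
  fi
  # ...then move it in; mv is an atomic rename when both dirs share a filesystem
  mv -- "$staging/$name" "$consume_dir/$name"
}
```

Usage: to_paperless scans/invoice.pdf /srv/paperless/consume. By default the staging directory sits next to the consumption directory, which keeps both on the same filesystem in a typical setup.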

Custom Branding (For Organization/Team Use)

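Stirling-PDF supports custom branding. The setting names have changed between releases; recent versions read them from the ui section of settings.yml (created in the mounted ./config directory) or from the matching environment variables:

environment:
  UI_APPNAME: "Your Org PDF Tools"
  UI_HOMEDESCRIPTION: "Internal PDF tools for Your Org"

Place custom CSS or a logo in ./customFiles/static/ to override the default appearance.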

The project lives on GitHub at Stirling-Tools/Stirling-PDF (formerly Frooodle/Stirling-PDF) and is under active development, so its API documentation and feature list expand regularly.