What it does

This server extracts text, images, and metadata from PDF files. Pages are processed in parallel automatically, which the project reports as a 5-10x speedup over sequential extraction. It preserves document layout by ordering content by Y-coordinate, and it accepts both absolute and relative paths on Windows and Unix systems. The server is written in TypeScript with minimal dependencies and 94% test coverage; its reported benchmarks show full-text extraction at 5,575 ops/sec and 50-page PDFs processed in seconds.
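Y-coordinate ordering can be illustrated with a small sketch. It assumes a hypothetical `PositionedText` shape (text plus x/y position) and a hypothetical `orderByPosition` helper; neither is the server's actual API. Because PDF user space puts the origin at the bottom-left, a higher y means higher on the page, so fragments are sorted by y descending, grouped into lines within a small tolerance, and then read left to right.

```typescript
// Hypothetical fragment shape for illustration; not the server's real types.
interface PositionedText {
  text: string;
  x: number; // horizontal position in PDF user-space units
  y: number; // vertical position; origin is at the bottom-left of the page
}

// Group fragments into lines and emit them top-to-bottom, left-to-right.
// `tolerance` (an assumed default) merges fragments whose y values differ slightly,
// e.g. when mixed fonts shift the baseline.
function orderByPosition(items: PositionedText[], tolerance = 2): string {
  // Higher y is higher on the page: sort y descending, then x ascending.
  const sorted = [...items].sort((a, b) => b.y - a.y || a.x - b.x);

  const lines: PositionedText[][] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    // Join the current line if this fragment sits at roughly the same height.
    if (current && Math.abs(current[0].y - item.y) <= tolerance) {
      current.push(item);
    } else {
      lines.push([item]);
    }
  }

  // Within each line, read fragments left to right.
  return lines
    .map((line) => line.sort((a, b) => a.x - b.x).map((i) => i.text).join(" "))
    .join("\n");
}

const fragments: PositionedText[] = [
  { text: "world", x: 120, y: 700 },
  { text: "Second line", x: 72, y: 680 },
  { text: "Hello", x: 72, y: 701 },
];
console.log(orderByPosition(fragments)); // prints "Hello world" then "Second line"
```

Sorting by position rather than by the order fragments appear in the file matters because PDF content streams do not have to store text in reading order.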

Who it's for

Document analysts extracting text from scanned reports, software engineers integrating PDF processing into AI agent workflows, and data teams batch-processing large document collections.

Common use cases

Extract full text from a PDF for document analysis and indexing
Extract specific page ranges from multi-page documents without processing every page
Retrieve PDF metadata (author, title, creation date) for cataloging without full extraction
Process multiple PDFs in parallel for comparative analysis or batch operations
Handle PDFs referenced by relative or absolute paths across different development and production environments
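Page-range selection and parallel extraction fit together naturally, and a minimal sketch helps show how. It assumes a page specification string such as "1-3,5" (the exact format the server accepts is an assumption; check its tool schema) and uses a hypothetical `extractPage` stand-in instead of a real PDF library. `Promise.all` starts every page at once and resolves in input order, so results stay aligned with the requested pages.

```typescript
// Parse a spec like "1-3,5" into sorted, de-duplicated 1-based page numbers.
// The string format is assumed for illustration.
function parsePageSpec(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",").map((p) => p.trim()).filter(Boolean)) {
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start) throw new Error(`Invalid page range: "${part}"`);
    // Clamp to the document length rather than failing on out-of-range pages.
    for (let p = start; p <= Math.min(end, pageCount); p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}

// Hypothetical per-page extractor; a real one would call a PDF library.
async function extractPage(page: number): Promise<string> {
  await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulate I/O
  return `text of page ${page}`;
}

// Start all requested pages concurrently; Promise.all preserves input order
// even if individual pages finish out of order.
async function extractPages(spec: string, pageCount: number): Promise<string[]> {
  const pages = parsePageSpec(spec, pageCount);
  return Promise.all(pages.map((p) => extractPage(p)));
}

extractPages("1-3,5", 10).then((texts) => console.log(texts));
```

Silently clamping out-of-range pages is one design choice; a stricter variant would throw instead, which surfaces typos in the range sooner.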

Setup pitfalls

Requires filesystem read and write access; verify permissions, and consider sandboxing when processing untrusted PDFs
Windows paths such as C:\Users\...\file.pdf must have their backslashes escaped in JSON (C:\\Users\\...\\file.pdf); v1.3.0 and later normalize paths automatically
Parallel processing scales with the number of available CPU cores; on resource-limited systems, limit concurrency to avoid exhausting memory
The Node.js process must have write permission on the output directory for any operation that writes extracted data
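Limiting concurrency on a constrained machine usually means capping how many extractions run at once. The sketch below is a generic bounded-concurrency helper, not the server's own configuration mechanism; `mapWithLimit`, `fakeExtract`, and the "half the cores" policy are hypothetical choices for illustration. A fixed number of worker lanes each claim the next unprocessed item, so at most `limit` tasks are in flight and results keep their input order.

```typescript
import { cpus } from "node:os";

// Run `worker` over `items` with at most `limit` tasks in flight, preserving order.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer");
  const results = new Array<R>(items.length);
  let next = 0; // next index to claim; safe because claims happen synchronously

  // Each lane repeatedly claims the next item until none remain.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => runLane());
  await Promise.all(lanes);
  return results;
}

// Hypothetical stand-in for extracting one PDF; tracks peak concurrency.
let inFlight = 0;
let peak = 0;
async function fakeExtract(file: string): Promise<string> {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  inFlight--;
  return `${file}: done`;
}

const files = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"];
// Arbitrary policy: use half the reported cores, but never fewer than one.
const limit = Math.max(1, Math.floor(cpus().length / 2));
mapWithLimit(files, limit, fakeExtract).then((out) => console.log(out, "peak:", peak));
```

Lanes pull work instead of processing fixed batches, so one slow PDF does not hold up an entire batch while other lanes sit idle.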

@sylphx/pdf-reader-mcp