pdftotext

agent-ready non-interactive

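Extracts text from PDF files, optionally preserving the physical layout of each page. Part of Poppler's command-line utilities, which are packaged for most major platforms (as poppler in Homebrew and as poppler-utils on Debian and Ubuntu).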

How to install pdftotext

brew install poppler

When to use pdftotext

  • Extracting structured text from PDFs that have complex layouts (e.g., multi-column documents, brochures) where preserving the reading order and spatial arrangement is important.
  • Selectively extracting text from a specific range of pages (e.g., only chapters 3-5 of a book) to reduce processing time or isolate relevant content.
  • Extracting text from password-protected PDFs when the password is known; pdftotext accepts either the user or the owner password on the command line.
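
The three cases above map onto a handful of pdftotext options. The sketch below wraps each one in a small shell function. The function names are hypothetical and exist only for illustration; the flags (-layout, -f, -l, -upw) are standard pdftotext options, and "-" as the output name sends the text to stdout.

```shell
# Hypothetical helper names; the flags themselves are standard pdftotext options.

# Keep the physical layout (columns, spacing) and write the text to stdout.
pdf_layout_text() {
    pdftotext -layout "$1" -
}

# Extract only pages $2 through $3 (1-based, inclusive).
pdf_page_range() {
    pdftotext -f "$2" -l "$3" "$1" -
}

# Open an encrypted PDF using a known user password.
pdf_with_password() {
    pdftotext -upw "$2" "$1" -
}
```

For example, `pdf_page_range book.pdf 40 95 > chapters.txt` would write pages 40 to 95 of a hypothetical book.pdf to chapters.txt. When you hold the owner password rather than the user password, -opw takes the place of -upw.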

When not to use pdftotext

  • Processing scanned PDFs that consist of images rather than embedded text, because pdftotext does not perform OCR and will return empty or garbage output.
  • Extracting tabular data where perfect cell-by-cell accuracy is required; while pdftotext preserves layout, it does not understand table structure and may mix content across rows/columns.
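
A quick way to spot the scanned-PDF case before committing to pdftotext is to check whether any non-whitespace text comes out at all. The helper below is a minimal sketch with a hypothetical name. It only detects empty output, not garbage output, so treat a success as a hint rather than a guarantee.

```shell
# Hypothetical helper: exits 0 if the PDF yields any non-whitespace text,
# and non-zero if the output is empty or contains only whitespace
# (newlines and the form feeds pdftotext emits between pages).
pdf_has_text() {
    pdftotext -q "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}
```

A caller might use it as `pdf_has_text report.pdf || echo "no text layer, consider OCR"`, where report.pdf is a placeholder file name.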

pdftotext features

  • Layout preservation
  • Page range selection
  • Bounding box extraction
  • Encrypted PDF support
  • Fast C-based processing
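
Bounding box extraction in the list above can mean two things in practice: limiting extraction to a rectangular region of a page, or asking for the coordinates of each word. The sketch below shows one hedged way to do each. The function names and argument order are hypothetical; the flags (-x, -y, -W, -H, -bbox) are standard pdftotext options.

```shell
# Hypothetical helper: extract text from a rectangular region of one page.
# Arguments: file, page, x, y, width, height.
# x and y give the top-left corner of the crop area. Values are interpreted
# at the resolution set with -r, which defaults to 72 DPI.
pdf_region_text() {
    pdftotext -f "$2" -l "$2" -x "$3" -y "$4" -W "$5" -H "$6" "$1" -
}

# Hypothetical helper: write an XHTML file ($2) that records a bounding box
# for every word in the PDF ($1).
pdf_word_boxes() {
    pdftotext -bbox "$1" "$2"
}
```

For instance, `pdf_region_text invoice.pdf 1 0 0 300 120` would return the text found in a 300 by 120 area at the top-left of page 1 of a placeholder invoice.pdf.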
