smalot/pdfparser is a standalone PHP package that provides various tools to extract data from PDF files.
This library is under active maintenance.
There is no active development by the author of this library (at the moment), but we welcome any pull request adding/extending functionality!
- Load/parse objects and headers
- Extract metadata (author, description, ...)
- Extract text from ordered pages
- Support of compressed PDFs
- Support of Mac OS Roman charset encoding
- Handling of hexadecimal and octal encoding in text sections
- Create custom configurations (see CustomConfig.md).
Currently, secured documents and form data extraction are not supported.
This library is under the LGPLv3 license.
This library requires PHP 7.1+ since v1.
You can install it via Composer:
composer require smalot/pdfparser
If you can't use Composer, include alt_autoload.php-dist instead. It loads all required files automatically.
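// Load Composer's autoloader (assumed path; adjust to your project layout).
require 'vendor/autoload.php';
// Parse PDF file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/path/to/document.pdf');
// Extract the text of all pages as a single string.
$text = $pdf->getText();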
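The feature list also mentions metadata and per-page text. The following is a minimal sketch of how both might be collected from an already parsed document, assuming the document object exposes `getDetails()` and `getPages()` and that each page exposes `getText()`. `summarizePdf()` is a hypothetical helper name used only for illustration; it is not part of the library.

```php
<?php

/**
 * Collect metadata and per-page text from a parsed document.
 *
 * $pdf is expected to be the object returned by Parser::parseFile().
 * summarizePdf() is a hypothetical helper, not part of the library.
 */
function summarizePdf($pdf)
{
    $pages = [];

    // Re-index the pages so that keys are 1-based page numbers.
    foreach (array_values($pdf->getPages()) as $index => $page) {
        $pages[$index + 1] = $page->getText();
    }

    return [
        // Metadata such as author or title, when the PDF provides it.
        'details' => $pdf->getDetails(),
        'pages'   => $pages,
    ];
}

// Usage, assuming $pdf was built as in the example above:
// $summary = summarizePdf($pdf);
// echo $summary['pages'][1];
```

Keeping the helper free of type hints lets it run on PHP 7.1 and makes it easy to exercise with a stub object in tests.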
Further usage information can be found here.
Documentation can be found in the doc folder.