PDF to HTML PHP Class
This PHP class converts PDF files to HTML using poppler-utils.
Thanks
Big thanks to Mochamad Gufron (mgufrone)! This package is based on the original pdf-to-html package (https://github.com/mgufrone/pdf-to-html).
Important Notes
Please see the Usage section below for how to use this class.
Installation
From your application's root directory, run this command to add the package to your app:
composer require tonchik-tm/pdf-to-html:~1
Or add this package to the require section of your composer.json:
{
    "require": {
        "tonchik-tm/pdf-to-html": "~1"
    }
}
Requirements
1. Install Poppler-Utils
Debian/Ubuntu
sudo apt-get install poppler-utils
Mac OS X
brew install poppler
Windows
This package also works on Windows. Download the latest poppler-utils binary for Windows from http://blog.alivate.com.au/poppler-windows/.
After downloading, extract the archive.
2. Locate the utilities
Debian/Ubuntu
$ whereis pdftohtml
pdftohtml: /usr/bin/pdftohtml
$ whereis pdfinfo
pdfinfo: /usr/bin/pdfinfo
Mac OS X
$ which pdfinfo
/usr/local/bin/pdfinfo
$ which pdftohtml
/usr/local/bin/pdftohtml
Windows
Go to the extracted directory. It contains a directory called bin, which holds the executables we need.
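Before passing the paths found above to the class, it can help to confirm that they point at executable files. The following is a minimal sketch using only built-in PHP functions; `checkPopplerPaths` is a hypothetical helper name and is not part of this package.

```php
<?php
// Hypothetical helper (not part of the package).
// Returns a list of human-readable problems; an empty list means both paths look usable.
function checkPopplerPaths(array $options): array
{
    $problems = [];
    foreach (['pdftohtml_path', 'pdfinfo_path'] as $key) {
        $path = $options[$key] ?? '';
        if ($path === '' || !is_file($path)) {
            $problems[] = "$key: file not found ($path)";
        } elseif (!is_executable($path)) {
            $problems[] = "$key: file is not executable ($path)";
        }
    }
    return $problems;
}

// Paths as reported by whereis on Debian/Ubuntu above; adjust them for your system.
$problems = checkPopplerPaths([
    'pdftohtml_path' => '/usr/bin/pdftohtml',
    'pdfinfo_path'   => '/usr/bin/pdfinfo',
]);
echo $problems === [] ? "Both utilities found.\n" : implode("\n", $problems) . "\n";
```

On Windows, pass the full paths to pdftohtml.exe and pdfinfo.exe inside the extracted bin directory instead.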
3. PHP Configuration with shell access enabled
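To check whether shell access is enabled, you can inspect the `disable_functions` directive of your PHP setup. This README does not say which function the class uses internally to call the utilities, so the candidate list in this sketch (`exec`, `shell_exec`, `proc_open`) is an assumption, and `unavailableShellFunctions` is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not part of the package).
// Returns the names of the given functions that are missing or disabled in this PHP setup.
function unavailableShellFunctions(array $candidates = ['exec', 'shell_exec', 'proc_open']): array
{
    // disable_functions is a comma-separated php.ini directive.
    $raw = (string) ini_get('disable_functions');
    $disabled = array_filter(array_map('trim', explode(',', $raw)));

    $missing = [];
    foreach ($candidates as $name) {
        if (!function_exists($name) || in_array($name, $disabled, true)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

$missing = unavailableShellFunctions();
echo $missing === []
    ? "Shell functions are available.\n"
    : 'Disabled or missing: ' . implode(', ', $missing) . "\n";
```

If any function is reported, remove it from `disable_functions` in your php.ini, or ask your hosting provider to do so.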
Usage
Example:
<?php
// if you are using Composer, load its autoloader
include 'vendor/autoload.php';
// initiate
$pdf = new \TonchikTm\PdfToHtml\Pdf('test.pdf', [
'pdftohtml_path' => '/usr/bin/pdftohtml',
'pdfinfo_path' => '/usr/bin/pdfinfo'
]);
// example for windows
// $pdf = new \TonchikTm\PdfToHtml\Pdf('test.pdf', [
// 'pdftohtml_path' => '/path/to/poppler/bin/pdftohtml.exe',
// 'pdfinfo_path' => '/path/to/poppler/bin/pdfinfo.exe'
// ]);
// get pdf info
$pdfInfo = $pdf->getInfo();
// get the number of pages
$countPages = $pdf->countPages();
// get content from one page
$contentFirstPage = $pdf->getHtml()->getPage(1);
// get content from all pages and loop over them
foreach ($pdf->getHtml()->getAllPages() as $page) {
echo $page . '<br/>';
}
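Instead of echoing the pages, you may want to write each one to its own file. The sketch below assumes only what the example above shows: that `getAllPages()` can be iterated and that each page can be used as a string. `savePages` and the `page-N.html` naming are hypothetical and not part of this package.

```php
<?php
// Hypothetical helper (not part of the package).
// Writes each page's HTML to $dir/page-N.html and returns the written file paths.
// Pages are numbered by iteration order, starting at 1.
function savePages(iterable $pages, string $dir): array
{
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    $written = [];
    $number = 0;
    foreach ($pages as $html) {
        $number++;
        $file = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . "page-$number.html";
        if (file_put_contents($file, (string) $html) === false) {
            throw new RuntimeException("Cannot write file: $file");
        }
        $written[] = $file;
    }
    return $written;
}
```

With the `$pdf` object from the example, a call might look like `savePages($pdf->getHtml()->getAllPages(), __DIR__ . '/pages');`.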
Full list of settings:
<?php
$full_settings = [
'pdftohtml_path' => '/usr/bin/pdftohtml', // path to pdftohtml
'pdfinfo_path' => '/usr/bin/pdfinfo', // path to pdfinfo
'generate' => [ // settings for generating html
'singlePage' => false, // we want separate pages
'imageJpeg' => false, // we want png image
'ignoreImages' => false, // we need images
'zoom' => 1.5, // scale pdf
'noFrames' => false, // we want separate pages
],
'clearAfter' => true, // auto clear output dir (if removeOutputDir==false then output dir will remain)
'removeOutputDir' => true, // remove output dir
'outputDir' => '/tmp/'.uniqid(), // output dir
'html' => [ // settings for processing html
'inlineCss' => true, // replaces css classes with inline css rules
'inlineImages' => true, // finds images in the html and replaces each src attribute with base64-encoded data
'onlyContent' => true, // takes from html body content only
]
];
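If you only want to change a few of these settings, one option is to overlay your overrides on a base array before passing the result to the constructor. This README does not state whether the class fills in omitted keys on its own, so the sketch below builds the array explicitly with PHP's built-in `array_replace_recursive`. `mergeSettings` is a hypothetical helper, and the shortened `$defaults` array is for illustration only.

```php
<?php
// Hypothetical helper (not part of the package).
// Overlays user overrides on a set of defaults, keeping nested keys that are not overridden.
function mergeSettings(array $defaults, array $overrides): array
{
    return array_replace_recursive($defaults, $overrides);
}

// Shortened copy of the settings listed above, used as the base.
$defaults = [
    'pdftohtml_path' => '/usr/bin/pdftohtml',
    'pdfinfo_path'   => '/usr/bin/pdfinfo',
    'generate'       => ['singlePage' => false, 'zoom' => 1.5],
    'html'           => ['inlineCss' => true, 'inlineImages' => true, 'onlyContent' => true],
];

// Change only the zoom factor and keep image files instead of inlining them.
$settings = mergeSettings($defaults, [
    'generate' => ['zoom' => 2.0],
    'html'     => ['inlineImages' => false],
]);

var_export($settings['generate']);
```

The resulting `$settings` array can then be passed as the second constructor argument, in the same position as the shorter array in the usage example.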
Feedback & Contribute
Open an issue to suggest an improvement or report a bug. I love to help and solve other people's problems. Thanks :+1: