PDF to HTML PHP Class
This class lets you use PHP and poppler-utils to convert your PDF files to HTML.
Important Notes
Please read the usage section below: this package has been substantially upgraded and several things in it have changed.
Installation
From your application's root directory, run this command to add the package to your app:
composer require gufy/pdftohtml-php:~2
Or add this package to the require section of your composer.json:
{
    "require": {
        "gufy/pdftohtml-php": "~2"
    }
}
Requirements
- Poppler-Utils (on Ubuntu, install it with apt)
sudo apt-get install poppler-utils
- PHP configured with shell access enabled
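Whether shell access is enabled can be checked from PHP itself before using the package. The snippet below is a minimal sketch, not part of this package: the helper name `shellFunctionAvailable` is hypothetical, and since this README does not state which shell function the package calls internally, it simply tests whichever function names you pass in.

```php
<?php
// Hypothetical helper (not part of gufy/pdftohtml-php): reports whether a
// shell-related PHP function exists and is not listed in disable_functions.
function shellFunctionAvailable(string $name): bool
{
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    return function_exists($name) && !in_array($name, $disabled, true);
}

// Checking two common candidates; which one the package needs is an assumption.
foreach (['exec', 'shell_exec'] as $fn) {
    echo $fn, ': ', shellFunctionAvailable($fn) ? 'available' : 'disabled', PHP_EOL;
}
```

If the functions show as disabled, the `disable_functions` directive in php.ini is the place to look.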
Usage
Here is a sample:
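<?php
// if you are using composer, just load its autoloader
include 'vendor/autoload.php';
// initiate (the file name is a placeholder)
$pdf = new \Gufy\PdfToHtml\Pdf('file.pdf');
// convert to html string
$html = $pdf->html();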
// convert a specific page to html string
$page = $pdf->html(3);
// convert to html and return it as a Dom object (see https://github.com/paquettg/php-html-parser)
$dom = $pdf->getDom();
// get the total number of pages in your pdf
$total_pages = $pdf->getPages();
// if your pdf has more than one page, use this to change the current page to page 3
$dom->goToPage(3);
// then do as you please with that dom; you can find any element you want
$paragraphs = $dom->find('body > p');
// change pdftohtml bin location
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');
// change pdfinfo bin location
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
?>
Passing options to getDom
By default, getDom() extracts all images and creates an HTML file per page. You can pass options when extracting HTML:
<?php
$pdfDom = $pdf->getDom(['ignoreImages' => true]);
Available Options
- singlePage, default: false
- imageJpeg, default: false
- ignoreImages, default: false
- zoom, default: 1.5
- noFrames, default: true
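As a rough illustration of how an options array such as `['ignoreImages' => true]` combines with the defaults listed above, here is a minimal sketch. The function name `mergeOptions` is hypothetical, and the package may combine options differently internally; only the option names and default values come from the list above.

```php
<?php
// Hypothetical helper (not part of gufy/pdftohtml-php): overlays
// caller-supplied options on the documented defaults.
function mergeOptions(array $overrides): array
{
    $defaults = [
        'singlePage'   => false,
        'imageJpeg'    => false,
        'ignoreImages' => false,
        'zoom'         => 1.5,
        'noFrames'     => true,
    ];
    // Drop keys that are not in the documented list, then let overrides win.
    return array_merge($defaults, array_intersect_key($overrides, $defaults));
}

$options = mergeOptions(['ignoreImages' => true, 'zoom' => 2]);
// $options could then be passed along as $pdf->getDom($options)
```

Only the keys you override change; everything else keeps its documented default.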
Usage note for Windows Users
For those who need this package on Windows, there is a way. First, download the latest poppler-utils binary for Windows from http://blog.alivate.com.au/poppler-windows/.
After downloading it, extract it. There will be a directory called bin, which is the one we need. Then change your code like this:
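<?php
include 'vendor/autoload.php';
// point the package at the extracted bin directory (placeholder paths, adjust to your system)
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', 'C:/path/to/poppler/bin/pdftohtml.exe');
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', 'C:/path/to/poppler/bin/pdfinfo.exe');
// initiate (the file name is a placeholder)
$pdf = new \Gufy\PdfToHtml\Pdf('file.pdf');
// convert to html and return it as a Dom object, since goToPage() and find() need the Dom
$html = $pdf->getDom();
// get the total number of pages in your pdf
$total_pages = $pdf->getPages();
// if your pdf has more than one page, use this to change the current page to page 3
$html->goToPage(3);
// then do as you please with that dom; you can find any element you want
$paragraphs = $html->find('body > p');
?>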
Usage note for OS/X Users
Thanks to @kaleidoscopique for trying this package out and getting it to run on OS/X.
1. Install brew
Brew is a well-known package manager for OS/X (aptitude style): http://brew.sh/
2. Install poppler
brew install poppler
3. Verify the paths of pdfinfo and pdftohtml
$ which pdfinfo
/usr/local/bin/pdfinfo
$ which pdftohtml
/usr/local/bin/pdftohtml
4. Whatever the paths are, use Gufy\PdfToHtml\Config::set to set them in your PHP code. Use the same paths as the ones given by the which command:
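<?php
include 'vendor/autoload.php';
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
$pdf = new \Gufy\PdfToHtml\Pdf('file.pdf'); // placeholder file name
$html = $pdf->html();
?>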
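Because the binary locations differ between systems, it can help to confirm a path before handing it to `Config::set`. The following is a minimal sketch using only built-in PHP functions; the helper name `firstExecutable` and the candidate paths are illustrative assumptions, not part of this package.

```php
<?php
// Hypothetical helper (not part of gufy/pdftohtml-php): returns the first
// candidate path that is an executable file, or null if none qualifies.
function firstExecutable(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

// Candidate locations are examples only; adjust them to your system.
$pdftohtml = firstExecutable(['/usr/local/bin/pdftohtml', '/usr/bin/pdftohtml']);

if ($pdftohtml === null) {
    echo 'pdftohtml not found in the candidate locations', PHP_EOL;
} else {
    echo 'pdftohtml found at ', $pdftohtml, PHP_EOL;
    // With the package installed, the path could then be registered:
    // \Gufy\PdfToHtml\Config::set('pdftohtml.bin', $pdftohtml);
}
```

The same check can be repeated for pdfinfo with its own list of candidate paths.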
Feedback & Contribute
Send me an issue for any improvement or bug. I love to help and solve other people's problems. Thanks :+1: