2017 © Pedro Peláez
 

library pdftohtml-php

PDF to HTML converter with PHP using Poppler-utils

gufy/pdftohtml-php

  • Monday, March 6, 2017
  • by mgufron
  • Repository
  • 11 Watchers
  • 105 Stars
  • 29,424 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 52 Forks
  • 27 Open issues
  • 10 Versions
  • 14 % Grown

The README.md

PDF to HTML PHP Class

This class lets you use PHP and poppler-utils to convert your PDF files to HTML.

Important Notes

Please see the usage instructions below, since this package has been substantially upgraded and its behavior has changed.

Installation

From your application's root directory, run this command to add the package to your app:

    composer require gufy/pdftohtml-php:~2

Or add this package to the require section of your composer.json:

{
    "require": {
        "gufy/pdftohtml-php": "~2"
    }
}

Requirements

  1. Poppler-utils (on Ubuntu, install it with apt: sudo apt-get install poppler-utils)
  2. A PHP configuration with shell access enabled
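
The second requirement can be checked from PHP itself. The following is a minimal sketch, assuming the package runs poppler through one of PHP's process functions such as exec or shell_exec (which one it uses is not stated here). The helper name shellFunctionsAvailable is hypothetical and not part of this package.

```php
<?php
// Sketch only: shellFunctionsAvailable() is a hypothetical helper, not part of
// pdftohtml-php. It reports which process functions PHP is allowed to call.
function shellFunctionsAvailable(array $names, ?string $disabledList = null): array
{
    // disable_functions is a comma-separated list set in php.ini.
    $disabledList = $disabledList ?? (string) ini_get('disable_functions');
    $disabled = array_map('trim', explode(',', $disabledList));

    $status = [];
    foreach ($names as $name) {
        $status[$name] = !in_array($name, $disabled, true);
    }
    return $status;
}

foreach (shellFunctionsAvailable(['exec', 'shell_exec']) as $name => $ok) {
    echo $name, ': ', $ok ? 'available' : 'disabled', "\n";
}
```

If a function is reported as disabled, remove it from disable_functions in php.ini or ask your hosting provider to do so.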

Usage

Here is a sample:

<?php
// if you are using composer, load its autoloader
include 'vendor/autoload.php';

// initiate the converter ('file.pdf' is a placeholder path)
$pdf = new \Gufy\PdfToHtml\Pdf('file.pdf');

// convert to html string
$html = $pdf->html();

// convert a specific page to html string
$page = $pdf->html(3);

// convert to html and return it as a Dom object (https://github.com/paquettg/php-html-parser)
$dom = $pdf->getDom();

// get the total number of pages, e.g. to check whether your pdf has more than one
$total_pages = $pdf->getPages();

// if your pdf has more than one page, use this to change the current page to page 3
$dom->goToPage(3);

// then you can do as you please with that dom and find any element you want
$paragraphs = $dom->find('body > p');

// change pdftohtml bin location
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');

// change pdfinfo bin location
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
?>
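
To process a whole document, the per-page call can be wrapped in a loop. This is a rough sketch, assuming pages are numbered from 1 and that $pdf exposes getPages() and html($page) as in the sample above. collectPages and FakePdf are hypothetical names; FakePdf only stands in for the real object so the sketch runs without poppler installed.

```php
<?php
// Sketch: gather the HTML of every page into an array keyed by page number.
// Assumes 1-based page numbers and the getPages()/html($page) methods above.
function collectPages(object $pdf): array
{
    $pages = [];
    $total = (int) $pdf->getPages();
    for ($page = 1; $page <= $total; $page++) {
        $pages[$page] = $pdf->html($page);
    }
    return $pages;
}

// Hypothetical stand-in for the real Pdf object, for demonstration only.
final class FakePdf
{
    public function getPages(): int { return 3; }
    public function html(int $page): string { return "<p>page $page</p>"; }
}

$pages = collectPages(new FakePdf());
echo count($pages), " pages collected\n";
```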

Passing options to getDom

By default, getDom() extracts all images and creates one HTML file per page. You can pass options when extracting HTML:

<?php
$pdfDom = $pdf->getDom(['ignoreImages' => true]);

Available Options

  • singlePage, default: false
  • imageJpeg, default: false
  • ignoreImages, default: false
  • zoom, default: 1.5
  • noFrames, default: true
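
When several options are set at once, it can help to merge them over the defaults before calling getDom(). The sketch below copies the defaults from the list above. resolveOptions is a hypothetical helper, and whether the package validates option names itself is not stated here, so the check for unknown keys is an assumption added for safety.

```php
<?php
// Defaults copied from the option list above.
const DEFAULT_OPTIONS = [
    'singlePage'   => false,
    'imageJpeg'    => false,
    'ignoreImages' => false,
    'zoom'         => 1.5,
    'noFrames'     => true,
];

// Hypothetical helper: merge overrides over the defaults, rejecting unknown keys.
function resolveOptions(array $overrides): array
{
    $unknown = array_diff_key($overrides, DEFAULT_OPTIONS);
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown option(s): ' . implode(', ', array_keys($unknown))
        );
    }
    return array_merge(DEFAULT_OPTIONS, $overrides);
}

$options = resolveOptions(['ignoreImages' => true, 'zoom' => 2]);
// $options can then be passed on, e.g. $pdf->getDom($options);
```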

Usage note for Windows Users

For those who need this package on Windows, there is a way. First, download the latest poppler-utils binary for Windows from http://blog.alivate.com.au/poppler-windows/.

After downloading, extract the archive. It contains a directory called bin, which is the one we need. Then change your code like this:

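<?php
// if you are using composer, load its autoloader
include 'vendor/autoload.php';

// point the package at the binaries in the extracted bin directory
// (C:/path/to/poppler/bin is a placeholder for your own location)
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', 'C:/path/to/poppler/bin/pdftohtml.exe');
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', 'C:/path/to/poppler/bin/pdfinfo.exe');

// initiate the converter ('file.pdf' is a placeholder path) and get the Dom object
$pdf = new \Gufy\PdfToHtml\Pdf('file.pdf');
$html = $pdf->getDom();

// get the total number of pages in your pdf
$total_pages = $pdf->getPages();

// if your pdf has more than one page, use this to change the current page to page 3
$html->goToPage(3);

// then you can do as you please with that dom and find any element you want
$paragraphs = $html->find('body > p');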

?>
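
If the same code has to run on Windows and on other systems, the binary paths can be built from the bin directory. This is a small sketch under stated assumptions: popplerBinary is a hypothetical helper, C:/poppler/bin is a placeholder directory, and the PHP_OS_FAMILY constant requires PHP 7.2 or later.

```php
<?php
// Hypothetical helper: build the full path to a poppler tool inside $binDir,
// appending ".exe" when the OS family is Windows.
function popplerBinary(string $binDir, string $tool, string $osFamily = PHP_OS_FAMILY): string
{
    $suffix = $osFamily === 'Windows' ? '.exe' : '';
    // Strip any trailing slash or backslash before joining.
    return rtrim($binDir, '/\\') . '/' . $tool . $suffix;
}

// C:/poppler/bin is a placeholder for wherever you extracted the archive.
echo popplerBinary('C:/poppler/bin', 'pdftohtml', 'Windows'), "\n";
// The result can be handed to \Gufy\PdfToHtml\Config::set('pdftohtml.bin', ...).
```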

Usage note for OS X Users

Thanks to @kaleidoscopique for trying this package on OS X and getting it to run.

1. Install brew

Brew is a popular package manager for OS X (aptitude style): http://brew.sh/

2. Install poppler

brew install poppler

3. Verify the paths of pdfinfo and pdftohtml

$ which pdfinfo
/usr/local/bin/pdfinfo

$ which pdftohtml
/usr/local/bin/pdftohtml

4. Whatever the paths are, use \Gufy\PdfToHtml\Config::set to set them in your PHP code. Use the same paths as the ones given by the which command:

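<?php
include 'vendor/autoload.php';
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
// 'file.pdf' is a placeholder path
$pdf = new \Gufy\PdfToHtml\Pdf('file.pdf');
$html = $pdf->html();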
?>
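
The lookup done with which in step 3 can also be performed from PHP, so the paths do not have to be hard-coded. The following is only a sketch: findBinary is a hypothetical helper, it relies on a POSIX shell that provides command -v, and it returns null when the tool cannot be found or exec() is unavailable.

```php
<?php
// Hypothetical helper: return the path the shell reports for $name, or null
// if it cannot be found. Relies on a POSIX shell providing `command -v`.
function findBinary(string $name): ?string
{
    if (!function_exists('exec')) {
        return null; // shell access is disabled
    }
    $output = [];
    $status = 1;
    exec('command -v ' . escapeshellarg($name) . ' 2>/dev/null', $output, $status);

    return ($status === 0 && isset($output[0])) ? trim($output[0]) : null;
}

foreach (['pdftohtml', 'pdfinfo'] as $tool) {
    $path = findBinary($tool);
    echo $tool, ': ', $path ?? 'not found', "\n";
    // With the package loaded: \Gufy\PdfToHtml\Config::set("$tool.bin", $path);
}
```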

Feedback & Contribute

Open an issue for improvements or any bugs you find. I love to help and solve other people's problems. Thanks :+1:

The Versions

  • dev-master (9999999-dev), 06/03 2017, MIT, by Mochamad Gufron
  • v2.0.8 (2.0.8.0), 11/10 2016, MIT, by Mochamad Gufron
  • v2.0.7 (2.0.7.0), 31/08 2016, MIT, by Mochamad Gufron
  • v2.0.6 (2.0.6.0), 27/04 2016, MIT, by Mochamad Gufron
  • v2.0.5 (2.0.5.0), 13/02 2016, MIT, by Mochamad Gufron
  • v2.0.4 (2.0.4.0), 11/08 2015, MIT, by Mochamad Gufron
  • v2.0.3 (2.0.3.0), 04/08 2015, MIT, by Mochamad Gufron
  • v2.0.2 (2.0.2.0), 24/07 2015, MIT, by Mochamad Gufron
  • v2.0.1 (2.0.1.0), 23/07 2015, MIT, by Mochamad Gufron
  • v2.0.0 (2.0.0.0), 23/07 2015, MIT, by Mochamad Gufron