library tesseract_ocr

A wrapper to work with Tesseract OCR inside PHP.

thiagoalessio/tesseract_ocr

  • Tuesday, July 24, 2018
  • by thiagoalessio
  • Repository
  • 100 Watchers
  • 1311 Stars
  • 344,248 Installations
  • PHP
  • 5 Dependents
  • 1 Suggesters
  • 348 Forks
  • 1 Open issues
  • 38 Versions
  • 7 % Grown

The README.md

Tesseract OCR for PHP

A wrapper to work with Tesseract OCR inside PHP.

Installation

Via Composer:

$ composer require thiagoalessio/tesseract_ocr

Important: This library depends on Tesseract OCR, version 3.02 or later.

Note for Windows users

There are many ways to install Tesseract OCR on your system, but if you just want something quick to get up and running, I recommend installing the Capture2Text package with Chocolatey.

choco install capture2text --version 3.9

Warning: Recent versions of Capture2Text stopped shipping the tesseract binary.

Note for macOS users

With MacPorts you can install support for individual languages, like so:

$ sudo port install tesseract-<langcode>

That is not possible with Homebrew, which ships with English support only by default. If you intend to use other languages, the quickest solution is to install them all:

$ brew install tesseract tesseract-lang

Usage

Basic usage

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('text.png'))
    ->run();

Output:

The quick brown fox
jumps over
the lazy dog.

Other languages

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('german.png'))
    ->lang('deu')
    ->run();

Output:

Bülowstraße

Multiple languages

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('mixed-languages.png'))
    ->lang('eng', 'jpn', 'spa')
    ->run();

Output:

I eat すし y Pollo

Inducing recognition

use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('8055.png'))
    ->allowlist(range('A', 'Z'))
    ->run();

Output:

BOSS

Breaking CAPTCHAs

Yes, I know some of you might want to use this library for the noble purpose of breaking CAPTCHAs, so please take a look at this comment:

https://github.com/thiagoalessio/tesseract-ocr-for-php/issues/91#issuecomment-342290510

API

run

Executes a tesseract command, optionally receiving an integer as timeout, in case you experience stalled tesseract processes.

$ocr = new TesseractOCR();
$ocr->run();

$ocr = new TesseractOCR();
$timeout = 500;
$ocr->run($timeout);

image

Define the path of an image to be recognized by tesseract.

$ocr = new TesseractOCR();
$ocr->image('/path/to/image.png');
$ocr->run();

imageData

Set the image to be recognized by tesseract from a string, with its size. This can be useful when dealing with files that are already loaded in memory. You can easily retrieve the image data and size of an image object:

// Using Imagick
$data = $img->getImageBlob();
$size = $img->getImageLength();

// Using GD
ob_start();
// Note that you can use any format supported by tesseract
imagepng($img, null, 0);
$size = ob_get_length();
$data = ob_get_clean();

$ocr = new TesseractOCR();
$ocr->imageData($data, $size);
$ocr->run();

executable

Define a custom location of the tesseract executable, if for any reason it is not present in the $PATH.

echo (new TesseractOCR('img.png'))
    ->executable('/path/to/tesseract')
    ->run();

version

Returns the current version of tesseract.

echo (new TesseractOCR())->version();

availableLanguages

Returns a list of available languages/scripts.

foreach((new TesseractOCR())->availableLanguages() as $lang) echo $lang;

More info: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages-and-scripts
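
One way to use this list is to check, before running OCR, that every language you plan to pass to lang is actually installed. The sketch below is only an illustration: missingLanguages is a hypothetical helper that is not part of the library, and the $installed array is assumed sample data standing in for the result of availableLanguages().

```php
<?php
// Hypothetical helper, not part of the library: returns the requested
// language codes that do not appear in the installed list.
function missingLanguages(array $wanted, array $installed)
{
    // array_diff keeps the original keys, so reindex the result.
    return array_values(array_diff($wanted, $installed));
}

// Assumed sample data. In real code this would come from
// (new TesseractOCR())->availableLanguages().
$installed = ['eng', 'deu', 'osd'];

$missing = missingLanguages(['eng', 'jpn', 'spa'], $installed);
echo implode(', ', $missing), PHP_EOL; // jpn, spa
```

If the returned array is not empty, you could install the missing language packs or fall back to the languages that are present.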

tessdataDir

Specify a custom location for the tessdata directory.

echo (new TesseractOCR('img.png'))
    ->tessdataDir('/path')
    ->run();

userWords

Specify the location of the user words file.

This is a plain text file containing a list of words that you want tesseract to treat as normal dictionary words.

Useful when dealing with contents that contain technical terminology, jargon, etc.

$ cat /path/to/user-words.txt
foo
bar

echo (new TesseractOCR('img.png'))
    ->userWords('/path/to/user-words.txt')
    ->run();
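
If the word list is built at runtime, the file can be written from PHP before OCR runs. This is a minimal sketch using only standard PHP functions; the word list and the temporary file prefix are assumptions chosen for illustration.

```php
<?php
// Assumed word list for illustration; one word per line in the file.
$words = ['foo', 'bar'];

// Create a uniquely named file in the system temp directory.
$path = tempnam(sys_get_temp_dir(), 'uw-');

// Write the words separated by newlines, with a trailing newline.
file_put_contents($path, implode("\n", $words) . "\n");

echo file_get_contents($path);
```

The resulting $path can then be handed to userWords in place of a fixed path such as '/path/to/user-words.txt'.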

userPatterns

Specify the location of the user patterns file.

If the contents you are dealing with have known patterns, this option can greatly improve tesseract's recognition accuracy.

$ cat /path/to/user-patterns.txt
1-\d\d\d-GOOG-441
www.\n\\\*.com

echo (new TesseractOCR('img.png'))
    ->userPatterns('/path/to/user-patterns.txt')
    ->run();

lang

Define one or more languages to be used during the recognition. A complete list of available languages can be found at: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

Tip from @daijiale: Use the combination ->lang('chi_sim', 'chi_tra') for proper recognition of Chinese.

echo (new TesseractOCR('img.png'))
    ->lang('lang1', 'lang2', 'lang3')
    ->run();

psm

Specify the Page Segmentation Method, which instructs tesseract how to interpret the given image.

More info: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method

echo (new TesseractOCR('img.png'))
    ->psm(6)
    ->run();

oem

Specify the OCR Engine Mode (see tesseract --help-oem).

echo (new TesseractOCR('img.png'))
    ->oem(2)
    ->run();

dpi

Specify the image DPI. It is useful if your image does not contain this information in its metadata.

echo (new TesseractOCR('img.png'))
    ->dpi(300)
    ->run();

allowlist

This is a shortcut for ->config('tessedit_char_whitelist', 'abcdef....').

echo (new TesseractOCR('img.png'))
    ->allowlist(range('a', 'z'), range(0, 9), '-_@')
    ->run();
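
To make the shortcut concrete, the sketch below shows how a mix of arrays (as returned by range()) and plain strings could collapse into the single string that ends up as the value of tessedit_char_whitelist. The helper flattenAllowlist is hypothetical and written only for this illustration; it is not the library's own code.

```php
<?php
// Hypothetical helper, not part of the library: joins arrays and
// strings into one allowlist string.
function flattenAllowlist(...$parts)
{
    $chars = '';
    foreach ($parts as $part) {
        // range() returns an array; plain strings are appended unchanged.
        $chars .= is_array($part) ? implode('', $part) : (string) $part;
    }
    return $chars;
}

$allowlist = flattenAllowlist(range('a', 'c'), range(0, 3), '-_@');
echo $allowlist, PHP_EOL; // abc0123-_@
```

Under that assumption, ->allowlist(range('a', 'c'), range(0, 3), '-_@') would correspond to ->config('tessedit_char_whitelist', 'abc0123-_@').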

configFile

Specify a config file to be used. It can either be the path to your own config file or the name of one of the predefined config files: https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs

echo (new TesseractOCR('img.png'))
    ->configFile('hocr')
    ->run();

setOutputFile

Specify an output file to be used. Be aware: if you set an output file, the option withoutTempFiles is ignored. Temporary files are written (and deleted) even if withoutTempFiles is set.

In combination with configFile, this lets you obtain the hocr, tsv or pdf files.

echo (new TesseractOCR('img.png'))
    ->configFile('pdf')
    ->setOutputFile('/PATH_TO_MY_OUTPUTFILE/searchable.pdf')
    ->run();

digits

Shortcut for ->configFile('digits').

echo (new TesseractOCR('img.png'))
    ->digits()
    ->run();

hocr

Shortcut for ->configFile('hocr').

echo (new TesseractOCR('img.png'))
    ->hocr()
    ->run();

pdf

Shortcut for ->configFile('pdf').

echo (new TesseractOCR('img.png'))
    ->pdf()
    ->run();

quiet

Shortcut for ->configFile('quiet').

echo (new TesseractOCR('img.png'))
    ->quiet()
    ->run();

tsv

Shortcut for ->configFile('tsv').

echo (new TesseractOCR('img.png'))
    ->tsv()
    ->run();

txt

Shortcut for ->configFile('txt').

echo (new TesseractOCR('img.png'))
    ->txt()
    ->run();

tempDir

Define a custom directory to store temporary files generated by tesseract. Make sure the directory actually exists and that the user running PHP is allowed to write to it.

echo (new TesseractOCR('img.png'))
    ->tempDir('./my/custom/temp/dir')
    ->run();

withoutTempFiles

Specify that tesseract should output the recognized text without writing to temporary files. The data is gathered from the standard output of tesseract instead.

echo (new TesseractOCR('img.png'))
    ->withoutTempFiles()
    ->run();

Other options

Any configuration option offered by Tesseract can be used like this:

echo (new TesseractOCR('img.png'))
    ->config('config_var', 'value')
    ->config('other_config_var', 'other value')
    ->run();

Or like this:

echo (new TesseractOCR('img.png'))
    ->configVar('value')
    ->otherConfigVar('other value')
    ->run();

More info: https://github.com/tesseract-ocr/tesseract/wiki/ControlParams
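
The two forms above suggest a naming correspondence: the camelCase method configVar matches the snake_case option config_var. The sketch below illustrates that correspondence with a hypothetical helper, toConfigName. It is an assumption about the mapping for illustration, not the library's own implementation.

```php
<?php
// Hypothetical helper, not the library's own code: maps a camelCase
// method name to a snake_case configuration name.
function toConfigName($method)
{
    // Insert "_" before each uppercase letter that follows a lowercase
    // letter or digit, then lowercase the whole string.
    return strtolower(preg_replace('/(?<=[a-z0-9])([A-Z])/', '_$1', $method));
}

echo toConfigName('configVar'), PHP_EOL;      // config_var
echo toConfigName('otherConfigVar'), PHP_EOL; // other_config_var
```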

Thread-limit

Sometimes it may be useful to limit the number of threads that tesseract is allowed to use. Set the maximum number of threads with threadLimit before calling run:

echo (new TesseractOCR('img.png'))
    ->threadLimit(1)
    ->run();

How to contribute

You can contribute to this project by:

  • Opening an Issue if you found a bug or wish to propose a new feature;
  • Opening a Pull Request with code that fixes a bug, corrects missing or wrong documentation, or implements a new feature.

Just make sure you take a look at our Code of Conduct and Contributing instructions.

License

tesseract-ocr-for-php is released under the MIT License.

Made with love in Berlin

The Versions

Each entry lists the release date (DD/MM YYYY), the version with its normalized form in parentheses, the license, and the PHP requirement where one is given. Unless noted otherwise, every version carries the description "A wrapper to work with Tesseract OCR inside PHP." and the keywords ocr, text recognition, tesseract.

  • 24/07 2018; dev-master (9999999-dev); MIT Apache-2.0; php ^5.4 || ^7.0
  • 24/07 2018; 2.6.3 (2.6.3.0); MIT; php ^5.4 || ^7.0
  • 24/07 2018; dev-issue-138 (dev-issue-138); MIT; php ^5.4 || ^7.0
  • 24/07 2018; dev-issue-139 (dev-issue-139); MIT; php ^5.4 || ^7.0
  • 07/07 2018; 2.6.2 (2.6.2.0); MIT; php ^5.4 || ^7.0
  • 05/07 2018; 2.6.1 (2.6.1.0); MIT; php ^5.4 || ^7.0
  • 04/07 2018; 2.6.0 (2.6.0.0); MIT; php ^5.6 || ^7.0
  • 01/07 2018; 2.5.0 (2.5.0.0); MIT; php ^5.6 || ^7.0
  • 11/05 2018; 2.4.0 (2.4.0.0); MIT; php ^5.6 || ^7.0
  • 11/05 2018; dev-pr-123 (dev-pr-123); MIT; php ^5.6 || ^7.0
  • 07/05 2018; 2.3.1 (2.3.1.0); MIT; php ^5.6 || ^7.0
  • 06/05 2018; 2.3.0 (2.3.0.0); MIT; php ^5.6 || ^7.0
  • 27/02 2018; 2.2.2 (2.2.2.0); MIT; php ^5.6 || ^7.0
  • 24/02 2018; 2.2.1 (2.2.1.0); MIT; php ^5.6 || ^7.0
  • 24/02 2018; 2.2.0 (2.2.0.0); MIT; php ^5.6 || ^7.0
  • 20/01 2018; 2.1.1 (2.1.1.0); MIT; php ^5.6 || ^7.0
  • 20/01 2018; dev-refactoring (dev-refactoring); MIT; php ^5.6 || ^7.0
  • 19/01 2018; dev-support-more-php-versions (dev-support-more-php-versions); MIT; php ^5.6 || ^7.0
  • 19/01 2018; dev-switch-to-circle-ci (dev-switch-to-circle-ci); MIT; php ^5.6 || ^7.0
  • 17/01 2018; dev-fix-windows-build (dev-fix-windows-build); MIT; php ^5.6 || ^7.0
  • 16/01 2018; dev-remove-phpunit (dev-remove-phpunit); MIT; php ^5.6 || ^7.0
  • 15/12 2017; dev-experiments (dev-experiments); MIT; php ^5.6 || ^7.0
  • 14/12 2017; 2.1.0 (2.1.0.0); MIT; php ^5.6 || ^7.0
  • 10/12 2017; 2.0.0 (2.0.0.0); Apache-2.0
  • 06/12 2017; 1.3.0 (1.3.0.0); Apache-2.0
  • 06/12 2017; 1.2.3 (1.2.3.0); Apache-2.0
  • 30/11 2017; 1.2.2 (1.2.2.0); Apache-2.0
  • 30/11 2017; 1.2.1 (1.2.1.0); Apache-2.0
  • 05/11 2017; 1.2.0 (1.2.0.0); Apache-2.0
  • 03/06 2017; 1.1.0 (1.1.0.0); Apache-2.0
  • 14/05 2017; 1.0.0 (1.0.0.0); Apache-2.0
  • 21/04 2016; 1.0.0-RC (1.0.0.0-RC); Apache-2.0
  • 02/04 2015; 0.2.1 (0.2.1.0); MIT; description: A wrapper to work with TesseractOCR inside PHP
  • 20/03 2014; 0.2.0 (0.2.0.0); MIT; description: A wrapper to work with TesseractOCR inside PHP
  • 19/03 2014; 0.1.5 (0.1.5.0); MIT; description: A wrapper to work with TesseractOCR inside your PHP scripts
  • 28/02 2014; 0.1.4 (0.1.4.0); GPL; description: A wrapper to work with TesseractOCR inside your PHP scripts
  • 28/02 2014; 0.1.3 (0.1.3.0); GPL; description: A wrapper to work with TesseractOCR inside your PHP scripts
  • 31/08 2013; 0.1.2 (0.1.2.0); GPL; description: A wrapper to work with TesseractOCR inside your PHP scripts