dev-master
9999999-dev
MIT
by Kirill Popov
ocr text recognition tesseract image recognition
A wrapper to work with TesseractOCR inside your PHP scripts
A wrapper to work with TesseractOCR inside your PHP scripts. Based on https://github.com/wangoviridans/tesseract-ocr-for-php.
Via Composer (https://packagist.org/packages/wangoviridans/tesseract_ocr):
    {
        "require": {
            "wangoviridans/tesseract_ocr": ">= 0.0.1"
        }
    }
Or just clone the repository somewhere inside your project folder:
    $ cd myapp/vendor
    $ git clone git://github.com/wangoviridans/tesseract-ocr-for-php.git
IMPORTANT: Make sure that the tesseract binary is on your $PATH.
If you're running PHP on a web server, the user running it may not be you, but _www or similar.
If needed, you can always modify your $PATH from within PHP:
    $path = getenv('PATH');
    putenv("PATH=$path:/usr/local/bin");
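Before changing $PATH, it can help to check whether the binary is already reachable by the user PHP runs as. The following is a minimal sketch and not part of the wrapper: findOnPath is a hypothetical helper that searches each directory of a PATH-style string for an executable file. On Windows the binary name would presumably need its .exe suffix.

```php
<?php
/**
 * Hypothetical helper (not part of TesseractOCR): look for an executable
 * named $binary in each directory of a PATH-style string.
 * Returns the full path of the first match, or null if none is found.
 */
function findOnPath(string $binary, ?string $path = null): ?string
{
    // Fall back to the PATH of the current PHP process.
    $path = $path ?? (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // skip empty entries such as a trailing separator
        }
        $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $binary;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Usage: only extend PATH when the binary cannot be found as-is.
if (findOnPath('tesseract') === null) {
    putenv('PATH=' . getenv('PATH') . PATH_SEPARATOR . '/usr/local/bin');
}
```

Running the check from the web server itself (rather than from your shell) shows what the _www-style user actually sees.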
    <?php
    require_once '/path/to/src/TesseractOCR.php';
    // or require_once 'vendor/autoload.php' if you are using Composer

    $tesseract = new TesseractOCR();
    $tesseract->setImage('images/some-words.jpg');
    echo $tesseract->recognize();

or

    <?php
    require_once '/path/to/src/TesseractOCR.php';
    // or require_once 'vendor/autoload.php' if you are using Composer

    $tesseract = new TesseractOCR(array(
        'file.input' => 'images/some-words.jpg'
    ));
    echo $tesseract->recognize();
Tesseract has training data for several languages, and selecting the right one can improve recognition accuracy.
    <?php
    require_once '/path/to/src/TesseractOCR.php';
    // or require_once 'vendor/autoload.php' if you are using Composer

    $tesseract = new TesseractOCR('images/sind-sie-deutsch.jpg');
    $tesseract->setLanguage('deu'); // same 3-letter code as the tesseract training data packages
    echo $tesseract->recognize();

or

    <?php
    require_once '/path/to/src/TesseractOCR.php';
    // or require_once 'vendor/autoload.php' if you are using Composer

    $tesseract = new TesseractOCR(array(
        'file.input' => 'images/sind-sie-deutsch.jpg',
        'language'   => 'deu' // same 3-letter code as the tesseract training data packages
    ));
    echo $tesseract->recognize();
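A language code only helps if the matching training data is installed. The tesseract command-line tool can list what it has via `tesseract --list-langs`. As a rough sketch, the hypothetical helper below (not part of the wrapper) extracts the codes from that listing, assuming one code per line after a header line that ends with a colon.

```php
<?php
/**
 * Hypothetical helper: extract language codes from the text printed by
 * `tesseract --list-langs`. Assumes one code per line, preceded by a
 * header line that ends with a colon.
 */
function parseLanguageList(string $output): array
{
    $languages = [];
    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        // Skip blank lines and the header line.
        if ($line === '' || substr($line, -1) === ':') {
            continue;
        }
        $languages[] = $line;
    }
    return $languages;
}
```

It could be fed with something like `shell_exec('tesseract --list-langs 2>&1')` and the result checked with `in_array('deu', ...)` before calling setLanguage('deu'). Redirecting stderr is a precaution, since some versions appear to print part of the listing there.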
Sometimes Tesseract confuses similar-looking characters, such as:
    0 - O
    1 - l
    j - ,
    etc.
But you can improve recognition accuracy by specifying which kinds of characters the image contains, for example:
    <?php
    $tesseract = new TesseractOCR();
    $tesseract->setImage('my-image.jpg')
              ->setWhitelist(range('a','z')); // tesseract will treat everything as lowercase letters
    echo $tesseract->recognize();

    $tesseract = new TesseractOCR();
    $tesseract->setImage('my-image.jpg')
              ->setWhitelist(range('A','Z'), range(0,9), '_-@.'); // you can pass as many ranges as you need
    echo $tesseract->recognize();
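To make the "as many ranges as you need" idea concrete, here is one way mixed arguments could be collapsed into a single whitelist string. This is a sketch under assumptions, not the wrapper's actual code: buildWhitelist is a hypothetical name, and whether the result is then handed to Tesseract's `tessedit_char_whitelist` setting is not confirmed by this README.

```php
<?php
/**
 * Hypothetical helper, not the wrapper's actual code: merge any number of
 * arrays (e.g. from range()) and strings into one whitelist string.
 */
function buildWhitelist(...$parts): string
{
    $chars = '';
    foreach ($parts as $part) {
        // Arrays are joined element by element; integers from range(0, 9)
        // are converted to their string form by implode().
        $chars .= is_array($part) ? implode('', $part) : (string) $part;
    }
    return $chars;
}

echo buildWhitelist(range('A', 'Z'), range(0, 9), '_-@.'), "\n";
// prints ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-@.
```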
You can even do cool stuff like this:
    <?php
    $tesseract = new TesseractOCR();
    $tesseract->setImage('my-image.jpg')
              ->setWhitelist(range('A','Z'));
    echo $tesseract->recognize(); // will return "GIT"
If recognition fails with a "Permission denied" or "No such file or directory" error, you can specify a custom directory for temp files:
    <?php
    $tesseract = new TesseractOCR();
    $tesseract->setImage('my-image.jpg')
              ->setTempDir('./my-temp-dir');

or

    <?php
    $tesseract = new TesseractOCR(array(
        'file.input' => 'my-image.jpg',
        'tempDir'    => './my-temp-dir'
    ));
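A custom temp directory only helps if the user PHP runs as can write to it. The sketch below is a hypothetical helper (ensureWritableDir is not part of the wrapper) that creates the directory if it is missing and fails early with a clear message when it is not writable.

```php
<?php
/**
 * Hypothetical helper: make sure $dir exists and is writable by the
 * current PHP user, creating it (and any parents) if needed.
 * Returns the directory path so the call can be used inline.
 */
function ensureWritableDir(string $dir): string
{
    // The second is_dir() check covers the case where another process
    // created the directory between the first check and mkdir().
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Directory is not writable: $dir");
    }
    return $dir;
}
```

It could then be combined with the call shown above, for example `$tesseract->setTempDir(ensureWritableDir('./my-temp-dir'));`.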