Tika REST Client
This PHP client interacts with the Tika REST Server for extracting content
and metadata from a wide variety of document file types. There are [alternative
PHP libraries][alternatives] that use the Tika command line client, but
instantiating the JVM for each operation is slow and costly.
This client is built on Guzzle.
Project Setup
This project is installed with Composer.
In the shell, you can run this command:
composer require bangpound/tika-rest-client
Or you can edit your composer.json file to include this requirement:
{
    "require": {
        "bangpound/tika-rest-client": "^1.0"
    }
}
Usage
<?php
$client = new Bangpound\Tika\Client('http://localhost:9998');
$response = $client->tika(array(
    'file' => 'TestPDF.pdf',
));
// Metadata varies by file and file type, so refer to the Apache Tika docs for details.
$all_metadata = $response->metadata;
// If you know the metadata element you want to retrieve, specify it as the argument
// to the response's metadata method.
$author = $response->metadata('author');
// Extracted content can be retrieved as a SimpleXMLElement or a string of XML.
$content_xml = $response->getBody();
$page_2 = $content_xml->children()->div[1];
$content_text = $response->getBody(true);
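
For comparison, here is a rough sketch of talking to a Tika server directly over HTTP, using only PHP's built-in stream functions. It is not how this client is implemented. It assumes the server exposes the standard PUT /tika endpoint and listens on http://localhost:9998 as in the example above. The function names tika_put_options and tika_extract_text are hypothetical and are not part of this library.

```php
<?php
// Hypothetical helpers for illustration only; not part of bangpound/tika-rest-client.

/**
 * Build stream-context options for a PUT request that carries a document body.
 */
function tika_put_options(string $body, string $accept): array
{
    return [
        'http' => [
            'method'  => 'PUT',
            'header'  => [
                // Assumption: a generic type leaves format detection to the server.
                'Content-Type: application/octet-stream',
                'Accept: ' . $accept,
            ],
            'content' => $body,
        ],
    ];
}

/**
 * Send a file to the server's /tika endpoint and return the response body.
 * Throws if the file cannot be read or the request fails.
 */
function tika_extract_text(string $path, string $baseUrl = 'http://localhost:9998'): string
{
    $body = file_get_contents($path);
    if ($body === false) {
        throw new RuntimeException("Cannot read {$path}");
    }

    $context = stream_context_create(tika_put_options($body, 'text/plain'));
    $result = file_get_contents(rtrim($baseUrl, '/') . '/tika', false, $context);
    if ($result === false) {
        throw new RuntimeException("Request to {$baseUrl} failed");
    }

    return $result;
}

// Usage, assuming a Tika server is already running:
// echo tika_extract_text('TestPDF.pdf');
```

Because the server is a long-running process, each call costs one HTTP round trip rather than a fresh JVM start.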
Testing
The Tika REST Client has an incomplete suite of tests. Run them using phpunit after
installing the dev dependencies.
composer install
phpunit
License
This code is released under the MIT license.