Full Text Search for PHP on Google App Engine
This library provides native PHP access to the Google App Engine Search API.
At the time of writing there is no off-the-shelf way to access the Google App Engine full text search API from the PHP runtime.
Generally this means developers cannot access the service without using Python/Java/Go proxy modules, which add complexity, another language, additional potential points of failure and a performance impact.
ALPHA: This library is in the very early stages of development. Do not use it in production. It will change.
Examples
I find examples a great way to decide if I want to even try out a library, so here are a couple for you.
// Schema describing a book
$obj_schema = (new \Search\Schema())
->addText('title')
->addText('author')
->addAtom('isbn')
->addNumber('price');
// Create and populate a document
$obj_book = $obj_schema->createDocument([
'title' => 'The Merchant of Venice',
'author' => 'William Shakespeare',
'isbn' => '1840224312',
'price' => 11.99
]);
// Write it to the Index
$obj_index = new \Search\Index('library');
$obj_index->put($obj_book);
In this example, I've used the Alternative Array Syntax for creating Documents, but you can also do it like this:
$obj_book = $obj_schema->createDocument();
$obj_book->title = 'Romeo and Juliet';
$obj_book->author = 'William Shakespeare';
$obj_book->isbn = '1840224339';
$obj_book->price = 9.99;
Now let's do a simple search and display the output:
$obj_index = new \Search\Index('library');
$obj_response = $obj_index->search('romeo');
foreach($obj_response->results as $obj_result) {
echo "Title: {$obj_result->doc->title}, ISBN: {$obj_result->doc->isbn} <br />", PHP_EOL;
}
Demo Application
Search pubs!
Application: http://pub-search.appspot.com/
Code: https://github.com/tomwalder/pub-search
Getting Started
Install with Composer
To install using Composer, add this require line to your composer.json (for bleeding-edge features, use dev-master as the version instead):
"tomwalder/php-appengine-search": "v0.0.4-alpha"
Or, if you're using the command line:
composer require tomwalder/php-appengine-search
You may need to set minimum-stability: dev in your composer.json.
Queries
You can supply a simple query string to Index::search:
$obj_index->search('romeo');
For more control and options, you can supply a Query object:
$obj_query = (new \Search\Query($str_query))
->fields(['isbn', 'price'])
->limit(10)
->sort('price');
$obj_response = $obj_index->search($obj_query);
Query Strings
Some simple, valid query strings:
- price:2.99
- romeo
- dob:2015-01-01
- dob < 2000-01-01
- tom AND age:36
For much more information, see the Python reference docs: https://cloud.google.com/appengine/docs/python/search/query_strings
Sorting
$obj_query->sort('price');
$obj_query->sort('price', Query::ASC);
Limits & Offsets
$obj_query->limit(10);
$obj_query->offset(5);
Return Fields
$obj_query->fields(['isbn', 'price']);
Expressions
The library supports requesting arbitrary expressions in the results.
$obj_query->expression('euros', 'gbp * 1.45');
These can be accessed from the Document::getExpression() method on the resulting documents, like this:
$obj_doc->getExpression('euros');
Get Document by ID
You can fetch a single document from an index directly, by its unique Doc ID:
$obj_index->get('some-document-id-here');
Scoring
You can enable the MatchScorer by calling the Query::score method.
If you do this, each document in the result set will be scored by the Search API "according to search term frequency" (Google's description).
Without it, all documents will have a score of 0.
$obj_query->score();
And the results...
foreach($obj_response->results as $obj_result) {
echo $obj_result->score, '<br />'; // Score will be a float
}
Multiple Sorts and Scoring
If you apply both score() and sort(), you may be wasting cycles and costing money. Only score documents when you intend to sort by score.
If you need to mix sorting by score with sorting by another field, you can use the magic field name _score like this. Here we sort by price then score, so records with the same price are sorted by their score.
$obj_query->score()->sort('price')->sort('_score');
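To make the tie-breaking concrete, here is a plain PHP sketch that applies the same kind of ordering to an in-memory array. It does not call the library or the Search API. The sample rows are made up, the function name is hypothetical, and the directions (price ascending, score descending) are assumptions chosen for illustration rather than the library's documented defaults.

```php
<?php
// Illustration only: plain PHP, no Search API calls. Sample rows are made up.
// Assumed directions: price ascending, then score descending.
function sort_by_price_then_score(array $arr_rows): array
{
    usort($arr_rows, function (array $arr_a, array $arr_b) {
        // Price is compared first; score only breaks ties.
        // The score operands are swapped so that higher scores come first.
        return [$arr_a['price'], $arr_b['score']] <=> [$arr_b['price'], $arr_a['score']];
    });
    return $arr_rows;
}

$arr_rows = sort_by_price_then_score([
    ['title' => 'Hamlet',  'price' => 9.99, 'score' => 0.4],
    ['title' => 'Othello', 'price' => 7.50, 'score' => 0.1],
    ['title' => 'Macbeth', 'price' => 9.99, 'score' => 0.9],
]);

echo implode(', ', array_column($arr_rows, 'title')), PHP_EOL; // Othello, Macbeth, Hamlet
```

The two 9.99 rows keep their price position and are ordered between themselves by score only, which is the effect of chaining a second sort on _score.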
Distance From
A common use case is searching for documents that have a Geopoint field, based on their distance from a known Geopoint, e.g. "Find pubs near me".
There is a helper method to do this for you, and it also returns the distance in meters in the response.
$obj_query->sortByDistance('location', [53.4653381,-2.2483717]);
This will return results nearest first to the supplied Lat/Lon, and there will be an expression returned for the distance itself, named with the prefix distance_from_ followed by the field name:
$obj_result->doc->getExpression('distance_from_location');
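If you want to sanity-check the distances that come back, a great-circle estimate is easy to compute locally. The sketch below uses the haversine formula with a mean Earth radius of 6,371,000 meters. The function name is hypothetical, and this is an assumption-laden approximation for comparison only. It is not a description of how the Search API calculates distance, so expect small differences.

```php
<?php
// Illustration only: a local great-circle estimate, NOT the Search API's own calculation.
// Points are [latitude, longitude] in degrees, matching the array form used above.
function haversine_meters(array $arr_from, array $arr_to, float $flt_radius = 6371000.0): float
{
    list($flt_lat1, $flt_lon1) = array_map('deg2rad', $arr_from);
    list($flt_lat2, $flt_lon2) = array_map('deg2rad', $arr_to);

    // Haversine of the central angle between the two points
    $flt_a = sin(($flt_lat2 - $flt_lat1) / 2) ** 2
        + cos($flt_lat1) * cos($flt_lat2) * sin(($flt_lon2 - $flt_lon1) / 2) ** 2;

    // min() guards against rounding pushing the value fractionally above 1
    return 2 * $flt_radius * asin(min(1.0, sqrt($flt_a)));
}

// One degree of longitude along the equator, with the radius assumed above
echo round(haversine_meters([0.0, 0.0], [0.0, 1.0])), PHP_EOL; // 111195
```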
Autocomplete
Autocomplete is one of the most desired and useful features of a search solution.
This can be implemented fairly easily with the Google App Engine Search API, with a little sleight of hand!
The Search API does not natively support "edge n-gram" tokenisation (which is what we need for autocomplete!).
So, you can do this with the library: when creating documents, set a second text field with the output from the included Tokenizer::edgeNGram function.
$obj_tkzr = new \Search\Tokenizer();
$obj_schema->createDocument([
'name' => $str_name,
'name_ngram' => $obj_tkzr->edgeNGram($str_name),
]);
Then you can run autocomplete queries easily like this:
$obj_response = $obj_index->search((new \Search\Query('name_ngram:' . $str_query)));
You can see a full demo application using this in my "pub search" demo app.
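If "edge n-gram" is unfamiliar, the idea is to index every leading prefix of each word, so that a partial word typed by the user matches a whole token. The standalone sketch below shows the general technique. It is not the library's Tokenizer implementation; the function name, the lower-casing, and the minimum-length parameter are all assumptions made for illustration.

```php
<?php
// Illustration only: a generic edge n-gram sketch, NOT the library's Tokenizer.
// Returns a space-separated string of every leading prefix of every word.
function edge_ngrams(string $str_text, int $int_min = 1): string
{
    $arr_tokens = [];
    $arr_words = preg_split('/\s+/', strtolower(trim($str_text)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($arr_words as $str_word) {
        // Split into characters (the /u flag keeps multi-byte characters intact)
        $arr_chars = preg_split('//u', $str_word, -1, PREG_SPLIT_NO_EMPTY);
        $str_prefix = '';
        foreach ($arr_chars as $int_pos => $str_char) {
            $str_prefix .= $str_char;
            if ($int_pos + 1 >= $int_min) {
                $arr_tokens[$str_prefix] = true; // array keys de-duplicate prefixes
            }
        }
    }
    return implode(' ', array_keys($arr_tokens));
}

echo edge_ngrams('Kim by'), PHP_EOL; // k ki kim b by
```

With tokens like these stored in a second text field, a partial query such as "ki" matches a whole token, which is why the field-restricted query shown above behaves like autocomplete.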
Creating Documents
Schemas & Field Types
As per the Python docs, the available field types are:
- Atom - an indivisible character string
- Text - a plain text string that can be searched word by word
- HTML - a string that contains HTML markup tags; only the text outside the markup tags can be searched
- Number - a floating point number
- Date - a date with year/month/day and optional time
- Geopoint - latitude and longitude coordinates
Dates
We support DateTime objects or date strings in the format YYYY-MM-DD (PHP date('Y-m-d')):
$obj_person_schema = (new \Search\Schema())
->addText('name')
->addDate('dob');
$obj_person = $obj_person_schema->createDocument([
'name' => 'Marty McFly',
'dob' => new DateTime()
]);
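If your dates arrive from user input or another system, it can help to normalise them to YYYY-MM-DD before building the document. The helper below is a hypothetical sketch in plain PHP and is not part of the library. It accepts either form mentioned above and rejects strings that are not a real calendar date in that exact format.

```php
<?php
// Hypothetical helper, not part of the library: normalise a date to YYYY-MM-DD.
function normalise_date($mix_date): string
{
    if ($mix_date instanceof DateTimeInterface) {
        return $mix_date->format('Y-m-d'); // any time-of-day component is dropped
    }
    $str_date = (string) $mix_date;
    // The leading "!" resets unparsed fields so only year, month and day are used
    $obj_date = DateTime::createFromFormat('!Y-m-d', $str_date);
    // Round-tripping catches overflow dates (e.g. 30 February) and missing zero padding
    if ($obj_date === false || $obj_date->format('Y-m-d') !== $str_date) {
        throw new InvalidArgumentException("Expected YYYY-MM-DD, got: {$str_date}");
    }
    return $str_date;
}

echo normalise_date(new DateTime('2015-01-01 13:45:00')), PHP_EOL; // 2015-01-01
echo normalise_date('2000-12-31'), PHP_EOL;                        // 2000-12-31
```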
Geopoints - Location Data
Create an entry with a Geopoint field:
$obj_pub_schema = (new \Search\Schema())
->addText('name')
->addGeopoint('where')
->addNumber('rating');
$obj_pub = $obj_pub_schema->createDocument([
'name' => 'Kim by the Sea',
'where' => [53.4653381, -2.2483717],
'rating' => 3
]);
Batch Inserts
It's more efficient to insert in batches if you have multiple documents. Up to 200 documents can be inserted at once.
Just pass an array of Document objects into the Index::put() method, like this:
$obj_index = new \Search\Index('library');
$obj_index->put([$obj_book1, $obj_book2, $obj_book3]);
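When you have more than 200 documents, one way to stay within the limit is to chunk the array yourself before writing. The sketch below is a hypothetical helper, not part of the library. It takes the write operation as a callable so that it runs without App Engine, and the demo uses plain integers as stand-ins for Document objects.

```php
<?php
// Hypothetical helper, not part of the library: write documents in groups of at most 200.
// $fnc_put receives one batch (an array) per call; returns the number of calls made.
function put_in_batches(array $arr_docs, callable $fnc_put, int $int_size = 200): int
{
    $int_calls = 0;
    foreach (array_chunk($arr_docs, $int_size) as $arr_batch) {
        $fnc_put($arr_batch);
        $int_calls++;
    }
    return $int_calls;
}

// Stand-in data and a recording callable, so the sketch runs anywhere
$arr_sizes = [];
$int_calls = put_in_batches(range(1, 450), function (array $arr_batch) use (&$arr_sizes) {
    $arr_sizes[] = count($arr_batch);
});

echo $int_calls, ' calls: ', implode(', ', $arr_sizes), PHP_EOL; // 3 calls: 200, 200, 50
```

In an application, the callable would presumably be a closure that passes each batch to Index::put(), for example function (array $arr_batch) use ($obj_index) { $obj_index->put($arr_batch); }.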
Alternative Array Syntax
There is an alternative to directly constructing a new Search\Document and setting its member data, which is to use the Search\Schema::createDocument factory method as follows:
$obj_book = $obj_schema->createDocument([
'title' => 'The Merchant of Venice',
'author' => 'William Shakespeare',
'isbn' => '1840224312',
'price' => 11.99
]);
Namespaces
You can set a namespace when constructing an index. This will allow you to support multi-tenant applications.
$obj_index = new \Search\Index('library', 'client1');
Facets
The Search API supports 2 types of document facets for categorisation, ATOM and NUMBER.
ATOM facets are probably the ones you are most familiar with, and result sets will include counts per unique facet value, kind of like this:
For shirt sizes
* small (9)
* medium (37)
Adding Facets to a Document
$obj_doc->atomFacet('size', 'small');
$obj_doc->atomFacet('colour', 'blue');
Getting Facets in Results
$obj_query->facets();
Deleting Documents
You can delete documents by calling the Index::delete() method.
It supports one or more Document objects, or one or more Document ID strings, or a mixture of objects and ID strings!
$obj_index = new \Search\Index('library');
$obj_index->delete('some-document-id');
$obj_index->delete([$obj_doc1, $obj_doc2]);
$obj_index->delete([$obj_doc3, 'another-document-id']);
Local Development Environment
The Search API is supported locally, because it's included to support the Python, Java and Go App Engine runtimes.
Best Practice, Free Quotas, Costs
Like most App Engine services, search is free... up to a point!
And some best practice that is most certainly worth a read.
Google Software
I've had to include 2 files from Google to make this work - they are the Protocol Buffer implementations for the Search API. You will find them in the /libs folder.
They are also available directly from the following repository: https://github.com/GoogleCloudPlatform/appengine-php-sdk
These 2 files are Copyright 2007 Google Inc.
As and when they make it into the actual live PHP runtime, I will remove them from here.
Thank you to @sjlangley for the assist.
Other App Engine Software
If you've enjoyed this, you might be interested in my Google Cloud Datastore Library for PHP, PHP-GDS.