SerpPageSerializer
Serialize/deserialize Search Engine Result Pages to JSON and XML (JMS/Serializer wrapper).
Installing via Composer (recommended)
Install Composer in your project:
curl -sS https://getcomposer.org/installer | php
Create a composer.json file in your project root:
{
    "require": {
        "franzip/serp-page-serializer": "1.0.*"
    }
}
Install the dependencies via Composer:
php composer.phar install
Constructor
$serpSerializer = new SerpPageSerializer($cacheDir = "serializer_cache");
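The README does not state whether the constructor creates the cache directory itself. As a minimal sketch, assuming the directory should exist and be writable beforehand, a hypothetical helper (ensureCacheDir(), not part of the library) could prepare it:

```php
<?php
// Hypothetical helper, NOT part of the library: make sure a cache directory
// exists and is writable before passing its path to the constructor.
function ensureCacheDir(string $dir): string
{
    // Create the directory (and any missing parents) if it does not exist yet.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Unable to create cache directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Cache directory is not writable: $dir");
    }
    return $dir;
}

// The temp-dir location is an assumption chosen for illustration.
$cacheDir = ensureCacheDir(sys_get_temp_dir() . '/serializer_cache');

// With the library installed, the path would then be passed along:
// $serpSerializer = new SerpPageSerializer($cacheDir);
```

Failing early with an exception keeps a permissions problem from surfacing later, in the middle of a serialization call.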
Data type constraints
Serialization
The SerpPageSerializer->serialize() method accepts only a SerializableSerpPage
object and returns a SerializedSerpPage object.
The serialized content is available through the SerializedSerpPage->getContent()
method.
Before using the serializer, normalize your data as follows:
use Franzip\SerpPageSerializer\Models\SerializableSerpPage;
// assuming you have already extracted the data somehow
$serializableSerpPage = new SerializableSerpPage($engine, $keyword, $pageUrl,
                                                 $pageNumber, $age, $entries);
Where:
- $engine (string): the Search Engine vendor (e.g. Google, Bing).
- $keyword (string): the keyword associated with the Search Engine page.
- $pageUrl (string): the URL of the Search Engine page for the given keyword/pageNumber.
- $pageNumber (integer): the page number for the given Search Engine keyword search.
- $age (DateTime object): when the data were extracted.
- $entries (array): the core data (see below).
Every Search Engine result page entry has a tripartite structure:
- A title, usually highlighted in blue.
- A URL.
- A textual snippet.
The $entries array must follow this schema, with each entry's sequential
array index standing for its position on the page:
array(
    array('url' => 'someurl', 'snippet' => 'somesnippet', 'title' => 'sometitle'),
    array('url' => 'someurl', 'snippet' => 'somesnippet', 'title' => 'sometitle'),
    array('url' => 'someurl', 'snippet' => 'somesnippet', 'title' => 'sometitle'),
    ...
);
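Scraped rows rarely arrive in exactly this shape. The following is a minimal sketch, assuming raw rows are associative arrays that may carry extra keys or non-sequential indexes. The helper normalizeEntries() and the sample rows are hypothetical and not part of the library:

```php
<?php
/**
 * Hypothetical helper (NOT part of the library): reshape raw scraped rows
 * into a sequential list of url/snippet/title arrays.
 */
function normalizeEntries(array $rawRows): array
{
    $entries = array();
    foreach ($rawRows as $row) {
        // Reject rows that lack one of the three required string fields.
        foreach (array('url', 'snippet', 'title') as $key) {
            if (!isset($row[$key]) || !is_string($row[$key])) {
                throw new InvalidArgumentException("Missing or non-string '$key' in entry");
            }
        }
        // Keep only the three expected keys and trim stray whitespace.
        $entries[] = array(
            'url'     => trim($row['url']),
            'snippet' => trim($row['snippet']),
            'title'   => trim($row['title']),
        );
    }
    // Appending with [] yields keys 0, 1, 2, ... in the original row order.
    return $entries;
}

// Sample input invented for illustration: odd keys and an extra 'rank' field.
$entries = normalizeEntries(array(
    7 => array('url' => 'www.foobar2000.org', 'title' => ' foobar2000 ',
               'snippet' => 'blabla', 'rank' => 1),
    9 => array('url' => 'example.org', 'title' => 'Example',
               'snippet' => 'another snippet'),
));
```

The resulting $entries array can then be passed as the last argument of the SerializableSerpPage constructor.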
Deserialization
The SerpPageSerializer->deserialize() method accepts only a SerializedSerpPage
as the data to deserialize and returns a SerpPageJSON or a SerpPageXML object,
depending on the requested format.
Usage (serialize data)
use Franzip\SerpPageSerializer\SerpPageSerializer;
use Franzip\SerpPageSerializer\Models\SerializableSerpPage;
$engine = 'google';
$keyword = 'foobar';
$pageUrl = 'https://www.google.com/search?q=foobar';
$pageNumber = 1;
$age = new \DateTime();
$age->setTimestamp(time());
$entries = array(array('url'     => 'www.foobar2000.org',
                       'title'   => 'foobar2000',
                       'snippet' => 'blabla'),
                 array(...),
                 ...);
$serpSerializer = new SerpPageSerializer();
$pageToSerialize = new SerializableSerpPage($engine, $keyword, $pageUrl,
                                            $pageNumber, $age, $entries);
// serialize() takes the SerializableSerpPage itself and returns a SerializedSerpPage
$serializedXMLData = $serpSerializer->serialize($pageToSerialize, 'xml');
var_dump($serializedXMLData->getContent());
/*
*
* <serp_page engine="google" page_number="1" page_url="https://www.google.com/search?q=foobar" keyword="foobar" age="2015-03-19">
* <entry position="1">
* <url>www.foobar2000.org</url>
* <title>foobar2000</title>
* <snippet>blabla</snippet>
* </entry>
* <entry position="2">
* ...
* </entry>
* </serp_page>
*/
// serialize() takes the SerializableSerpPage itself and returns a SerializedSerpPage
$serializedJSONData = $serpSerializer->serialize($pageToSerialize, 'json');
var_dump($serializedJSONData->getContent());
/*
* {
* "engine": "google",
* "page_number": 1,
* "page_url": "https:\/\/www.google.com\/search?q=foobar",
* "keyword":"foobar",
* "age":"2015-03-19",
* "entries":[
* {
* "position": 1,
* "url": "www.foobar2000.org",
* "title": "foobar2000",
* "snippet": "blabla"
* },
* {
* "position": 2,
* ...
* },
* ...
* ]
* }
*/
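The JSON output can also be consumed without the library. This is a minimal sketch using PHP's built-in json_decode(), assuming the serialized string has the structure shown above. The sample string is trimmed to a single entry for illustration, and $urlsByPosition is a hypothetical variable name:

```php
<?php
// Sample string mirroring the JSON structure shown above (one entry only).
$json = <<<'JSON'
{
    "engine": "google",
    "page_number": 1,
    "page_url": "https:\/\/www.google.com\/search?q=foobar",
    "keyword": "foobar",
    "age": "2015-03-19",
    "entries": [
        {"position": 1, "url": "www.foobar2000.org", "title": "foobar2000", "snippet": "blabla"}
    ]
}
JSON;

// Decode into associative arrays; json_decode() returns null on invalid input.
$page = json_decode($json, true);
if ($page === null) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Build a position => url map from the decoded entries.
$urlsByPosition = array();
foreach ($page['entries'] as $entry) {
    $urlsByPosition[$entry['position']] = $entry['url'];
}

print_r($urlsByPosition);
```

Note that json_decode() turns the escaped slashes in page_url back into a plain URL.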
Usage (deserialize data)
use Franzip\SerpPageSerializer\SerpPageSerializer;
$serpSerializer = new SerpPageSerializer();
// $serializedXMLPage must be a SerializedSerpPage, such as the object returned by serialize()
$serpPageXML = $serpSerializer->deserialize($serializedXMLPage, 'xml');
var_dump($serializedXMLPage);
// object(Franzip\SerpPageSerializer\Models\SerializedSerpPage) (1) {
// ...
var_dump($serpPageXML);
// object(Franzip\SerpPageSerializer\Models\SerpPageXML) (6) {
// ...
// $serializedJSONPage must be a SerializedSerpPage, such as the object returned by serialize()
$serpPageJSON = $serpSerializer->deserialize($serializedJSONPage, 'json');
var_dump($serializedJSONPage);
// object(Franzip\SerpPageSerializer\Models\SerializedSerpPage) (1) {
// ...
var_dump($serpPageJSON);
// object(Franzip\SerpPageSerializer\Models\SerpPageJSON) (6) {
// ...
TODOs
- [x] Add a default $cacheDir to constructor.
- [x] A decent exceptions system.
- [x] Allow typechecking on deserialization by wrapping serialized strings in
a dedicated class.
- [x] Fix serialization tests.
- [x] Fix deserialization tests.
- [x] Rewrite docs.
- [ ] CSV serialization/deserialization support.
- [ ] Fix messy tests.
License
MIT Public License.