library serp-page-serializer

Serialize/deserialize Search Engine Result Pages to JSON, XML and YAML (JMS/Serializer wrapper).

franzip/serp-page-serializer

  • Wednesday, October 14, 2015
  • by franzip
  • Repository
  • 1 Watchers
  • 0 Stars
  • 111 Installations
  • PHP
  • 1 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 1 Versions
  • 3 % Grown

The README.md

SerpPageSerializer

Serialize/deserialize Search Engine Result Pages to JSON and XML (JMS/Serializer wrapper).

Install Composer in your project:

curl -sS https://getcomposer.org/installer | php

Create a composer.json file in your project root:

{
    "require": {
        "franzip/serp-page-serializer": "1.0.*"
    }
}

Install via Composer:

php composer.phar install

Constructor

$serpSerializer = new SerpPageSerializer($cacheDir = "serializer_cache");

Data type constraints

Serialization

The SerpPageSerializer->serialize() method accepts only a SerializableSerpPage object and returns a SerializedSerpPage object. The serialized content is available through the SerializedSerpPage->getContent() method. Before using the serializer, wrap your data in a SerializableSerpPage as follows:


use Franzip\SerpPageSerializer\Models\SerializableSerpPage;

// assuming you have already extracted the data somehow
$serializableSerpPage = new SerializableSerpPage($engine, $keyword, $pageUrl, $pageNumber, $age, $entries);

Where:

  1. $engine - string
    • Represents the Search Engine vendor (e.g. Google, Bing).
  2. $keyword - string
    • Represents the keyword associated with the Search Engine page
  3. $pageUrl - string
    • Represents the URL of the Search Engine page for the given keyword/pageNumber
  4. $pageNumber - integer
    • Represents the page number for the given Search Engine keyword search
  5. $age - DateTime object
    • Represents when the data were extracted
  6. $entries - array
    • Represents the core data (see below)
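
The type constraints listed above can be checked before the constructor is called. The following is a minimal sketch in plain PHP: checkSerpPageArguments is a hypothetical helper that is not part of the library, and the library may well perform its own validation and raise its own exceptions.

```php
<?php
/**
 * Hypothetical helper (not part of serp-page-serializer).
 * Returns a list of human-readable problems; an empty list means
 * every argument has the type described in the list above.
 */
function checkSerpPageArguments($engine, $keyword, $pageUrl, $pageNumber, $age, $entries)
{
    $problems = array();

    // $engine, $keyword and $pageUrl are all expected to be strings.
    $strings = array('engine' => $engine, 'keyword' => $keyword, 'pageUrl' => $pageUrl);
    foreach ($strings as $name => $value) {
        if (!is_string($value)) {
            $problems[] = "\$$name must be a string";
        }
    }
    if (!is_int($pageNumber)) {
        $problems[] = '$pageNumber must be an integer';
    }
    if (!($age instanceof \DateTime)) {
        $problems[] = '$age must be a DateTime object';
    }
    if (!is_array($entries)) {
        $problems[] = '$entries must be an array';
    }

    return $problems;
}

// The page number is passed as the string '1' on purpose, so one problem is reported.
$problems = checkSerpPageArguments(
    'google',
    'foobar',
    'https://www.google.com/search?q=foobar',
    '1',
    new \DateTime(),
    array()
);
print_r($problems);
```

Collecting every problem instead of failing on the first one makes it easier to report all issues with scraped data in a single pass.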

Every Search Engine result page entry has a tripartite structure:

  1. A title, usually highlighted in blue.
  2. A url.
  3. A textual snippet.

Typical SERP entry structure

The $entries array must follow the schema described above, with the sequential array index standing for the entry's position in the page:

array(
      array('url' => 'someurl', 'snippet' => 'somesnippet', 'title' => 'sometitle'),
      array('url' => 'someurl', 'snippet' => 'somesnippet', 'title' => 'sometitle'),
      array('url' => 'someurl', 'snippet' => 'somesnippet', 'title' => 'sometitle'),
      ...
     );
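
Scraped data rarely arrives in exactly this shape, so it can help to normalize it first. The sketch below assumes a hypothetical normalizeEntries helper (not provided by the library) that keeps only the url, snippet and title keys, requires each of them to be a string, and reindexes the array sequentially so the index reflects the order of the entries.

```php
<?php
/**
 * Hypothetical helper (not part of serp-page-serializer).
 * Keeps only 'url', 'snippet' and 'title' for each entry and reindexes
 * the result from 0, preserving the original order.
 *
 * @throws \InvalidArgumentException when an entry lacks one of the keys
 */
function normalizeEntries(array $rawEntries)
{
    $required = array('url', 'snippet', 'title');
    $entries  = array();

    // array_values() discards the original keys but keeps the order.
    foreach (array_values($rawEntries) as $index => $entry) {
        foreach ($required as $key) {
            if (!is_array($entry) || !isset($entry[$key]) || !is_string($entry[$key])) {
                throw new \InvalidArgumentException(
                    "Entry at index $index is missing a string '$key'"
                );
            }
        }
        $entries[] = array(
            'url'     => $entry['url'],
            'snippet' => $entry['snippet'],
            'title'   => $entry['title'],
        );
    }

    return $entries;
}

// Non-sequential keys and an extra 'rank' field, as a scraper might produce.
$raw = array(
    5 => array('url' => 'www.foobar2000.org', 'title' => 'foobar2000', 'snippet' => 'blabla', 'rank' => 1),
    9 => array('url' => 'someurl', 'title' => 'sometitle', 'snippet' => 'somesnippet'),
);
$entries = normalizeEntries($raw);
print_r($entries);
```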

Deserialization

The SerpPageSerializer->deserialize() method accepts only a SerializedSerpPage as its argument and returns a SerpPageJSON or a SerpPageXML object, depending on the requested format.
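
Wrapping the serialized string in a dedicated class lets PHP's type hints reject raw strings before any parsing starts. The sketch below illustrates that idea only: WrappedPayload and describePayload are hypothetical names, and the library's actual SerializedSerpPage class may be implemented differently.

```php
<?php
// Hypothetical stand-in for the wrapper idea; not the library's class.
class WrappedPayload
{
    private $content;

    public function __construct($content)
    {
        if (!is_string($content)) {
            throw new \InvalidArgumentException('Content must be a string');
        }
        $this->content = $content;
    }

    public function getContent()
    {
        return $this->content;
    }
}

// The class type hint means a plain string cannot be passed by mistake.
function describePayload(WrappedPayload $payload)
{
    return strlen($payload->getContent()) . ' bytes';
}

$payload = new WrappedPayload('{"engine":"google"}');
echo describePayload($payload), "\n";
```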

Usage (serialize data)


use Franzip\SerpPageSerializer\SerpPageSerializer;
use Franzip\SerpPageSerializer\Models\SerializableSerpPage;

$engine     = 'google';
$keyword    = 'foobar';
$pageUrl    = 'https://www.google.com/search?q=foobar';
$pageNumber = 1;
$age        = new \DateTime();
$age->setTimestamp(time());
$entries    = array(array('url' => 'www.foobar2000.org', 'title' => 'foobar2000', 'snippet' => 'blabla'),
                    array(...),
                    ...);

$serpSerializer  = new SerpPageSerializer();
$pageToSerialize = new SerializableSerpPage($engine, $keyword, $pageUrl, $pageNumber, $age, $entries);

// As described under "Serialization", serialize() takes the SerializableSerpPage
// itself and returns a SerializedSerpPage; getContent() yields the serialized string.
$serializedXMLData = $serpSerializer->serialize($pageToSerialize, 'xml')->getContent();

var_dump($serializedXMLData);

/*
 *
 * <serp_page engine="google" page_number="1" page_url="https://www.google.com/search?q=foobar" keyword="foobar" age="2015-03-19">
 *   <entry position="1">
 *     <url>www.foobar2000.org</url>
 *     <title>foobar2000</title>
 *     <snippet>blabla</snippet>
 *   </entry>
 *   <entry position="2">
 *     ...
 *   </entry>
 * </serp_page>
 */

$serializedJSONData = $serpSerializer->serialize($pageToSerialize, 'json')->getContent();

var_dump($serializedJSONData);

/*
 * {
 *    "engine": "google",
 *    "page_number": 1,
 *    "page_url": "https:\/\/www.google.com\/search?q=foobar",
 *    "keyword":"foobar",
 *    "age":"2015-03-19",
 *    "entries":[
 *       {
 *         "position": 1,
 *         "url": "www.foobar2000.org",
 *         "title": "foobar2000",
 *         "snippet": "blabla"
 *       },
 *       {
 *         "position": 2,
 *         ...
 *       },
 *       ...
 *    ]
 * }
 */
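
Because the serialized JSON is an ordinary string, it can also be read without the library, for instance by another service. The following sketch uses only PHP's built-in json_decode on a literal shaped like the JSON sample above; in real code the string would come from the serializer, and the single entry shown here is just for illustration.

```php
<?php
// Literal shaped like the JSON sample; a stand-in for real serializer output.
$json = '{"engine":"google","page_number":1,'
      . '"page_url":"https:\/\/www.google.com\/search?q=foobar",'
      . '"keyword":"foobar","age":"2015-03-19",'
      . '"entries":[{"position":1,"url":"www.foobar2000.org",'
      . '"title":"foobar2000","snippet":"blabla"}]}';

// Decode into associative arrays rather than stdClass objects.
$page = json_decode($json, true);
if ($page === null) {
    throw new \RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Each entry carries its own position, so no index arithmetic is needed.
foreach ($page['entries'] as $entry) {
    printf("%d. %s (%s)\n", $entry['position'], $entry['title'], $entry['url']);
}
```

Note that the escaped slashes in page_url are a JSON encoding detail; after decoding, the value is the plain URL.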

Usage (deserialize data)


use Franzip\SerpPageSerializer\SerpPageSerializer;

$serpSerializer = new SerpPageSerializer();

// $serializedXMLPage and $serializedJSONPage are SerializedSerpPage objects
// (see the var_dump output below).
$serpPageXML = $serpSerializer->deserialize($serializedXMLPage, 'xml');

var_dump($serializedXMLPage);
// object(Franzip\SerpPageSerializer\Models\SerializedSerpPage) (1) {
// ...
var_dump($serpPageXML);
// object(Franzip\SerpPageSerializer\Models\SerpPageXML) (6) {
// ...

$serpPageJSON = $serpSerializer->deserialize($serializedJSONPage, 'json');

var_dump($serializedJSONPage);
// object(Franzip\SerpPageSerializer\Models\SerializedSerpPage) (1) {
// ...
var_dump($serpPageJSON);
// object(Franzip\SerpPageSerializer\Models\SerpPageJSON) (6) {
// ...

TODOs

  • [x] Add a default $cacheDir to constructor.
  • [x] A decent exceptions system.
  • [x] Allow typechecking on deserialization by wrapping serialized strings in a dedicated class.
  • [x] Fix serialization tests.
  • [x] Fix deserialization tests.
  • [x] Rewrite docs.
  • [ ] CSV serialization/deserialization support.
  • [ ] Fix messy tests.

License

MIT Public License.

The Versions

14/10 2015

dev-master

9999999-dev http://github.com/franzip/serp-page-serializer

MIT

by Francesco Pezzella

json xml serializer page yaml search-engine serp