library serp-fetcher

Wrapper around SimpleHtmlDom to easily fetch data from Search Engine Result Pages with built-in caching support.

franzip/serp-fetcher

  • Wednesday, October 7, 2015
  • by franzip
  • PHP

The README.md

SerpFetcher

Wrapper around SimpleHtmlDom to easily fetch data from Search Engine Result Pages with built-in caching support.

Install Composer in your project:

curl -sS https://getcomposer.org/installer | php

Create a composer.json file in your project root:

{
    "require": {
        "franzip/serp-fetcher": "0.2.*@dev"
    }
}

Install via Composer:

php composer.phar install

Supported Search Engines

  • Google
  • Bing
  • Ask
  • Yahoo

Under no circumstances shall I be considered liable to any user for direct, indirect, incidental, consequential, special, or exemplary damages arising from or relating to the user's use or misuse of this software. Consult the Terms of Service of each supported search engine before using SerpFetcher.

Description

You can create a SerpFetcher either through the provided Factory or by importing the fetcher you need directly into your namespace.

All implementations share a common abstract ancestor class, SerpFetcher, and therefore expose five main configurable attributes through setters:

SerpFetcher($cacheDir = 'cache', $cacheTTL = 24, $caching = true,
            $cachingForever = false, $charset = 'UTF-8')
  1. $cacheDir
    • Path to the folder to use as temporary cache.
    • You can specify an absolute or relative path.
    • If it doesn't exist, the folder will be automatically created on instantiation.
  2. $cacheTTL
    • The expiration time of the cache, expressed in hours.
  3. $caching
    • Whether the object should use caching.
  4. $cachingForever
    • Whether the object should use permanent caching (cached pages will never expire).
  5. $charset
    • Charset to use.
    • Note: Only UTF-8 (used as default) has been tested so far.
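
To make the interaction between $cacheTTL (in hours) and $cachingForever concrete, here is a minimal sketch of a freshness check. The function isCacheFresh() is hypothetical and is not part of SerpFetcher; the library's actual expiry logic is not shown in this README and may differ.

```php
<?php
// Hypothetical helper (not SerpFetcher's API): decides whether a cache
// entry written at $cachedAt can still be served at time $now.
function isCacheFresh(int $cachedAt, int $now, int $ttlHours, bool $cachingForever = false): bool
{
    if ($cachingForever) {
        // Permanent caching: entries never expire.
        return true;
    }
    // The TTL is expressed in hours, so convert it to seconds.
    return ($now - $cachedAt) < $ttlHours * 3600;
}
```

With a real cache file, a call could look like isCacheFresh(filemtime($file), time(), 24).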

The main method fetch(), implemented by each class, returns an associative array with URLs, snippets and titles for a given SERP URL. If the fetched results contain fewer than 10 entries, the array is padded up to 10.
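
The padding behaviour can be illustrated with a small standalone sketch. The array shape (keys 'urls', 'titles' and 'snippets'), the empty-string filler and the function padSerpResults() are all assumptions made for illustration; check the real return value of fetch() before relying on them.

```php
<?php
// Hypothetical illustration of padding results up to 10 entries.
// The keys and the '' filler are assumptions, not SerpFetcher's API.
function padSerpResults(array $results, int $size = 10, string $filler = ''): array
{
    $padded = array();
    foreach (array('urls', 'titles', 'snippets') as $key) {
        $values = isset($results[$key]) ? $results[$key] : array();
        // array_pad() appends $filler until the list has $size entries;
        // lists that already have $size or more entries are left unchanged.
        $padded[$key] = array_pad($values, $size, $filler);
    }
    return $padded;
}

$results = padSerpResults(array(
    'urls'     => array('http://example.com/a', 'http://example.com/b'),
    'titles'   => array('A', 'B'),
    'snippets' => array('First result', 'Second result'),
));
```

Whatever the real filler turns out to be, code that consumes the results should skip padded entries rather than treat them as real hits.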

Constructor (using Factory)

Supply the name of the search engine and you are ready to go. You can also pass an optional array of custom arguments.

use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
// custom arguments are passed positionally, in constructor order
$askFetcher = SerpFetcherBuilder::create('Ask', array('foo/bar'));
$bingFetcher = SerpFetcherBuilder::create('Bing', array('baz', 1));
...
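
As a rough sketch of how a positional options array can map onto constructor parameters, consider the following. DemoFetcher and createFetcher() are hypothetical stand-ins; how SerpFetcherBuilder dispatches internally is not documented here, so treat this only as one plausible approach.

```php
<?php
// Hypothetical stand-in for a fetcher class (not part of SerpFetcher).
class DemoFetcher
{
    public $cacheDir;
    public $cacheTTL;

    public function __construct($cacheDir = 'cache', $cacheTTL = 24)
    {
        $this->cacheDir = $cacheDir;
        $this->cacheTTL = $cacheTTL;
    }
}

// Hypothetical factory: forwards the array entries to the constructor
// in order, so array('baz', 1) becomes new DemoFetcher('baz', 1).
function createFetcher(string $class, array $args = array())
{
    $reflection = new ReflectionClass($class);
    return $reflection->newInstanceArgs($args);
}

$fetcher = createFetcher('DemoFetcher', array('baz', 1));
$default = createFetcher('DemoFetcher');
```

Under this assumption, omitted trailing arguments fall back to the constructor defaults.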

Constructor (using Fetchers directly)

use Franzip\SerpFetcher\Fetchers\AskFetcher;
use Franzip\SerpFetcher\Fetchers\BingFetcher;
use Franzip\SerpFetcher\Fetchers\GoogleFetcher;

$googleFetcher = new GoogleFetcher();
$askFetcher = new AskFetcher('foo/bar');
$bingFetcher = new BingFetcher('baz', 1);
...

Basic Usage

use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
$urlToFetch = 'http://www.google.com/search?q=foo';
$fetchedResults = $googleFetcher->fetch($urlToFetch);
// do something with the results...

cacheHit()

Use cacheHit() to check whether a URL is already cached, so your code can handle cache hits and cache misses separately.

use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
$urlToFetch = 'http://www.google.com/search?q=foo';
var_dump($googleFetcher->cacheHit($urlToFetch));
// bool(false)
$fetchedResults = $googleFetcher->fetch('http://www.google.com/search?q=foo');
var_dump($googleFetcher->cacheHit($urlToFetch));
// bool(true)

if ($googleFetcher->cacheHit($urlToFetch)) {
    // handle cache hit
} else {
    // handle cache miss
}

flushCache() and removeCache()

Each fetched URL is cached as a single file. You can remove all those files by calling flushCache(). removeCache() removes the files and also the folder used as cache.

use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
$urlToFetch = 'http://www.google.com/search?q=foo';
var_dump($googleFetcher->cacheHit($urlToFetch));
// bool(false)
$fetchedResults = $googleFetcher->fetch('http://www.google.com/search?q=foo');
var_dump($googleFetcher->cacheHit($urlToFetch));
// bool(true)
$googleFetcher->flushCache();
var_dump($googleFetcher->cacheHit($urlToFetch));
// bool(false)
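
The difference between the two calls can be sketched with plain filesystem functions. flushDir() and removeDir() below are hypothetical stand-ins, and naming the cache file after an md5 hash of the URL is an assumption made only for this example.

```php
<?php
// Hypothetical stand-in for flushCache(): delete cached files, keep the folder.
function flushDir(string $dir): void
{
    foreach (glob($dir . '/*') ?: array() as $file) {
        if (is_file($file)) {
            unlink($file);
        }
    }
}

// Hypothetical stand-in for removeCache(): delete the files and the folder.
function removeDir(string $dir): void
{
    flushDir($dir);
    if (is_dir($dir)) {
        rmdir($dir);
    }
}

// Build a throwaway cache folder holding one "cached page".
$dir = sys_get_temp_dir() . '/serp-cache-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/' . md5('http://www.google.com/search?q=foo'), 'cached page');

flushDir($dir);
$emptyAfterFlush = is_dir($dir) && count(glob($dir . '/*') ?: array()) === 0;

removeDir($dir);
$goneAfterRemove = !is_dir($dir);
```

After the flush the folder still exists but is empty; after the remove the folder itself is gone.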

Fine Tuning (Setters)

use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google');
// change cache folder to foo/
$googleFetcher->setCacheDir('foo');
// change cache expiration to 2 days
$googleFetcher->setCacheTTL(48);
// enable permanent caching
$googleFetcher->enableCachingForever();

Using multiple cache directories

Switch between folders with the setCacheDir() method:

use Franzip\SerpFetcher\SerpFetcherBuilder;

$googleFetcher = SerpFetcherBuilder::create('Google',
                                            array('foo'));
// fetch some stuff... foo/ will be used as cache folder now
...
// fetched results will now be cached in foobar/
$googleFetcher->setCacheDir('foobar');
// switch back to the initial cache folder foo/
$googleFetcher->setCacheDir('foo');

TODOs

  • [x] A decent exceptions system.
  • [x] Support for HHVM.
  • [ ] Implement and test different charset support.
  • [x] Refactoring messy tests.

License

MIT Public License.

The Versions

07/10 2015

dev-master

9999999-dev http://github.com/franzip/serp-fetcher

by Francesco Pezzella

fetch cache data simplehtmldom search-engine serp