2017 © Pedro Peláez
 

library scout

Flexible, structured scraping


2upmedia/scout


  • Tuesday, July 7, 2015
  • by 2upmedia
  • 3 Watchers
  • 4 Stars
  • 19 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 5 Versions
  • 0 % Grown

The README.md

Scout, PHP Scraper - Data your way


Scout is an easy-to-use, fast scraper that lets you use your existing knowledge of PHP to transform data the way you want, without having to learn another transformation language such as XSLT.

Scout is currently in a stable beta, and I encourage you to submit tickets for bugs, feedback, and ideas.

Currently Supported

  • Document types: HTML and XML
  • Querying: XPath
  • PHP 5.4+, including PHP 7!
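To make the two supported document types and XPath querying concrete, here is a small sketch that uses PHP's bundled DOM extension (DOMDocument and DOMXPath). This is an assumption chosen for illustration: it is not Scout's API, and Scout may parse documents differently internally. The sample markup is made up.

```php
<?php
// Sketch: querying HTML and XML with PHP's built-in DOM extension.
// NOT Scout's API; it only illustrates the two document types and XPath.

$html = '<html><body><h1>Deals</h1><p class="price">$10.00</p></body></html>';
$xml  = '<feed><item><title>Title #1</title></item><item><title>Title #2</title></item></feed>';

// HTML is parsed leniently; libxml warnings about sloppy markup are collected, then discarded.
$htmlDoc = new DOMDocument();
libxml_use_internal_errors(true);
$htmlDoc->loadHTML($html);
libxml_clear_errors();

// XML must be well-formed.
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

// The same XPath engine works on both documents.
$htmlXpath = new DOMXPath($htmlDoc);
$heading   = $htmlXpath->evaluate('string(//h1)');
$price     = $htmlXpath->evaluate('string(//p[@class="price"])');

$titles = [];
foreach ((new DOMXPath($xmlDoc))->query('//item/title') as $node) {
    $titles[] = $node->textContent;
}

print_r([$heading, $price, $titles]);
```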

Planned for the future

  • Saving to JSON, CSV, and XML files
  • Support for querying with CSS selectors
  • Support for querying JSON
  • Ability to persist information and track atomic changes
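Until built-in exporters exist, an array of rows (shaped like the output of `getData()` in the example further down) can be written out with plain PHP. This is a sketch using only PHP built-ins (`json_encode`, `fputcsv`); the `$rows` data is hypothetical, and Scout's planned exporters may work differently.

```php
<?php
// Sketch: exporting rows shaped like getData() output with PHP built-ins.
// Hypothetical stand-in data; Scout's planned exporters may work differently.
$rows = [
    ['title' => 'Title #1', 'price' => '$10.00'],
    ['title' => 'Title #2', 'price' => '$23.20'],
];

// JSON: a single call.
$json = json_encode($rows, JSON_PRETTY_PRINT);

// CSV: a header row built from the first row's keys, then one line per row.
// The delimiter, enclosure and escape characters are passed explicitly.
$stream = fopen('php://temp', 'r+');
fputcsv($stream, array_keys($rows[0]), ',', '"', '\\');
foreach ($rows as $row) {
    fputcsv($stream, $row, ',', '"', '\\');
}
rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);

echo $json, PHP_EOL, $csv;
```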

Possible Uses

  • Track search rankings
  • Spy on competitors' websites
  • Scrape coupon websites
  • Scrape websites for your own aggregation website
  • Migrate data from large static websites to import into a CMS
  • Get a list of jobs you're interested in from a wide range of job boards online
  • Transform XML responses from your webservice into JSON
  • Anything else that involves transforming XML/HTML into the data structure you want
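As an illustration of the "XML responses into JSON" use case, the following sketch does the transformation with PHP's built-in DOMXPath and `json_encode` rather than Scout. The `<response>` payload and its field names are invented for the example.

```php
<?php
// Sketch: turning an XML web-service response into JSON with DOMXPath and json_encode.
// The <response> payload below is made up for illustration; this is not Scout's API.
$xml = <<<XML
<response>
  <job id="101"><title>PHP Developer</title><city>Austin</city></job>
  <job id="102"><title>Data Engineer</title><city>Remote</city></job>
</response>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$jobs = [];
foreach ($xpath->query('/response/job') as $job) {
    // Relative queries ("@id", "title", "city") use the current <job> node as context.
    $jobs[] = [
        'id'    => (int) $xpath->evaluate('string(@id)', $job),
        'title' => $xpath->evaluate('string(title)', $job),
        'city'  => $xpath->evaluate('string(city)', $job),
    ];
}

echo json_encode($jobs), PHP_EOL;
```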

Consulting

For consulting, contact jorge@2upmedia.com.

Examples

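<?php

// Assumes Composer's autoloader plus `use` statements for Scout's Xpath, Html and
// DataPoint classes (their namespaces are not shown in this README).
require __DIR__ . '/vendor/autoload.php';

$html = file_get_contents('./tests/fixtures/header-and-table.html');
$queryHandler = new Xpath(Html::parseDocument($html));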

$titlesAndPrices = (new DataPoint())->setQueryHandler($queryHandler);

$data = $titlesAndPrices
    ->setCollection('//table/tr')
    ->forKey('title')->set('./td[1]') // each tr is used as a context, so the key selectors should use "." to be relative to it
    ->forKey('price')->set('./td[2]')
    ->getData();
/*
    array (
      0 => 
      array (
        'title' => 'Title #1',
        'price' => '$10.00',
      ),
      1 => 
      array (
        'title' => 'Title #2',
        'price' => '$23.20',
      ),
      2 => 
      array (
        'title' => 'Title #3',
        'price' => '$1.00',
      ),
      3 => 
      array (
        'title' => 'Title #4',
        'price' => '$5.00',
      ),
    )
*/
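The key idea in the example above is that each node matched by the collection selector becomes the context for the per-key selectors. The sketch below reproduces that pattern with PHP's built-in DOMXPath instead of Scout, to show what the relative `./td[1]` selectors resolve against; the inline table markup is illustrative, not the real fixture file.

```php
<?php
// Sketch: the collection-plus-relative-keys pattern, written against PHP's DOMXPath.
// Not Scout's implementation; the table markup is illustrative.
$html = '<table>'
      . '<tr><td>Title #1</td><td>$10.00</td></tr>'
      . '<tr><td>Title #2</td><td>$23.20</td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Key name => selector relative to each collection item.
$keys = ['title' => './td[1]', 'price' => './td[2]'];

$data = [];
// "//table//tr" also matches rows inside a <tbody>, should the parser add one.
foreach ($xpath->query('//table//tr') as $row) {
    $item = [];
    foreach ($keys as $key => $selector) {
        // "./td[1]" is resolved against the current <tr>, not the whole document.
        $item[$key] = $xpath->evaluate('string(' . $selector . ')', $row);
    }
    $data[] = $item;
}

var_export($data);
```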

For more information on how to use the API, please have a look at the integration test.

XPath Primer

Currently XPath is used as the query language. XPath is simple to use after a little bit of practice.

The core of XPath is the "path". If you understand file paths and URLs, you understand half of XPath already.
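To illustrate the file-path analogy, this sketch runs absolute, relative, and parent (`..`) paths with PHP's built-in DOMXPath. The tiny XML document is made up, and the code is independent of Scout.

```php
<?php
// Sketch: XPath location paths read like file paths. Uses PHP's DOMXPath; the XML is illustrative.
$doc = new DOMDocument();
$doc->loadXML('<site><blog><post>Hello</post><post>World</post></blog><about>Us</about></site>');
$xpath = new DOMXPath($doc);

// Absolute path, like /site/blog/post on a filesystem.
$posts = $xpath->query('/site/blog/post');

// Relative path from a context node, like ./post inside the "blog" directory.
$blog      = $xpath->query('/site/blog')->item(0);
$firstPost = $xpath->evaluate('string(./post[1])', $blog);

// ".." climbs to the parent, just like "cd ..".
$sibling = $xpath->evaluate('string(../about)', $blog);

echo $posts->length, ' ', $firstPost, ' ', $sibling, PHP_EOL;
```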

Read up on the syntax: http://www.w3schools.com/xpath/xpath_syntax.asp. Then have a look at the XPath Primer example.
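Beyond plain paths, a few syntax features cover most scraping needs: `//` (match at any depth), predicates in `[...]`, `@` for attributes, and functions such as `last()`. The sketch below exercises them with PHP's built-in DOMXPath on made-up markup; it is not the XPath Primer example from the repository.

```php
<?php
// Sketch: common XPath syntax beyond plain paths, run with PHP's DOMXPath.
// The job-list markup is invented for illustration.
$html = '<ul>'
      . '<li><a href="/jobs/1" class="remote">PHP Developer</a></li>'
      . '<li><a href="/jobs/2">Designer</a></li>'
      . '<li><a href="/jobs/3" class="remote">Data Engineer</a></li>'
      . '</ul>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// "//a" finds <a> elements at any depth.
$allLinks = $xpath->query('//a')->length;

// A predicate in [] filters nodes; "@class" refers to an attribute.
$remoteTitles = [];
foreach ($xpath->query('//a[@class="remote"]') as $a) {
    $remoteTitles[] = $a->textContent;
}

// Selecting attribute nodes directly, then reading their values.
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->value;
}

// Positional predicate: the string value of the last list item.
$lastItem = $xpath->evaluate('string(//li[last()])');

print_r([$allLinks, $remoteTitles, $hrefs, $lastItem]);
```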

The Versions

  • dev-master (9999999-dev), 07/07 2015, BSD-2-Clause
  • 0.2.1 (0.2.1.0), 07/07 2015, BSD-2-Clause
  • dev-develop, 07/07 2015, BSD-2-Clause
  • 0.2 (0.2.0.0), 07/07 2015
  • 0.1 (0.1.0.0), 06/07 2015

The Requires (all versions)

  • ext-xml *
  • php >=5.4