Scout, PHP Scraper - Data your way
Scout is an easy-to-use, fast scraper that lets you use your knowledge of PHP to transform data the way you want, without having to learn another transformation language such as XSLT.
Scout is currently in stable beta. I encourage you to submit tickets for bugs, feedback, and ideas.
Currently Supported
- Document types: HTML and XML
- Querying: XPath
- PHP 5.4+, including PHP 7!
Planned for the future
- Save to JSON, CSV, and XML files
- Support for querying with CSS selectors
- Support for querying JSON
- Ability to persist information and track atomic changes
Possible Uses
- Track search rankings
- Monitor competitors' websites
- Scrape coupon websites
- Scrape websites for your own aggregation website
- Migrate data from large static websites to import into a CMS
- Get a list of jobs you're interested in from a wide range of job boards online
- Transform XML responses from your webservice into JSON
- Anything else that involves transforming XML/HTML to a data structure you want.
Consulting
For consulting, contact jorge@2upmedia.com.
Examples
<?php
// Xpath, Html, and DataPoint are Scout classes. Import them with `use`
// statements that match the namespaces of your installed version.
$queryHandler = new Xpath(Html::parseDocument(file_get_contents('./tests/fixtures/header-and-table.html')));
$titlesAndPrices = (new DataPoint())->setQueryHandler($queryHandler);

// Each <tr> matched by the collection is used as the context node,
// so the key selectors start with "." to stay relative to it.
$data = $titlesAndPrices
    ->setCollection('//table/tr')
    ->forKey('title')->set('./td[1]')
    ->forKey('price')->set('./td[2]')
    ->getData();

/*
$data now contains:

array (
  0 =>
  array (
    'title' => 'Title #1',
    'price' => '$10.00',
  ),
  1 =>
  array (
    'title' => 'Title #2',
    'price' => '$23.20',
  ),
  2 =>
  array (
    'title' => 'Title #3',
    'price' => '$1.00',
  ),
  3 =>
  array (
    'title' => 'Title #4',
    'price' => '$5.00',
  ),
)
*/
For more information on how to use the API, have a look at the integration test.
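To see what "each tr is used as a context" means in the example above, here is a minimal sketch that uses only PHP's built-in DOMDocument and DOMXPath classes. It is not Scout's implementation, only an illustration of context-relative queries. The inline HTML is a hypothetical stand-in for the fixture file, and the collection selector is written as `//table//tr` so it matches rows whether or not the parser places them inside a `tbody`.

```php
<?php
// Hypothetical inline fixture standing in for a real page.
$html = '<table>
  <tr><td>Title #1</td><td>$10.00</td></tr>
  <tr><td>Title #2</td><td>$23.20</td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
// The "collection": every table row in the document.
foreach ($xpath->query('//table//tr') as $tr) {
    // Passing $tr as the second argument makes it the context node,
    // so "./td[1]" means "the first td inside THIS row".
    $rows[] = [
        'title' => $xpath->evaluate('string(./td[1])', $tr),
        'price' => $xpath->evaluate('string(./td[2])', $tr),
    ];
}

print_r($rows);
```

Without the leading ".", a selector such as `//td[1]` would be evaluated against the whole document on every iteration, so each row would report the same first match.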
XPath Primer
Scout currently uses XPath as its query language. XPath is simple to use after a little practice.
The core of XPath is the "path". If you understand file paths and URLs, you already understand half of XPath.
Read up on the syntax at http://www.w3schools.com/xpath/xpath_syntax.asp, then have a look at the XPath Primer example.
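To make the file-path analogy concrete, the sketch below runs a few common XPath expressions with PHP's built-in DOMXPath, independent of Scout. The XML document and its element names (`shop`, `item`, `name`, `price`) are made up for illustration.

```php
<?php
// Hypothetical sample document.
$xml = '<shop>
  <item id="a1"><name>Lamp</name><price>10.00</price></item>
  <item id="b2"><name>Desk</name><price>23.20</price></item>
</shop>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Absolute path from the root, much like /home/user/file.txt.
// [1] picks the first matching item (XPath counts from 1).
$first = $xpath->evaluate('string(/shop/item[1]/name)'); // "Lamp"

// "//" matches at any depth, like a recursive file search.
$names = [];
foreach ($xpath->query('//name') as $node) {
    $names[] = $node->textContent;
}
// $names is ['Lamp', 'Desk']

// [...] filters nodes by a condition; "@" selects an attribute.
$deskId = $xpath->evaluate('string(//item[name="Desk"]/@id)'); // "b2"

// ".." steps up to the parent, just as it does in a file path.
$parent = $xpath->query('//price[. = "10.00"]/..')->item(0);
$lampId = $parent->getAttribute('id'); // "a1"

echo $first, ' ', implode(',', $names), ' ', $deskId, ' ', $lampId, "\n";
```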