2017 © Pedro Peláez
 

library diggin-scraper

Web-scraping component, inspired by Perl's Web::Scraper. It provides a DSL-ish interface for traversing HTML documents and returning a neatly arranged multidimensional PHP array.

diggin/diggin-scraper

  • PHP
  • 1 Dependents
  • 0 Suggesters
  • 2 Forks
  • 0 Open issues
  • 6 Versions
  • 7 % Grown

The README.md

Diggin_Scraper

Web-scraping component, inspired by Perl's Web::Scraper. It provides a DSL-ish interface for traversing HTML documents and returning a neatly arranged multidimensional PHP array.

CHANGELOG for 0.9.0

Changed the behavior when the extraction target is not found: https://github.com/diggin/Diggin_Scraper/issues/1

If you want exceptions to be thrown for v0.8 compatibility, use the throwTargetExceptionsOn method:

$scraper->throwTargetExceptionsOn(true);

Features

  • Extracts results into a multidimensional array
  • Handles CSS selectors or XPath expressions
  • Automatically converts to UTF-8
    • based on Diggin_Http_Charset
  • Automatically beautifies ugly HTML into XHTML
    • based on Diggin_Scraper_Adapter_Htmlscraping & tidy
  • Automatically converts relative paths into absolute URLs ("a href" & "img src")
  • The strategy (XPath or regex) and the HTML pre-processing conversion can be changed
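
To make the first few features concrete, here is a minimal sketch of the underlying idea: running an XPath expression over HTML, collecting the matches into a nested array, and resolving relative links. It deliberately uses only PHP's bundled DOM extension (DOMDocument and DOMXPath), not Diggin_Scraper's own API, which this page does not show. The sample markup, the base URL, and the array keys are assumptions made up for illustration, and the URL resolution is intentionally naive.

```php
<?php
// Illustration only: plain ext-dom, NOT the Diggin_Scraper API.
$html = <<<'HTML'
<html><body>
  <ul>
    <li><a href="/a.html">First</a></li>
    <li><a href="b.html">Second</a></li>
  </ul>
</body></html>
HTML;

// Hypothetical base URL, used only to resolve the relative links above.
$base = 'http://example.com/dir/';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = ['links' => []];
foreach ($xpath->query('//ul/li/a') as $a) {
    $href = $a->getAttribute('href');
    if (strpos($href, '/') === 0) {
        // Root-relative path: prepend scheme and host of the base URL.
        $parts = parse_url($base);
        $href = $parts['scheme'] . '://' . $parts['host'] . $href;
    } elseif (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        // Document-relative path: append to the base directory.
        $href = $base . $href;
    }
    $result['links'][] = ['text' => trim($a->textContent), 'url' => $href];
}

print_r($result);
```

A scraping library such as this one wraps these steps (plus charset conversion and HTML clean-up) behind a declarative interface, so the caller describes the target structure instead of writing the loop by hand.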

Requirements

  • PHP 5.3.3 or later
  • Zend Framework 2
  • Diggin components
    • Diggin_Http_Charset
    • Diggin_Scraper_Adapter_Htmlscraping

The Versions

24/12 2015

dev-master

9999999-dev

BSD-3-Clause

24/12 2015

v0.9.2

0.9.2.0

BSD-3-Clause

10/04 2014

v0.9.1

0.9.1.0

BSD-3-Clause

19/10 2013

v0.9.0

0.9.0.0

BSD-3-Clause

19/10 2013

v0.8.1

0.8.1.0

BSD-3-Clause

03/10 2012

v0.8.0

0.8.0.0

BSD-3-Clause
