
library schrapert


maikelvanmaurik/schrapert

  • Tuesday, December 20, 2016
  • by maikelvanmaurik
  • Repository
  • 1 Watchers
  • 0 Stars
  • 2 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 3 Versions
  • 0 % Grown

The README.md

Schrapert

Schrapert is a scraping and crawling library inspired by Scrapy. It uses React for operations such as downloading requests and writing files.

Example of a simple spider:

namespace Crawl;

use DOMDocument;
use DOMElement;
use DOMXPath;
use Schrapert\Crawl\ResponseInterface;
use Schrapert\Http\Request as HttpRequest;
use Schrapert\Http\ResponseInterface as HttpResponse;
use Schrapert\Spider;

class BlogSpider extends Spider
{
    /**
     * Follow every link on the page by yielding a new request per anchor.
     */
    public function parse(ResponseInterface $response)
    {
        // Ignore anything that is not an HTTP response.
        if (!$response instanceof HttpResponse) {
            return;
        }

        $doc = new DOMDocument('1.0');
        $doc->loadHTML((string) $response->getBody());
        $xpath = new DOMXPath($doc);

        // Select every anchor element in the document.
        $nodes = $xpath->query('//a');
        foreach ($nodes as $node) {
            /** @var DOMElement $node */
            // Resolve the href against the URI of the current response.
            $uri = $this->uri->join($node->getAttribute('href'), $response->getUri());
            yield new HttpRequest($uri);
        }
    }
}
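
The link-extraction step in the spider above can be tried in isolation with PHP's DOM extension alone. The following is a minimal sketch, not part of Schrapert: `extractHrefs` is a hypothetical helper introduced only for illustration, and it returns raw `href` values without resolving them against a base URI (the spider delegates that to `$this->uri->join()`). It also assumes you want libxml parse warnings collected rather than emitted, which the original example does not do.

```php
<?php

/**
 * Hypothetical helper (not part of Schrapert): return the href value of
 * every <a> element that has one, in document order.
 */
function extractHrefs(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so handle it up front.
    if ($html === '') {
        return [];
    }

    // Collect libxml warnings internally instead of emitting them for
    // imperfect real-world HTML; restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hrefs = [];

    // Match only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $node) {
        /** @var DOMElement $node */
        $hrefs[] = $node->getAttribute('href');
    }

    return $hrefs;
}

$html = '<html><body>'
    . '<a href="/posts/1">First post</a>'
    . '<a name="top">No link here</a>'
    . '<a href="https://example.com/about">About</a>'
    . '</body></html>';

print_r(extractHrefs($html));
```

Note that the query here is `//a[@href]` rather than `//a`, so anchors without an `href` are skipped instead of producing an empty string. Whether the spider should do the same depends on how `join()` treats an empty reference, which the README does not state.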

The Versions