library multigrabber

Download and parse HTML with multiple parallel requests and PicoFeed grabber

rakshazi/multigrabber

  • Sunday, September 11, 2016
  • by rakshazi
  • Repository
  • 1 Watchers
  • 0 Stars
  • 7 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 2 Versions
  • 0 % Grown

The README.md

Multigrabber

Multigrabber combines the PicoFeed parser with MCurl. Together, these libraries let Multigrabber download content from multiple URLs in parallel requests and parse it with the PicoFeed parser (best HTML parser ever).

Test results (100 URLs, multiple sites): 64 seconds and 0.36 MB of RAM to download and parse all content.
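
To illustrate what "parallel requests" means here, the sketch below fetches several URLs concurrently using PHP's built-in curl_multi_* functions. It is not Multigrabber's or MCurl's actual implementation; the function name fetchAllInParallel and its timeout parameter are hypothetical and exist only for this example.

```php
<?php
/**
 * Fetch several URLs concurrently with PHP's curl_multi_* functions.
 * Hypothetical helper for illustration; not part of Multigrabber or MCurl.
 *
 * @param string[] $urls
 * @return array<string, string> response bodies keyed by URL
 */
function fetchAllInParallel(array $urls, int $timeoutSeconds = 10): array
{
    $urls = array_values(array_unique($urls)); // one handle per distinct URL
    if ($urls === []) {
        return []; // nothing to do, so no cURL handles are created
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$url] = $handle;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for socket activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $handle) {
        // Failed transfers yield an empty string in this sketch.
        $bodies[$url] = (string) curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }
    curl_multi_close($multi);

    return $bodies;
}
```

Because all transfers run at once, the total time is closer to that of the slowest response than to the sum of all of them. A real client would also check per-transfer errors and limit how many requests run at the same time.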

Installation

composer require rakshazi/multigrabber

Usage

<?php
require dirname(__DIR__) . '/vendor/autoload.php';

// Configure PicoFeed: the grabber rules folder and the User-Agent to send.
// Grabber rules: https://github.com/fguillot/picoFeed/blob/master/docs/feed-parsing.markdown#custom-regex-filters
$config = new \PicoFeed\Config\Config();
$config->setGrabberRulesFolder(__DIR__ . '/rules');
$config->setClientUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36');

// Download all URLs in parallel and parse each response.
$grabber = new \Rakshazi\Multigrabber($config);
$urls = ['http://example.site/1', 'https://example.site/post2', '...'];
$data = $grabber->run($urls);

// $data maps each URL to its parsed HTML content.
var_dump($data);

Output:

array(2) {
  ["http://example.site/1"]=>
  string(978) "</p>Parsed content from nat-geo.ru (text was removed in this example) <a href="http://www.nat-geo.ru/go.php?url=http%3A%2F%2Fvk.com%2Fstudio_vd" rel="noreferrer" target="_blank">Vert Dider</a>.</p>"
  ["https://example.site/post2"]=>
  string(3675) "Parsed <strong>html</strong>"
}

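Since the result is an array that maps each URL to a string of parsed HTML, it can be processed with ordinary array and string functions. The following is a minimal sketch under that assumption; summarizeResults is a hypothetical helper, and the sample $data array stands in for the return value of $grabber->run($urls) with made-up content.

```php
<?php
/**
 * Summarise a result array shaped as URL => parsed HTML string.
 * Hypothetical helper for illustration; not part of Multigrabber.
 *
 * @param array<string, string> $data
 * @return array<string, array{bytes: int, text: string}>
 */
function summarizeResults(array $data): array
{
    $summary = [];
    foreach ($data as $url => $html) {
        // Drop tags, decode entities, and collapse whitespace to get plain text.
        $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
        $text = trim((string) preg_replace('/\s+/u', ' ', $text));
        $summary[$url] = [
            'bytes' => strlen($html), // size of the parsed HTML in bytes
            'text'  => $text,
        ];
    }
    return $summary;
}

// Stand-in for $grabber->run($urls); the values here are made up.
$data = [
    'http://example.site/1'      => '<p>Parsed   content <a href="#">link</a>.</p>',
    'https://example.site/post2' => 'Parsed <strong>html</strong>',
];

foreach (summarizeResults($data) as $url => $info) {
    printf("%s: %d bytes, text: %s\n", $url, $info['bytes'], $info['text']);
}
```

Keying the summary by URL mirrors the shape of the grabber's result, so each entry can be matched back to the page it came from.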
The Versions

September 11, 2016

dev-master

9999999-dev https://github.com/rakshazi/multigrabber

MIT

curl parser php http client dom async requests xpath parallel spider multi grabber

September 11, 2016

1.0

1.0.0.0 https://github.com/rakshazi/multigrabber

MIT

curl parser php http client dom async requests xpath parallel spider multi grabber