Multigrabber
Special combination of PicoFeed parser and MCurl
Together, these libraries let Multigrabber download content from multiple URLs in parallel and parse it with the PicoFeed parser (the best HTML parser ever).
Test results (100 URLs, multiple sites): 64 seconds and 0.36 MB of RAM to download and parse all content.
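To illustrate what "parallel requests" means in practice, here is a minimal sketch built on PHP's built-in curl_multi functions. It is not Multigrabber's or MCurl's actual code, which this document does not show. The function name `fetchAll` and its `$timeout` parameter are hypothetical, and error handling is left out on purpose.

```php
<?php
// Hypothetical helper, not part of Multigrabber or MCurl: fetch several
// URLs concurrently using PHP's built-in curl_multi_* functions.
function fetchAll(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    // Register one easy handle per URL on the shared multi handle.
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for socket activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies, keyed by URL. This sketch does not check for
    // failed transfers, so a failed URL may map to an empty value.
    $bodies = [];
    foreach ($handles as $url => $ch) {
        $bodies[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }

    return $bodies;
}
```

Calling `fetchAll(['http://example.site/1', 'https://example.site/post2'])` would return the raw response bodies keyed by URL. Multigrabber additionally passes each body through the PicoFeed parser.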
Installation
composer require rakshazi/multigrabber
Usage
<?php
require dirname(__DIR__) . '/vendor/autoload.php';
// Configure PicoFeed.
$config = new \PicoFeed\Config\Config;
// Folder with PicoFeed grabber rules,
// @link https://github.com/fguillot/picoFeed/blob/master/docs/feed-parsing.markdown#custom-regex-filters
$config->setGrabberRulesFolder(__DIR__ . '/rules');
// User agent sent with the requests.
$config->setClientUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36');
$grabber = new \Rakshazi\Multigrabber($config);
$urls = ['http://example.site/1', 'https://example.site/post2', '...'];
// Download all URLs in parallel and parse each response.
// The result is an array keyed by URL; each value is the parsed HTML.
$data = $grabber->run($urls);
var_dump($data);
Output:
array(2) {
["http://example.site/1"]=>
string(978) "<p>Parsed content from nat-geo.ru (text was removed in this example) <a href="http://www.nat-geo.ru/go.php?url=http%3A%2F%2Fvk.com%2Fstudio_vd" rel="noreferrer" target="_blank">Vert Dider</a>.</p>
"
["https://example.site/post2"]=>
string(3675) "Parsed <strong>html</strong>"
}
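Because the result is a plain array of URL => parsed HTML, it can be post-processed with ordinary PHP. The sketch below turns each HTML fragment into a plain-text preview. The `$data` array is a stand-in that mirrors the shape of the output above, and `previews` is a hypothetical helper, not part of Multigrabber.

```php
<?php
// Stand-in for the array returned by $grabber->run($urls); the shape
// (URL => parsed HTML string) is taken from the var_dump output above.
$data = [
    'http://example.site/1'      => '<p>Parsed content <a href="http://example.site/a">link</a>.</p>',
    'https://example.site/post2' => 'Parsed <strong>html</strong>',
];

// Hypothetical helper: strip tags and decode entities to get plain text.
function previews(array $data): array
{
    $out = [];
    foreach ($data as $url => $html) {
        $out[$url] = trim(html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8'));
    }
    return $out;
}

foreach (previews($data) as $url => $text) {
    echo $url, ' => ', $text, PHP_EOL;
}
// Prints:
// http://example.site/1 => Parsed content link.
// https://example.site/post2 => Parsed html
```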