2017 © Pedro Peláez
 

library curl-robot

Parallel URL crawling extension to curl-easy.

image

stil/curl-robot

Parallel URL crawling extension to curl-easy.

  • Thursday, March 17, 2016
  • by stil
  • Repository
  • 7 Watchers
  • 27 Stars
  • 11,968 Installations
  • PHP
  • 2 Dependents
  • 0 Suggesters
  • 8 Forks
  • 0 Open issues
  • 4 Versions
  • 4 % Grown

The README.md

Installation

You can install this package with Composer. Run following command:, (*1)

composer require stil/curl-robot:dev-master

Example 1

<?php
require __DIR__.'/../vendor/autoload.php';

use cURL\Request;
use cURL\Robot\RobotSwarm;
use cURL\Robot\Robot;
use cURL\Robot\Event\RequestAttachingEvent;
use cURL\Robot\Event\RequestCompletedEvent;
use cURL\Robot\RequestProviderInterface;

class Crawler implements RequestProviderInterface
{
    protected $number = 0;

    /**
     * Method returning next request to execute
     */
    public function nextRequest()
    {
        return new Request("http://httpbin.org/delay/1?num=".($this->number++));
    }
}

$swarm = new RobotSwarm();
$swarm->setRequestProvider(new Crawler());
$swarm->getDefaultOptions()->set([
    CURLOPT_TIMEOUT        => 5,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    ]
]);

$robot1 = new Robot();
$robot1->setQueueSize(1);
$robot1->addRateLimit(new RateLimit(20, 60));
$robot1->addListener('request.attaching', function (RequestAttachingEvent $e) {
    echo "Attaching request from robot1\n";
});

$robot2 = new Robot();
$robot2->setQueueSize(3);
$robot1->addRateLimit(new RateLimit(120, 60));
$robot2->addListener('request.attaching', function (RequestAttachingEvent $e) {
    // Proxy requests
    $e->request->getOptions()->set(CURLOPT_PROXY, '10.0.0.1:8080');
    echo "Attaching request from robot2 (proxied)\n";
});

$swarm->addListener('request.completed', function (RequestCompletedEvent $e) {
    $httpCode = $e->response->getInfo(CURLINFO_HTTP_CODE);

    if ($httpCode == 200) {
        $json = $e->response->getContent();
        $data = json_decode($json, true);
        printf("Successful request #%d\n", $data['args']['num']);
    } else {
        printf("Wrong HTTP code %d\n", $httpCode);
        // Retry request until we exceeded allowed amount of attempts
        if ($e->handler->getAttempts() < 3) {
            printf("Retrying, attempt %d\n", $e->handler->getAttempts());
            $e->swarm->retry($e->handler);
        }
    }
});

$swarm->add($robot1);
$swarm->add($robot2);
$swarm->run();

The Versions

17/03 2016

dev-master

9999999-dev

Parallel URL crawling extension to curl-easy.

  Sources   Download

MIT

The Requires

 

03/05 2015

v0.2.0

0.2.0.0

Parallel URL crawling extension to curl-easy.

  Sources   Download

MIT

The Requires

 

16/02 2015

v0.1.1

0.1.1.0

Parallel URL crawling extension to curl-easy.

  Sources   Download

MIT

The Requires

 

24/08 2014

v0.1.0

0.1.0.0

  Sources   Download

The Requires