
library crawler

HTTP crawler on non-blocking sockets

pmaxs/crawler

  • Thursday, June 16, 2016
  • by pmaxs
  • PHP
  • 2 Versions

The README.md

crawler

HTTP crawler on non-blocking sockets.

Installation

composer require pmaxs/crawler

Usage

<?php
require '../vendor/autoload.php';

use Pmaxs\Crawler\Request;
use Pmaxs\Crawler\Response;
use Pmaxs\Crawler\Crawler;

function process(Request $request, Response $response)
{
    echo $response->url." (".$response->remoteIp.":".$response->remotePort.")\n";

    echo "code: ".$response->code."; "
        ."start: ".$response->timeStart."; "
        ."finish: ".$response->timeFinish."; "
        ."time: ".($response->timeFinish - $response->timeStart)
        ."\n";

    echo $response->body."\n";
}

$Crawler = new Crawler(array(
    'time_limit'=>30,
    'rps'=>4,
));

for ($i = 1; $i <= 20; $i++) {
    $Crawler->request(
        new Request('http://example.com/p/'.$i),
        '\process'
    );
}

$Crawler->process();

Output:

http://example.com/p/1 (xxx.xxx.xxx.xxx:xx)
code: 200; start: 1453652698.8725; finish: 1453652699.8008; time: 0.92827606201172
p: 1

http://example.com/p/2 (xxx.xxx.xxx.xxx:xx)
code: 200; start: 1453652698.8725; finish: 1453652699.8008; time: 0.92827606201172
p: 2

http://example.com/p/3 (xxx.xxx.xxx.xxx:xx)
code: 200; start: 1453652698.8725; finish: 1453652699.8008; time: 0.92827606201172
p: 3

...
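
The README does not describe how the crawler works internally. As a rough illustration of what "non-blocking sockets" usually means in PHP, the sketch below polls several streams with stream_select() and reads from whichever one is ready. It is not the pmaxs/crawler implementation: the helper name read_all_nonblocking() and its parameters are hypothetical, and the demo uses local socket pairs (assuming a Unix-like system) instead of real HTTP connections so that it runs without network access.

```php
<?php
// Illustrative sketch only: NOT the pmaxs/crawler implementation.

/**
 * Hypothetical helper: read several streams concurrently until each
 * reaches EOF or the overall time budget runs out.
 *
 * @param array $streams map of key => readable stream resource
 * @param float $timeout overall time budget in seconds
 * @return array map of key => bytes received
 */
function read_all_nonblocking(array $streams, $timeout = 5.0)
{
    $buffers = array();
    foreach ($streams as $key => $stream) {
        stream_set_blocking($stream, false); // reads return immediately
        $buffers[$key] = '';
    }

    $deadline = microtime(true) + $timeout;

    while ($streams && microtime(true) < $deadline) {
        $read = array_values($streams);
        $write = null;
        $except = null;

        // Wait up to 200 ms for at least one stream to become readable.
        $ready = stream_select($read, $write, $except, 0, 200000);
        if ($ready === false) {
            break; // select failed; give up
        }

        foreach ($read as $stream) {
            // Map the ready resource back to its key.
            $key = array_search($stream, $streams, true);
            $chunk = fread($stream, 8192);
            if ($chunk !== false && $chunk !== '') {
                $buffers[$key] .= $chunk;
            }
            if (feof($stream)) {
                fclose($stream);
                unset($streams[$key]);
            }
        }
    }

    return $buffers;
}

// Demo: local socket pairs stand in for remote servers.
$readers = array();
foreach (array('a' => 'first', 'b' => 'second') as $key => $payload) {
    list($reader, $writer) = stream_socket_pair(
        STREAM_PF_UNIX,
        STREAM_SOCK_STREAM,
        STREAM_IPPROTO_IP
    );
    fwrite($writer, $payload);
    fclose($writer); // closing the writer signals EOF to the reader
    $readers[$key] = $reader;
}

$result = read_all_nonblocking($readers);
var_export($result);
```

Because no single read blocks the script, one slow stream does not hold up the others; a crawler built this way can keep many connections in flight from a single process.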

The Versions

dev-master (9999999-dev)

  • Date: June 16, 2016
  • Repository: https://github.com/pmaxs/crawler
  • License: MIT
  • Requires: php >=5.3.0

v1.0.0 (1.0.0.0)

  • Date: June 16, 2016
  • Repository: https://github.com/pmaxs/crawler
  • License: MIT
  • Requires: php >=5.3.0