library botium

A light web crawler written in PHP

deloz/botium

  • Thursday, July 9, 2015
  • by deloz
  • Repository
  • 2 Watchers
  • 1 Stars
  • 9 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 1 Forks
  • 0 Open issues
  • 2 Versions
  • 0 % Grown

The README.md

Botium

A light web crawler written in PHP.

Installation

  1. Install Composer:

    curl -sS https://getcomposer.org/installer | php
    

    You can then add Botium as a dependency using the composer.phar CLI:

    php composer.phar require deloz/botium:~0.1
    
  2. Alternatively, you can specify Botium as a dependency in your project's existing composer.json file:

    {
        "require": {
          "deloz/botium": "~0.1"
        }
    }
    
  3. After installing, you need to require Composer's autoloader:

    require 'vendor/autoload.php';
    

Running the tests

cd tests
php runtest.php

Usage

$settings must contain baseUrl, e.g.:

$settings = [
    'baseUrl' => 'www.douban.com',
    'debug' => true,
    'interval' => 10,
    'every' => 5,
];
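
Since baseUrl is the only key the README names as required, a small guard run before the settings are used can catch a missing value early. The following is a minimal sketch and not part of Botium: `checkSettings` is a hypothetical helper name, and nothing is assumed about the meaning of the other keys.

```php
<?php

// Hypothetical helper, not part of Botium: fail fast when the one key
// the README names as required (baseUrl) is missing, empty, or not a string.
function checkSettings(array $settings)
{
    if (empty($settings['baseUrl']) || !is_string($settings['baseUrl'])) {
        throw new InvalidArgumentException('$settings must contain a non-empty baseUrl');
    }

    // Return the array unchanged so the call can wrap the literal directly.
    return $settings;
}

$settings = checkSettings([
    'baseUrl' => 'www.douban.com',
    'debug' => true,
    'interval' => 10,
    'every' => 5,
]);
```

Calling `checkSettings(['debug' => true])` would throw an `InvalidArgumentException`, because baseUrl is absent.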

Each site is a class that inherits from Deloz\Botium\Botium and overrides the methods shown below:

namespace Tests;

use Symfony\Component\DomCrawler\Crawler;
use Deloz\Botium\Response;
use Deloz\Botium\Botium;

class Haixiu extends Botium
{
    // Entry point: fetch the discussion list and pass it to index() if the request succeeded.
    public function start()
    {
        $res = $this->crawl('http://www.douban.com/group/haixiuzu/discussion');
        $res and $this->index($res);
    }

    // Follow every topic link on the list page and pass each fetched page to detail().
    public function index(Response $res)
    {
        $res->doc('td.title > a')->each(function (Crawler $node, $i) {
            $link = $node->attr('href');
            if ($link) {
                $res = $this->crawl($link);
                $res and $this->detail($res);
            }
        });
    }

    // Extract the title, author, and image URLs from a topic page and pass them to result().
    public function detail(Response $res)
    {
        $title = $res->doc('#content > h1')->text();
        $author = $res->doc('#content > div > div.article > div.topic-content.clearfix > div.topic-doc > h3 > span.from > a')->text();
        $images = [];
        $res->doc('div.topic-content > div.topic-figure.cc img')->each(function (Crawler $node, $i) use (&$images) {
            $img = $node->attr('src');
            if ($img) {
                $images[] = $img;
            }
        });

        $this->result([
            'title' => $title,
            'author' => $author,
            'images' => $images,
        ]);
    }

    public function result(array $item = [])
    {
        var_dump($item);
    }
}
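
To make the link-collection step in index() concrete without depending on Botium or a live site, the sketch below performs the same kind of extraction with PHP's built-in DOM extension. It is a stand-in technique, not how Botium works internally: the HTML snippet and example.com URLs are invented for illustration, and the XPath query is a hand-written equivalent of the CSS selector `td.title > a`.

```php
<?php

// Invented markup standing in for a fetched list page.
$html = <<<'HTML'
<table>
  <tr><td class="title"><a href="http://example.com/topic/1">First</a></td></tr>
  <tr><td class="title"><a href="http://example.com/topic/2">Second</a></td></tr>
  <tr><td class="title"><a>No link</a></td></tr>
  <tr><td class="other"><a href="http://example.com/skip">Skipped</a></td></tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// XPath equivalent of "td.title > a": <a> elements that are direct
// children of a <td> whose class list contains "title".
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " title ")]/a';

$links = [];
foreach ($xpath->query($query) as $anchor) {
    // getAttribute() returns an empty string when the attribute is absent,
    // which mirrors the "if ($link)" check in the class above.
    $href = $anchor->getAttribute('href');
    if ($href !== '') {
        $links[] = $href;
    }
}

print_r($links);
```

Here `$links` ends up holding the two topic URLs; the anchor without an href and the anchor outside a `td.title` cell are both skipped.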

For more examples, see the tests directory.

License

Licensed under the MIT license.

The Versions

  • dev-master (9999999-dev), July 9, 2015, MIT
  • 0.1.0 (0.1.0.0), July 9, 2015, MIT

Keywords: search, http, uri, spider, crawl