
library rolling-curl-mini

hindmost/rolling-curl-mini

  • Sunday, February 21, 2016
  • by bedletskyi
  • Repository
  • 6 Watchers
  • 15 Stars
  • 1 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 10 Forks
  • 0 Open issues
  • 3 Versions
  • 0 % Grown

The README.md

Rolling Curl Mini

Rolling Curl Mini is a fork of Rolling Curl. It lets you process multiple HTTP requests in parallel using PHP's cURL library.

For more information, read this article (in Russian).

Basic Usage Sample

``` php
...
require "RollingCurlMini.php";
...
$o_mc = new RollingCurlMini(10);
...
$o_mc->add($url, $postdata, $callback, $userdata, $options, $headers);
...
$o_mc->execute();
...
```

Callbacks

Any request may have its own callback: a function or method that is called when the request completes. The callback accepts four parameters and may look like this:

``` php
/**
 * @param string $content - content of request response
 * @param string $url - URL of requested resource
 * @param array $info - cURL handle info
 * @param mixed $userdata - user-defined data passed with add() method
 */
function request_callback($content, $url, $info, $userdata) {
}
```
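To make the callback contract more concrete, here is a minimal sketch of a callback that collects one result per request. It rests on several assumptions not confirmed by this README: that `$info` has the shape returned by `curl_getinfo()` (so it contains `http_code`), that a `null` post data value produces a plain GET request, that `null` is acceptable for `$options` and `$headers`, and that a method callback can be passed as an `array($object, 'method')` pair. The `ResponseCollector` class and the URLs are hypothetical.

``` php
<?php
/**
 * Hypothetical helper that stores a short summary of every response
 * delivered to its handle() method.
 */
class ResponseCollector {
    public $results = array();

    /**
     * Callback with the four-parameter signature described above.
     * Assumption: $info looks like the output of curl_getinfo().
     */
    public function handle($content, $url, $info, $userdata) {
        $code = isset($info['http_code']) ? (int) $info['http_code'] : 0;
        // $userdata is used as a label so results can be looked up later.
        $this->results[$userdata] = array(
            'url'    => $url,
            'ok'     => ($code >= 200 && $code < 300),
            'length' => strlen((string) $content),
        );
    }
}

$collector = new ResponseCollector();

// Hypothetical URLs, keyed by a label that is passed along as $userdata.
$urls = array(
    'home'  => 'http://example.com/',
    'about' => 'http://example.com/about',
);

// Runs only if RollingCurlMini.php has already been required.
if (class_exists('RollingCurlMini')) {
    $o_mc = new RollingCurlMini(10);
    foreach ($urls as $label => $url) {
        // Assumed: null post data, options and headers mean "defaults".
        $o_mc->add($url, null, array($collector, 'handle'), $label, null, null);
    }
    $o_mc->execute();
    print_r($collector->results);
}
```

Keeping the state in an object rather than in global variables makes the callback easy to call directly in a test, without any network access.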

License

Rolling Scraper Abstract

Rolling Scraper Abstract is a multipurpose scraping (crawling) framework built on multi-curl and the RollingCurlMini class. It is a base PHP class that implements the common functionality of a multi-curl scraper; scraper-specific functionality belongs in derived classes. A concrete scraper class should extend RollingScraperAbstract and implement (override) two mandatory methods: _initPages and _handlePage.

For more information, read this article (in Russian).

Example

Basic Usage Sample

``` php
class MyScraper extends RollingScraperAbstract {
    ...
    public function __construct() {
        ...
        $this->modConfig(array(
            'state_time_storage' => '...', // temporal section of state storage (file path)
            'state_data_storage' => '...', // data section of state storage (file path)
            'scrape_life' => 0, // expiration time (secs) of scraped data
            'run_timeout' => 30, // max. time (secs) to execute scraper script
            'run_pages_loops' => 20, // max. number of loops through pages
            'run_pages_buffer' => 500, // page requests buffer size
            'curl_threads' => 10, // number of multi-curl threads
            'curl_options' => array(...), // CURL options used in multi-curl requests
        ));
        parent::__construct();
    }

    /**
     * Initialize the starting list of page requests
     */
    protected function _initPages() {
        ...
        // add page request. $url - page URL
        $this->addPage($url);
        ...
    }

    /**
     * Process response of a page request
     * @param string $cont - page content
     * @param string $url - url of request
     * @param array $aInfo - CURL info data
     * @param int $index - # of page request
     * @param array $aData - custom request data (part of request data)
     * @return bool
     */
    protected function _handlePage($cont, $url, $aInfo, $index, $aData) {
        ...
    }
    ...
}

$scraper = new MyScraper();
$bool = $scraper->run();
list($time_start, $time_end, , $time_run_start, , $n_pages_total, $n_pages_passed) = $scraper->getStateProgress();
if ($time_end) {
    echo sprintf('Completed at %s', date('Y.m.d, H:i:s', $time_end));
} else {
    if ($bool)
        echo sprintf('In progress: %d/%d pages', $n_pages_passed, $n_pages_total);
    else
        echo 'Cancelled since another script instance is still running';
}
```
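The body of `_handlePage` is left open above. As one illustration of the kind of work it might do, the following sketch extracts link targets from a page's HTML. The `extract_links` function is hypothetical and not part of RollingScraperAbstract, and the regular expression is a deliberately simple stand-in for a real HTML parser.

``` php
<?php
/**
 * Hypothetical helper: return the unique href values of <a> tags
 * found in an HTML string, in order of first appearance.
 * Regex-based, so it only handles quoted href attributes.
 */
function extract_links($html) {
    $matches = array();
    preg_match_all('/<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    // $matches[1] holds the captured href values; drop duplicates and reindex.
    return array_values(array_unique($matches[1]));
}

$html = '<p><a href="/a">A</a> <a class="x" href=\'/b\'>B</a> <a href="/a">again</a></p>';
print_r(extract_links($html));
```

Inside `_handlePage`, such a helper could be called with `$cont`. Whether newly found links may be queued from there with `addPage` is not stated in this README, and relative links would first need to be resolved against `$url`.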

The Versions

  • dev-master (9999999-dev), 21/02 2016: requires php >=5.0.0
  • 1.0.6 (1.0.6.0), 21/01 2015: requires php >=5.0.0
  • 1.0.0 (1.0.0.0), 11/12 2014: requires php >=5.0.0