
library daric

lalbert/daric

  • Tuesday, July 31, 2018
  • by lalbert
  • PHP

The README.md

Daric

Daric is a simple and configurable PHP web spider and web scraper built on the Goutte library.

Installation

The best way to install Daric is with Composer:

composer require lalbert/daric

Usage

There are two components: Scraper and Spider.

Scraper

Scraper is used to extract, clean, and format web page data.

It uses extractors, cleaners, and formatters to achieve its goals.

use Daric\Scraper;
use Daric\Extractor\CrawlerExtractorFactory;

$scraper = new Scraper('http://website.tld');
$scraper->setExtractors([
  'meta_title' => CrawlerExtractorFactory::create('title@_text'), // get the text node of <title></title>
  'meta_description' => CrawlerExtractorFactory::create('meta[name="description"]@content'), // get the "content" attribute of <meta name="description" />
  'list' => CrawlerExtractorFactory::create('#content ul.list li@_text("array")') // get the text node of every matching <li>, returned as an array
]);

$doc = $scraper->scrape(); // returns a Daric\Document

echo $doc->getData('meta_title');
print_r($doc['list']);
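
The extractor expressions above appear to follow a `selector@target` pattern: a CSS selector, an `@`, then either an attribute name or `_text`. As a rough illustration only, the sketch below splits such an expression at its last `@`. The helper `splitExtractorExpression` is hypothetical and is not part of Daric; how Daric actually parses these strings is not documented here.

```php
<?php
// Hypothetical helper (NOT part of Daric): split "selector@target" at the last "@".
// Splitting at the last "@" keeps any "@" inside the selector itself intact.
function splitExtractorExpression(string $expression): array
{
    $pos = strrpos($expression, '@');
    if ($pos === false) {
        // No "@" present: treat the whole string as the selector.
        return ['selector' => $expression, 'target' => null];
    }

    return [
        'selector' => substr($expression, 0, $pos),
        'target'   => substr($expression, $pos + 1),
    ];
}

$parts = splitExtractorExpression('meta[name="description"]@content');
echo $parts['selector'], PHP_EOL; // the CSS selector part
echo $parts['target'], PHP_EOL;   // the attribute (or _text) part
```

Under this reading, `title@_text` would split into the selector `title` and the target `_text`.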

Spider

Spider is used to crawl a website and find the pages whose data you want to scrape.

use Daric\Spider;
use Daric\Scraper;
use Daric\Extractor\CrawlerExtractorFactory;

$spider = new Spider('http://website.tld');

$spider->setLinkExtractor(CrawlerExtractorFactory::create('#content article a.link@href("array")'));
$spider->setNextLinkExtractor(CrawlerExtractorFactory::create('#nav a.next@href'));

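// $extractors, $cleaners and $formatters are assumed to be defined beforehand
foreach ($spider as $pageUri) {
  // scrape each page URI yielded by the spider
  $scraper = new Scraper($pageUri, $extractors, $cleaners, $formatters);
  $doc = $scraper->scrape();
  ...
}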
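
The loop above suggests that the spider yields page URIs found by the link extractor and moves to the next listing page through the "next" link extractor. The sketch below imitates that crawl pattern in plain PHP over an in-memory array, so it runs without any network access. The `$site` data and the `crawl` function are hypothetical and only illustrate the idea; they are not Daric's implementation.

```php
<?php
// Hypothetical in-memory "site": each listing page has article links
// and an optional link to the next listing page.
$site = [
    '/page/1' => ['links' => ['/a', '/b'], 'next' => '/page/2'],
    '/page/2' => ['links' => ['/c'], 'next' => null],
];

// Yield every article link, following "next" links until none remain.
// Visited listing pages are tracked so a cyclic "next" link cannot loop forever.
function crawl(array $site, string $start): Generator
{
    $visited = [];
    $current = $start;

    while ($current !== null && isset($site[$current]) && !isset($visited[$current])) {
        $visited[$current] = true;

        foreach ($site[$current]['links'] as $link) {
            yield $link;
        }

        $current = $site[$current]['next'];
    }
}

// Collect the yielded URIs into a plain list (keys discarded).
$uris = iterator_to_array(crawl($site, '/page/1'), false);
print_r($uris);
```

A generator keeps the calling code as simple as the `foreach` over `$spider`: the caller sees one page URI at a time and does not need to know about pagination.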

Licence

Daric is licensed under the MIT License. See the LICENSE file for details.

The Versions

31/07 2018

dev-master

9999999-dev

22/08 2016

dev-develop

dev-develop

Simple and configurable PHP web spider and web scraper

MIT

The Requires

 

The Development Requires

spider scraper