
library web-crawler

A web crawler which traverses the links found in all parent and child web pages and performs tasks on each page.


travy/web-crawler


  • Wednesday, September 27, 2017
  • by travy
  • PHP
  • 2 Versions

The README.md

Web Crawler

Description

This is a work in progress and is not yet complete.

Web Crawler is an open source tool that lets users crawl through a collection of web pages and execute customized analyzers on each page.

Installation

Add the library to your PHP project using Composer:

composer require travy/web-crawler

Use Case

The Crawler automatically pulls every URL listed in an HTML anchor tag on the root URL's page. Each page that is visited is run through a collection of Analyzers. Analyzers can perform whatever tasks the application needs, such as pruning the markup in order to build a search engine, or almost any other analysis of the page.
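The link-gathering step described above can be illustrated with PHP's bundled DOM extension. This is only a minimal sketch, not the library's own code: the function name `extractAnchorUrls` is hypothetical, and it assumes that only absolute http(s) links are of interest (relative links would have to be resolved against the page URL first).

```php
<?php
// Hypothetical helper, not part of travy/web-crawler: collects the href of
// every anchor tag in an HTML string using PHP's bundled DOM extension.
function extractAnchorUrls(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    $dom = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Assumption: keep absolute http(s) links only.
        if (preg_match('#^https?://#i', $href) === 1) {
            $urls[$href] = true; // array keys de-duplicate repeated links
        }
    }

    return array_keys($urls);
}

$html = '<p><a href="https://example.com/a">A</a> '
      . '<a href="/relative">B</a> '
      . '<a href="https://example.com/a">A again</a></p>';
$links = extractAnchorUrls($html);
print_r($links);
```

Using the URLs as array keys is a cheap way to drop duplicates, which matters for a crawler because the same link often appears several times on one page.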

Custom Analyzer

Analyzers can be created by extending the AbstractAnalyzer class:

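class MyAnalyzer extends AbstractAnalyzer
{
    //  Run for each page the crawler visits: $url is the page address,
    //  $html is its markup, and $parser is a Dom parser instance.
    public function analyze($url, $html, Dom $parser)
    {
        //  perform tasks
    }
}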

Analyzer Registry

The AnalyzerRegistry holds the list of all Analyzers that should be used while crawling the web. Each analyzer is assigned a unique key so that registered analyzers can be manipulated later if needed.

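$analyzer = new MyAnalyzer();

//  Register the analyzer under a unique key.
//  The method name `register` is assumed here; check the AnalyzerRegistry class for the exact spelling.
$analyzerRegistry = new AnalyzerRegistry();
$analyzerRegistry->register($analyzer, 'add-to-database');

//  Start crawling from the root URL, running every registered analyzer on each page.
$crawler = new Crawler('https://google.com', $analyzerRegistry);
$crawler->crawl();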
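To make the relationship between the registry and the crawl loop more concrete, here is a small self-contained sketch. It is not the library's implementation: `SketchRegistry` and `sketchCrawl` are hypothetical names, analyzers are modeled as plain callables rather than AbstractAnalyzer subclasses, and pages come from an in-memory array instead of the network so the example can run anywhere.

```php
<?php
// Hypothetical stand-in for a keyed analyzer registry: analyzers are plain
// callables stored under unique string keys.
final class SketchRegistry
{
    /** @var array<string, callable> */
    private array $analyzers = [];

    public function register(callable $analyzer, string $key): void
    {
        if (isset($this->analyzers[$key])) {
            throw new InvalidArgumentException("Key already registered: $key");
        }
        $this->analyzers[$key] = $analyzer;
    }

    public function all(): array
    {
        return $this->analyzers;
    }
}

// Breadth-first crawl over an in-memory map of url => ['html', 'links'].
// Returns the URLs in the order they were visited.
function sketchCrawl(string $rootUrl, array $pages, SketchRegistry $registry): array
{
    $queue = [$rootUrl];
    $visited = [];

    while ($queue !== []) {
        $url = array_shift($queue);
        if (isset($visited[$url]) || !isset($pages[$url])) {
            continue; // skip pages already visited and pages we cannot load
        }
        $visited[$url] = true;

        // Run every registered analyzer on the page.
        foreach ($registry->all() as $analyzer) {
            $analyzer($url, $pages[$url]['html']);
        }
        // Queue the page's links so child pages get visited too.
        foreach ($pages[$url]['links'] as $link) {
            $queue[] = $link;
        }
    }

    return array_keys($visited);
}

// Two made-up pages that link to each other.
$pages = [
    'https://example.com/' => [
        'html' => '<h1>Home</h1>',
        'links' => ['https://example.com/about', 'https://example.com/'],
    ],
    'https://example.com/about' => [
        'html' => '<h1>About</h1>',
        'links' => ['https://example.com/'],
    ],
];

$seen = [];
$registry = new SketchRegistry();
$registry->register(function (string $url, string $html) use (&$seen): void {
    $seen[] = $url; // a trivial "analyzer" that records each URL
}, 'record-url');

$order = sketchCrawl('https://example.com/', $pages, $registry);
```

The visited set is what keeps the loop from running forever when pages link back to each other, as the two sample pages do.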

The Versions

  • 27/09 2017: dev-develop, MIT, by Travis Anthony Torres
  • 26/09 2017: dev-master (9999999-dev), MIT, by Travis Anthony Torres