Web Crawler
Description
This is a work in progress and is not yet complete.
Web Crawler is an open source library that enables users to crawl through a
collection of web pages and execute customized analyzers on each page.
Installation
Add the library to your PHP project using Composer:
composer require travy/web-crawler
Use Case
The Crawler automatically pulls every URL listed in an HTML anchor tag on the
root URL. Each page it visits is run through a collection of Analyzers. These
Analyzers can perform whatever tasks the application needs, such as pruning
the markup to build a search engine index, or almost any other analysis of the
page.
Custom Analyzer
Analyzers can be created by extending the AbstractAnalyzer class:
class MyAnalyzer extends AbstractAnalyzer
{
    public function analyze($url, $html, Dom $parser)
    {
        // Perform whatever analysis the application needs on this page.
    }
}
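As a concrete, hypothetical example, an analyzer that records the title of every visited page might look like the following. The TitleCollector class is an illustration only; it relies on the raw $html argument and PHP's built-in DOM extension rather than assuming anything about the Dom parser's API.

class TitleCollector extends AbstractAnalyzer
{
    // Map of URL => page title, filled in as pages are visited.
    private $titles = [];

    public function analyze($url, $html, Dom $parser)
    {
        $dom = new DOMDocument();
        @$dom->loadHTML($html); // suppress warnings from imperfect markup

        $titleNodes = $dom->getElementsByTagName('title');
        if ($titleNodes->length > 0) {
            $this->titles[$url] = trim($titleNodes->item(0)->textContent);
        }
    }

    public function getTitles()
    {
        return $this->titles;
    }
}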
Analyzer Registry
The AnalyzerRegistry contains the list of all Analyzers that should run while
crawling. Each analyzer is registered under a unique key so that it can be
referenced or adjusted later if needed.
$analyzer = new MyAnalyzer();

$analyzerRegistry = new AnalyzerRegistry();
$analyzerRegistry->register($analyzer, 'add-to-database');

$crawler = new Crawler('https://google.com', $analyzerRegistry);
$crawler->crawl();
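Because the registry holds the analyzer instance you registered, anything an analyzer accumulates during the crawl can be read back afterwards. Continuing the hypothetical TitleCollector example from above:

$titleCollector = new TitleCollector();

$registry = new AnalyzerRegistry();
$registry->register($titleCollector, 'collect-titles');

$crawler = new Crawler('https://google.com', $registry);
$crawler->crawl();

// The same instance still holds the titles it gathered during the crawl.
print_r($titleCollector->getTitles());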