 

library sitemap-common

Sitemap generator and crawler library


dmoraschi/sitemap-common


  • Monday, August 22, 2016
  • by DXI-8x8
  • Repository
  • 1 Watchers
  • 0 Stars
  • 57 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 4 Versions
  • 2 % Grown

The README.md

A PHP sitemap generator and crawler


This package provides all of the components to crawl a website and build and write sitemap files.

Example of a console application using the library: dmoraschi/sitemap-app

Installation

Run the following command, providing the latest stable version (e.g. v1.0.0):

``` bash
composer require dmoraschi/sitemap-common
```

or add the following to your composer.json file:

``` json
"dmoraschi/sitemap-common": "1.0.*"
```
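For context, a minimal composer.json carrying this constraint might look like the following (a sketch; adjust the constraint to the latest stable release):

``` json
{
    "require": {
        "dmoraschi/sitemap-common": "1.0.*"
    }
}
```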

`SiteMapGenerator`
-----
**Basic usage**

``` php
$generator = new SiteMapGenerator(
    new FileWriter($outputFileName),
    new XmlTemplate()
);
```

Add a URL:

``` php
$generator->addUrl($url, $frequency, $priority);
```


Add a single `SiteMapUrl` object, or an array of them:

``` php
$siteMapUrl = new SiteMapUrl(
    new Url($url),
    $frequency,
    $priority
);

$generator->addSiteMapUrl($siteMapUrl);
$generator->addSiteMapUrls([$siteMapUrl, $siteMapUrl2]);
```

Set the URLs of the sitemap via a `SiteMapUrlCollection`:

``` php
$siteMapUrl = new SiteMapUrl(
    new Url($url),
    $frequency,
    $priority
);

$collection = new SiteMapUrlCollection([$siteMapUrl, $siteMapUrl2]);

$generator->setCollection($collection);
```


Generate the sitemap:

``` php
$generator->execute();
```

`Crawler`
-----
**Basic usage**

``` php
$crawler = new Crawler(
    new Url($baseUrl),
    new RegexBasedLinkParser(),
    new HttpClient()
);
```


You can tell the `Crawler` **not to visit** certain URLs by adding policies. Below are the default policies provided by the library:

``` php
$crawler->setPolicies([
    'host' => new SameHostPolicy($baseUrl),
    'url'  => new UniqueUrlPolicy(),
    'ext'  => new ValidExtensionPolicy(),
]);

// or individually
$crawler->setPolicy('host', new SameHostPolicy($baseUrl));
```

`SameHostPolicy`, `UniqueUrlPolicy`, and `ValidExtensionPolicy` are provided with the library; you can define your own policies by implementing the `Policy` interface.
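As a rough sketch of a custom policy, the snippet below skips any URL containing a query string. The interface name comes from the README, but the method name `shouldVisit` and its signature are assumptions for illustration; check the library source for the real contract.

``` php
<?php

// Assumed shape of the library's Policy interface (hypothetical signature).
interface Policy
{
    public function shouldVisit(string $url): bool;
}

// Custom policy: tell the crawler to skip URLs that carry a query string.
class NoQueryStringPolicy implements Policy
{
    public function shouldVisit(string $url): bool
    {
        // Visit only URLs without a '?' component.
        return strpos($url, '?') === false;
    }
}

// Usage (assuming the setPolicy() call shown above):
// $crawler->setPolicy('no-query', new NoQueryStringPolicy());
```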

When you call the `crawl` function, the object starts from the base URL given in the constructor and crawls web pages down to the depth passed as an argument. The function returns an array of all unique visited URLs:

``` php
$urls = $crawler->crawl($deep);
```

You can also instruct the `Crawler` to collect custom data while visiting the web pages by adding `Collector`s to the main object:

``` php
$crawler->setCollectors([
    'images' => new ImageCollector()
]);

// or individually
$crawler->setCollector('images', new ImageCollector());
```

And then retrieve the collected data:

``` php
$crawler->crawl($deep);

$imageCollector = $crawler->getCollector('images');
$data = $imageCollector->getCollectedData();
```

`ImageCollector` is provided by the library; you can define your own collector by implementing the `Collector` interface.
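As a sketch of a custom collector, the example below gathers `mailto:` addresses from visited pages. `getCollectedData()` matches the README above, but the `collect()` method name and its signature are assumptions made for this illustration; consult the library's `Collector` interface for the actual contract.

``` php
<?php

// Assumed shape of the library's Collector interface (collect() is hypothetical).
interface Collector
{
    public function collect(string $pageContent): void;
    public function getCollectedData(): array;
}

// Custom collector: accumulates unique email addresses found in mailto: links.
class EmailCollector implements Collector
{
    private $emails = [];

    public function collect(string $pageContent): void
    {
        if (preg_match_all('/mailto:([^"\'\s>]+)/', $pageContent, $matches)) {
            foreach ($matches[1] as $email) {
                $this->emails[$email] = true; // keyed array de-duplicates
            }
        }
    }

    public function getCollectedData(): array
    {
        return array_keys($this->emails);
    }
}

// Usage (assuming the setCollector() call shown above):
// $crawler->setCollector('emails', new EmailCollector());
```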

The Versions
-----

| Version | Normalized | Date | License |
| --- | --- | --- | --- |
| dev-master | 9999999-dev | 22/08 2016 | MIT |
| v1.1.0 | 1.1.0.0 | 22/08 2016 | MIT |
| dev-develop | dev-develop | 21/08 2016 | MIT |
| v1.0.0 | 1.0.0.0 | 20/08 2016 | MIT |

All versions by Daniele Moraschi.

Keywords: link, crawler, sitemap, website, crawl