
library webcrawler


ronappleton/webcrawler

  • Sunday, July 15, 2018
  • by Ron Appleton
  • Repository
  • 0 Watchers
  • 0 Stars
  • 1 Installation
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 1 Version
  • 0 % Grown

The README.md

Web Crawler

A simple web crawler for retrieving site links.

This web crawler is a simple package designed to take a website and extract the files it can find from the HTML the site provides.

Crawling is restricted to the source domain by default; this can be changed using the restrict_domain option of the crawl method.
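
A minimal usage sketch of that option is shown below. The class name and namespace are illustrative guesses, not taken from the package source; only the crawl method and its restrict_domain option come from the description above.

    <?php

    require 'vendor/autoload.php';

    // Illustrative only: the class name and namespace below are assumptions,
    // not confirmed against the package source. The crawl() method and its
    // restrict_domain option are the ones described in this README.
    use RonAppleton\WebCrawler\Crawler;

    $crawler = new Crawler();

    // Crawl a site you control; restrict_domain keeps the crawl on the
    // source domain and is assumed here to default to true.
    $links = $crawler->crawl('https://example.com', ['restrict_domain' => true]);

    print_r($links);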

It was built for handling known self-linking sites, although I will add controls to prevent external crawling when required.

It is simple to use and solves some of the issues others have had when trying to build simple crawlers.

Supported

  • Scanning and retrieving web pages.
  • Reading and extracting all links from a web page.
  • Deducing whether a link points to another directory or to a file (see the sketch after this list).
  • Storing file and directory locations (web locations).
  • Handling relative and absolute URLs.
  • Timing crawls.
  • Providing a minimal count statistic.
  • Exporting the collected data as an array.
  • Exporting the collected data as JSON.
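
The directory-versus-file deduction and the relative URL handling listed above can be illustrated with a small standalone sketch. This is not the package's own code, only one plausible way to implement those two checks in plain PHP.

    <?php

    // Standalone illustration (not the package's implementation) of two items
    // from the list above: resolving a link against the page it was found on,
    // and deducing whether it points to a file or a directory.

    function resolveLink(string $base, string $link): string
    {
        // Links that already carry a scheme are absolute; return them untouched.
        if (parse_url($link, PHP_URL_SCHEME) !== null) {
            return $link;
        }

        $parts  = parse_url($base);
        $origin = $parts['scheme'] . '://' . $parts['host'];

        // Root-relative link: append it to the origin.
        if ($link !== '' && $link[0] === '/') {
            return $origin . $link;
        }

        // Document-relative link: append it to the directory of the base page.
        $dir = rtrim(dirname($parts['path'] ?? '/'), '/');
        return $origin . $dir . '/' . $link;
    }

    function isFileLink(string $url): bool
    {
        // Treat the link as a file when its last path segment has an extension.
        $path = parse_url($url, PHP_URL_PATH) ?: '/';
        return pathinfo($path, PATHINFO_EXTENSION) !== '';
    }

    $found = resolveLink('https://example.com/docs/index.html', 'images/logo.png');
    var_dump($found);              // string "https://example.com/docs/images/logo.png"
    var_dump(isFileLink($found));  // bool(true) - last segment has a .png extension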

Warning

Use this at your own risk. Please don't crawl sites whose owners are not expecting it; the risk is all yours.

Simple Test Script

A simple script for testing is included.

The Versions

15/07/2018

dev-master

9999999-dev
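
Assuming the package was published to Packagist under the name shown above, the dev-master version could be pulled into a project with Composer:

    composer require ronappleton/webcrawler:dev-master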


The Requires

 

by Ron Appleton