Web Crawler
A simple web crawler for retrieving site links.
This package is a simple web crawler, designed to take a website and extract the file
links it can find in the HTML the site provides.
Crawling is restricted to the source domain by default; this can be changed using the
restrict_domain option of the crawl method, as sketched below.
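A minimal usage sketch: the Crawler class name and the options-array signature are assumptions, while the crawl method and the restrict_domain option come from this README.

```php
<?php
// Hypothetical sketch: "Crawler" and the options-array signature are
// assumptions; only crawl() and restrict_domain are named in this README.
require 'vendor/autoload.php';

$crawler = new Crawler();

// Default behaviour: the crawl stays on the source domain.
$crawler->crawl('https://example.com');

// Assumed way to lift the restriction and follow external links.
$crawler->crawl('https://example.com', ['restrict_domain' => false]);
```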
It was built for handling known self-linking sites, although I will add controls to prevent
external crawling when required.
It is simple to use and solves some of the issues others have run into when building simple
crawlers.
Supported
- Scanning and retrieving web pages.
- Reading and extracting all links in a web page.
- Deducing whether a link points to a directory or to a file.
- Storing file and directory locations (web locations).
- Handling relative and absolute URLs.
- Timing crawls.
- Providing a minimal count statistic.
- Exporting the collected data as an array.
- Exporting the collected data as JSON (see the sketch after this list).
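A sketch of the export features, again with assumed names: toArray() and toJson() are hypothetical method names; the README only states that the collected data can be exported as an array and as JSON.

```php
<?php
// Hypothetical sketch: toArray() and toJson() are assumed method names for
// the array/JSON export features listed above.
require 'vendor/autoload.php';

$crawler = new Crawler();
$crawler->crawl('https://example.com');

$links = $crawler->toArray();  // collected files and directories as an array
$json  = $crawler->toJson();   // the same data as a JSON string

file_put_contents('crawl-results.json', $json);
```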
Warning
Use this at your own risk. Please don't crawl sites whose owners are not expecting it; the risk is entirely yours.
Simple Test Script
A simple script for testing is included.