ScraperMoreFaster

ScraperMoreFaster is a PHP class built to scrape the content of a webpage faster than SimpleHTMLDOM (hosted on SourceForge). It came about when I needed a faster scraping solution for a web crawler. SimpleHTMLDOM is a wonderful parser with a very robust feature set, but unfortunately it is too slow for crawler purposes, where every millisecond counts.

Setup

The ScraperMoreFaster.php file found in the lib folder is the only file needed to use this class.

Usage

Defining the HTML to be Parsed

$scraper_more_faster = new ScraperMoreFaster;

To define the HTML from a URL:

$scraper_more_faster->file_get_html($url);

This defines the HTML by using PHP's file_get_contents function to pull in the HTML from the given URL.
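As a rough illustration of what a file_get_contents-based fetch involves, the sketch below wraps the call with a timeout and a failure check. The function name fetch_html, the 5-second default timeout, and the user-agent string are assumptions made for this example; they are not part of ScraperMoreFaster.

```php
<?php
/**
 * Fetch raw HTML from a URL (or local path) with a timeout.
 * Hypothetical helper for illustration; not part of ScraperMoreFaster.
 *
 * @param string $url     Address to fetch.
 * @param int    $timeout Seconds to wait before giving up (assumed default).
 * @return string|null    The HTML, or null if the fetch failed.
 */
function fetch_html($url, $timeout = 5)
{
    // These context options apply to http(s) URLs; local paths ignore them.
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeout,
            'user_agent' => 'example-crawler/0.1', // assumed value
        ),
    ));

    // file_get_contents returns false on failure and emits a warning,
    // which is suppressed here so a crawler can simply skip the page.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

A crawler would typically treat a null result as "skip this page" and pass any returned string on to str_get_html.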

To define the HTML from a string:

$scraper_more_faster->str_get_html($html_str);

This defines the HTML directly from the string passed in the $html_str variable.

Scrape PlainText from page

My main purpose in creating this class was the plaintext functionality. In my speed tests, it was dozens of times faster than SimpleHTMLDOM's plaintext functionality, and in all comparison tests the plain text produced by the two tools was 99% to 100% similar.

To extract the plain text (after defining the HTML as described above):

$scraper_more_faster->plaintext();
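This README does not say how plaintext() works internally or what it returns. Purely to show what plain-text extraction involves, here is a minimal regex-based sketch. The function html_to_plaintext is hypothetical and is not the library's implementation.

```php
<?php
/**
 * Reduce an HTML string to whitespace-normalized plain text.
 * Hypothetical sketch; not how ScraperMoreFaster does it.
 *
 * @param string $html Raw HTML.
 * @return string      Visible text with single spaces between words.
 */
function html_to_plaintext($html)
{
    // Drop script and style blocks entirely, since their content is not visible text.
    $text = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);

    // Drop HTML comments.
    $text = preg_replace('/<!--.*?-->/s', ' ', $text);

    // Replace each remaining tag with a space so adjacent blocks do not run together.
    $text = preg_replace('/<[^>]+>/', ' ', $text);

    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$sample = '<html><head><style>p{color:red}</style></head>'
        . '<body><h1>Title</h1><p>Hello &amp; welcome</p></body></html>';

echo html_to_plaintext($sample), "\n"; // prints "Title Hello & welcome"
```

A regex pass like this avoids building a DOM tree, which is one plausible reason a text-only extractor can be much faster than a full parser. The trade-off is that it does not cope well with malformed markup.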

Examples

See smf_tester.php for example usage.
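The contents of smf_tester.php are not reproduced here. A speed and similarity comparison along the lines of the tests mentioned earlier might be sketched as follows. The helper names average_seconds and similarity_percent are hypothetical, and the two extractors are simple stand-ins so the script runs on its own; they are not ScraperMoreFaster or SimpleHTMLDOM.

```php
<?php
/**
 * Average wall-clock seconds per call of $extractor on $html.
 * Hypothetical helper for illustration.
 */
function average_seconds($extractor, $html, $runs = 100)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        call_user_func($extractor, $html);
    }
    return (microtime(true) - $start) / $runs;
}

/**
 * Percentage similarity of two strings, using PHP's similar_text().
 * Hypothetical helper for illustration.
 */
function similarity_percent($a, $b)
{
    if ($a === $b) {
        return 100.0; // also covers two empty strings
    }
    similar_text($a, $b, $percent);
    return $percent;
}

// Stand-in extractors (assumptions): swap in calls to the real tools to compare them.
$extract_a = function ($html) {
    return trim(strip_tags($html));
};
$extract_b = function ($html) {
    return trim(preg_replace('/<[^>]+>/', '', $html));
};

$html = '<div><h1>Title</h1><p>Some <b>bold</b> text.</p></div>';

printf("A: %.6f s per call\n", average_seconds($extract_a, $html));
printf("B: %.6f s per call\n", average_seconds($extract_b, $html));
printf("Similarity: %.1f%%\n", similarity_percent($extract_a($html), $extract_b($html)));
```

To compare the actual tools, each stand-in would be replaced by a closure that loads $html with str_get_html and returns the plain text. That assumes plaintext() returns the text as a string, which this README does not confirm.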

16/12 2013

dev-master

9999999-dev https://github.com/josephbergevin/scraper-more-faster

A PHP Web Scraper - Designed for Speed

Requires

php >=5.3.0

by Joe Bergevin

php scraping website scraper

library scraper-more-faster