library scraper-more-faster

A PHP Web Scraper - Designed for Speed

josephbergevin/scraper-more-faster

  • Monday, December 16, 2013
  • by josephbergevin
  • Repository
  • 1 Watchers
  • 9 Stars
  • 371 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 1 Forks
  • 2 Open issues
  • 1 Versions
  • 1 % Grown

The README.md

ScraperMoreFaster

ScraperMoreFaster is a PHP class built to scrape the content of a webpage faster than SimpleHTMLDOM (hosted on SourceForge). It came about when I needed a faster scraper for a web crawler. SimpleHTMLDOM is a wonderful parser with a very robust feature set, but it is unfortunately too slow for crawler purposes, where every millisecond counts.

Setup

The ScraperMoreFaster.php file in the lib folder is the only file needed to use this class.

Usage

Defining the HTML to be Parsed

$scraper_more_faster = new ScraperMoreFaster;

To define the HTML from a URL:

$scraper_more_faster->file_get_html($url);

This defines the HTML by using PHP's file_get_contents function to pull in the HTML from the given URL.
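
As a rough illustration of that fetch step, the sketch below wraps file_get_contents with a timeout and a failure check. The function name fetch_html and the 5-second timeout are assumptions made for this example; they are not part of ScraperMoreFaster, and the class itself may handle errors differently. A temporary local file stands in for a remote URL so the snippet runs without network access.

```php
<?php
/**
 * Hypothetical helper (not part of ScraperMoreFaster): fetch raw HTML
 * with file_get_contents(), returning null instead of false on failure.
 */
function fetch_html($location, $timeout_seconds = 5.0)
{
    // The 'http' options only apply to http(s):// locations; they are
    // ignored for local paths such as the temp file used below.
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout_seconds),
    ));

    // Suppress the warning and check the return value instead.
    $html = @file_get_contents($location, false, $context);

    return $html === false ? null : $html;
}

// Stand-in for a URL so the example runs offline.
$tmp_file = tempnam(sys_get_temp_dir(), 'smf');
file_put_contents($tmp_file, '<html><body><p>Hello</p></body></html>');

$html    = fetch_html($tmp_file);                      // file contents
$missing = fetch_html($tmp_file . '.does-not-exist');  // null on failure

unlink($tmp_file);
```

In a crawler, a short timeout of this kind keeps one slow host from stalling the whole run, and the null check lets the caller skip pages that could not be fetched.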

To define the HTML from a string:

$scraper_more_faster->str_get_html($html_str);

This defines the HTML directly from the string passed in the $html_str variable.

Scrape PlainText from page

My main purpose in creating this class was the plaintext functionality. In speed tests, I found it to be dozens of times faster than SimpleHTMLDOM's plaintext functionality, and in all comparison tests the plaintext from the two tools was 99% to 100% similar.

To run this command (after defining the HTML as described above):

$scraper_more_faster->plaintext();
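
For readers curious what a lightweight plaintext pass can look like, here is a minimal sketch built on PHP's string functions rather than a DOM tree. It is not the class's actual implementation, which this README does not describe; the function name quick_plaintext and the sample markup are made up for illustration, and the timing lines only show one way such a speed test could be set up.

```php
<?php
/**
 * Hypothetical sketch (not ScraperMoreFaster's code): extract visible
 * text from HTML with string functions instead of building a DOM.
 */
function quick_plaintext($html)
{
    // Drop <script> and <style> blocks; their contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Put a space before every tag so adjacent elements do not fuse
    // into one word once the tags are stripped.
    $html = str_replace('<', ' <', $html);

    // Remove the remaining tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$sample = '<html><head><style>p { color: red; }</style></head>'
        . '<body><h1>Title</h1><p>Fish &amp; chips</p>'
        . '<script>var x = 1;</script></body></html>';

// Time a single call with microtime(), as a crude speed check.
$started    = microtime(true);
$text       = quick_plaintext($sample);
$elapsed_ms = (microtime(true) - $started) * 1000;

echo $text, "\n";
printf("took %.3f ms\n", $elapsed_ms);
```

A regex-and-strip_tags pass like this trades accuracy on malformed markup for speed, which is the kind of trade-off a crawler may accept when it only needs the page text.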

Examples

See smf_tester.php for example usage.

The Versions

16/12 2013

dev-master

9999999-dev https://github.com/josephbergevin/scraper-more-faster

The Requires

  • php >=5.3.0

 

by Joe Bergevin

php scraping website scraper