dev-master
9999999-dev https://github.com/josephbergevin/scraper-more-faster
Requires:
- php >=5.3.0
by Joe Bergevin
Keywords: php, scraping, website, scraper
A PHP Web Scraper - Designed for Speed
ScraperMoreFaster is a PHP class built to scrape the content of a webpage faster than SimpleHTMLDOM (hosted on SourceForge). It came about when I needed a faster scraper for a web crawler. SimpleHTMLDOM is a wonderful parser with a very robust feature set, but unfortunately it is too slow for crawling, where every millisecond counts.
The ScraperMoreFaster.php file in the lib folder is the only file needed to use this class. Include it and create an instance:
require_once 'lib/ScraperMoreFaster.php'; // adjust the path to where you placed the file
$scraper_more_faster = new ScraperMoreFaster;
To load the HTML from a URL:
$scraper_more_faster->file_get_html($url);
This fetches the HTML from the given URL using PHP's file_get_contents() function.
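Called without a stream context, file_get_contents() falls back to PHP's default_socket_timeout setting, so a crawler may want more control over fetching. The sketch below is a hypothetical helper, fetch_html_with_timeout(), which is not part of ScraperMoreFaster and is not necessarily how file_get_html() works internally. It assumes you want an explicit per-request timeout and a User-Agent header (the UA string is a placeholder), uses only standard PHP functions, returns null on failure, and sticks to array() syntax to stay compatible with PHP 5.3.

```php
<?php
// Hypothetical helper, NOT part of ScraperMoreFaster: fetches a page with
// file_get_contents(), but with an explicit timeout and User-Agent so a slow
// host cannot stall a crawler for the full default_socket_timeout.
function fetch_html_with_timeout($url, $timeout = 5.0)
{
    $context = stream_context_create(array(
        'http' => array(
            'timeout'    => $timeout,             // seconds; floats are allowed
            'user_agent' => 'ExampleCrawler/1.0', // placeholder UA string
        ),
    ));

    // The @ suppresses the warning emitted on failure; callers check for null.
    $html = @file_get_contents($url, false, $context);

    return ($html === false) ? null : $html;
}
```

The returned string can then be passed to str_get_html() instead of calling file_get_html() directly. The http options are simply ignored for local file paths.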
To load the HTML from a string:
$scraper_more_faster->str_get_html($html_str);
This sets the HTML directly from the string passed in $html_str.
My main reason for creating this class was the plaintext functionality. In my speed tests, its plaintext extraction was dozens of times faster than SimpleHTMLDOM's, and in every comparison test the plaintext produced by the two tools was 99% to 100% similar.
To extract the plaintext (after loading the HTML as described above):
$scraper_more_faster->plaintext();
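The README does not describe how plaintext() works internally. As a rough illustration of why skipping DOM construction can be fast, here is a hypothetical html_to_plaintext() function built only on PHP's regex and string functions. It is an assumption for illustration, not the class's actual algorithm.

```php
<?php
// Hypothetical sketch, NOT ScraperMoreFaster's actual algorithm: extract plain
// text from HTML with regex and string functions only, never building a DOM.
function html_to_plaintext($html)
{
    // 1. Drop <script> and <style> blocks entirely; their contents are not text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // 2. Replace every remaining tag with a space so adjacent cells or
    //    paragraphs ("<td>a</td><td>b</td>") do not run together.
    $text = preg_replace('/<[^>]*>/', ' ', $html);

    // 3. Decode entities such as &amp; and &quot;.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // 4. Collapse runs of ASCII whitespace into single spaces, then trim.
    $text = preg_replace('/\s+/', ' ', $text);

    return trim($text);
}
```

The trade-off is robustness: a pattern like <[^>]*> misreads a > inside an attribute value (for example title="a>b"), which a real parser such as SimpleHTMLDOM handles correctly. Non-breaking spaces decoded from &nbsp; are also left as-is, since only ASCII whitespace is collapsed.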
See smf_tester.php for example usage.
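To reproduce a speed comparison like the one described for plaintext(), a simple timing harness is enough. The mean_ms_per_call() helper below is hypothetical and does not ship with the project. It uses call_user_func() rather than the callable type hint (added in PHP 5.4) to stay PHP 5.3 compatible, does one untimed warm-up run, and reports the mean wall-clock time per call in milliseconds.

```php
<?php
// Hypothetical micro-benchmark helper (not shipped with ScraperMoreFaster):
// runs $fn $iterations times and returns the mean time per call in ms.
function mean_ms_per_call($fn, $iterations = 100)
{
    $iterations = max(1, (int) $iterations); // avoid division by zero

    call_user_func($fn); // warm-up run so one-time costs are not timed

    $start = microtime(true); // float seconds with microsecond resolution
    for ($i = 0; $i < $iterations; $i++) {
        call_user_func($fn);
    }
    $elapsed = microtime(true) - $start;

    return ($elapsed / $iterations) * 1000.0;
}
```

To compare the two libraries, wrap each tool's plaintext extraction in a closure over the same HTML string and time both with the same iteration count. Single runs on a shared machine are noisy, so repeating the measurement is advisable.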