library web-harvester

Laravel HTTP Client with JavaScript capabilities

malahierba-lab/web-harvester

  • Saturday, August 27, 2016
  • by malahierba
  • Repository
  • 3 Watchers
  • 5 Stars
  • 117 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 2 Open issues
  • 24 Versions
  • 8 % Grown

The README.md

Laravel Web Harvester

A tool for getting information from external websites. Powered by PhantomJS and the malahierba.cl dev team.

Installation

Add the package to your composer.json:

{
    "require": {
        "malahierba-lab/web-harvester": "1.*"
    }
}

Then run the composer update command.

After installing, you must register the service provider. Add it to the providers section of config/app.php:

Malahierba\WebHarvester\WebHarvesterServiceProvider::class

Now publish the config file by running php artisan vendor:publish.

Configuration

Laravel Web Harvester runs on the PhantomJS headless WebKit browser. PhantomJS is included as a binary, so before you can use this package you need to specify your OS. This is done in the config file config/webharvester.php.

Set the environment option to one of the supported values:

  • linux-i686-32
  • linux-i686-64
  • macosx
  • windows

Example: 'environment' => 'macosx'
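
As a rough sketch of how the environment value could be chosen automatically, the helper below maps PHP's own platform constants to the four values listed above. The function name guessHarvesterEnvironment is hypothetical and not part of the package, and the sketch assumes PHP 7.2 or later (for PHP_OS_FAMILY) and that a 64-bit PHP build implies the 64-bit Linux binary.

```php
<?php
// Hypothetical helper, not part of the package.
// Maps PHP's OS family and integer size to one of the four documented values.
function guessHarvesterEnvironment(string $osFamily = PHP_OS_FAMILY, int $intSize = PHP_INT_SIZE): string
{
    switch ($osFamily) {
        case 'Darwin':
            return 'macosx';
        case 'Windows':
            return 'windows';
        case 'Linux':
            // PHP_INT_SIZE is 8 on 64-bit builds of PHP and 4 on 32-bit builds.
            return $intSize === 8 ? 'linux-i686-64' : 'linux-i686-32';
        default:
            // Any other OS family has no matching value in the documented list.
            throw new RuntimeException("No documented environment for OS family: {$osFamily}");
    }
}
```

The result could then be used as the value of 'environment' in the published config file, instead of a hard-coded string.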

Use

Important: For documentation purposes, the examples below always assume that you import the library into your namespace using use Malahierba\WebHarvester;

Get WebPage Components

$url = 'http://someurl';
$webharvester = new WebHarvester;

//Check if we can process the URL and Load it
if ($webharvester->load($url)) {

    //Page Title
    $title                   = $webharvester->getTitle();

    //Page Description
    $description             = $webharvester->getDescription();

    //Status Code (if the URL redirects to another page, this is the status code of the final page)
    $status_code             = $webharvester->getStatusCode();

    //Page Featured Image as URL
    $featured_image_url      = $webharvester->getFeaturedImage();

    //Page Featured Image as Base64
    $featured_image_base_64  = $webharvester->getFeaturedImage('base64');

    //Page real URL (if $url redirects to another URL, this is the final one)
    $real_url                = $webharvester->getRealURL();

    //Site Name
    $sitename                = $webharvester->getSiteName();
}
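
If the page components are needed together, for example to store them in one record, the getters above can be gathered into a single array. The following is only a sketch: summarizePage and the array keys are hypothetical names, and the function relies solely on the method names shown in the example above, so it accepts any object that exposes them.

```php
<?php
// Hypothetical helper: collects the documented getters into one array.
// $harvester is any object exposing the methods used in the example above.
function summarizePage($harvester, string $url): ?array
{
    // load() is used above as the check that the URL could be processed.
    if (!$harvester->load($url)) {
        return null;
    }

    return [
        'title'          => $harvester->getTitle(),
        'description'    => $harvester->getDescription(),
        'status_code'    => $harvester->getStatusCode(),
        'featured_image' => $harvester->getFeaturedImage(),
        'real_url'       => $harvester->getRealURL(),
        'site_name'      => $harvester->getSiteName(),
    ];
}
```

Returning null when load() fails keeps the "could not process this URL" case distinct from a page that loaded but has empty fields.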

Get expected behavior of the Robot (based on meta name="robots")

$url = 'http://someurl';
$webharvester = new WebHarvester;

//Check if we can process the URL and Load it
if ($webharvester->load($url)) {

    //check for index
    if ($webharvester->isIndexable()) {

        //...some code

    }

    //check for follow
    if ($webharvester->isFollowable()) {

        //...some code

    }

}

Get WebPage Links

$url = 'http://someurl';
$webharvester = new WebHarvester;

//Check if we can process the URL and Load it
if ($webharvester->load($url)) {

    //all links, with full URLs, as an array

    $links = $webharvester->getLinks();

    //all links as an array, with the query component removed (from the "?" character onwards)

    $links = $webharvester->getLinks([
        'remove' => ['query']
    ]);

    //retrieve links as an array of objects (properties: url, follow)
    //follow is false when the source website marks the link as nofollow (rel='nofollow')

    $links = $webharvester->getLinks(['only_urls' => false]); //default true

}

Important: For security reasons, links with embedded JavaScript are not included in the output array.
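
When links are retrieved as objects, a crawler will typically want only the ones it is allowed to follow. A minimal sketch of that filter is shown here, assuming each element has the url and follow properties described above and that follow is a boolean. The function name followableUrls is hypothetical.

```php
<?php
// Hypothetical helper: keeps only URLs whose link was not marked rel="nofollow".
// Expects the object form returned when 'only_urls' is false (properties: url, follow).
function followableUrls(array $links): array
{
    $urls = [];
    foreach ($links as $link) {
        if ($link->follow) {
            $urls[] = $link->url;
        }
    }

    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($urls));
}
```

It would be called on the result of getLinks(['only_urls' => false]).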

Get WebPage Raw Content

$url = 'http://someurl';
$webharvester = new WebHarvester;

//Check if we can process the URL and Load it
if ($webharvester->load($url)) {
    $raw = $webharvester->content();
}

Take a Screenshot of a WebPage

$url = 'http://someurl';
$webharvester = new WebHarvester;

//Check if we can process the URL and Load it
if ($webharvester->takeScreenshot($url)) {
    $image_base_64 = $webharvester->content();  //return a base64 string
}
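
Since the screenshot comes back as a base64 string, it usually has to be decoded before it can be written to disk. The sketch below shows one way to do that with PHP's standard functions. saveBase64Image is a hypothetical name, and the code assumes the string is plain base64 without a data URI prefix. The image format is not stated above, so choosing the file extension is left to the caller.

```php
<?php
// Hypothetical helper: decodes a base64 string and writes the bytes to $path.
// Assumes plain base64 with no "data:image/...;base64," prefix.
function saveBase64Image(string $base64, string $path): bool
{
    // Strict mode makes base64_decode() return false on invalid characters.
    $bytes = base64_decode($base64, true);
    if ($bytes === false) {
        return false;
    }

    // file_put_contents() returns the number of bytes written, or false on failure.
    return file_put_contents($path, $bytes) !== false;
}
```

With the example above, this would be called as saveBase64Image($image_base_64, $path) for a path of your choosing.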

Setup Options

You can customize the web harvester with the following setter methods:

$webharvester = new WebHarvester;

//Custom User Agent
$webharvester->setUserAgent('your user agent');

//Ignore SSL Errors
$webharvester->setIgnoreSSLErrors(true);

//Resource Timeout (in milliseconds)
$webharvester->setResourceTimeout(3000);

//Wait after load (in milliseconds)
$webharvester->setWaitAfterLoad(3000);  // <- useful for getting async content

Licence

This project is released under the MIT licence. For more information, please read the LICENCE file.

The Versions

All versions are released under the MIT licence and carry the keywords: laravel curl javascript http scraping phantomjs harvesting

  • dev-master (9999999-dev), 27/08 2016
  • 1.2.2, 27/08 2016
  • 1.2.1, 31/03 2016
  • 1.2.0, 02/01 2016
  • 1.1.9, 02/12 2015
  • 1.1.8, 02/12 2015
  • 1.1.7, 02/12 2015
  • 1.1.6, 01/12 2015
  • 1.1.5, 16/11 2015
  • 1.1.4, 15/11 2015
  • 1.1.3, 15/11 2015
  • 1.1.2, 15/11 2015
  • 1.1.1, 15/11 2015
  • 1.1.0, 12/11 2015
  • 1.0.9, 11/11 2015
  • 1.0.8, 11/09 2015
  • 1.0.7, 11/09 2015
  • 1.0.6, 21/08 2015
  • 1.0.5, 21/08 2015
  • 1.0.4, 21/08 2015
  • 1.0.3, 20/08 2015
  • 1.0.2, 20/08 2015
  • 1.0.1, 20/08 2015
  • 1.0.0, 12/08 2015