2017 © Pedro Peláez
 

library grawler

A guided html crawler with media meta extraction

sleimanx2/grawler

  • Wednesday, June 14, 2017
  • by sleimanx2
  • Repository
  • 2 Watchers
  • 11 Stars
  • 273 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 2 Forks
  • 0 Open issues
  • 11 Versions
  • 1 % Grown

The README.md

Grawler

Install

Via Composer:

``` bash
$ composer require sleimanx2/grawler
```

Basic Usage

getting the page DOM

```php
require_once('vendor/autoload.php');

$client = new Bowtie\Grawler\Client();
$grawler = $client->download('http://example.com');
```

finding basic attributes
$grawler->title();
// provide a css path to find the attribute
$grawler->body($path = '.main-content');
// extracts meta keywords (array)
$grawler->keywords();
// extracts meta description 
$grawler->description();
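
To make these accessors more concrete, here is a minimal sketch of how a title, meta keywords, and a meta description can be read from an HTML string with PHP's bundled DOM extension. It is not Grawler's implementation: the `extractBasics` function and the sample HTML are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helper (not part of Grawler): reads basic page attributes
// from raw HTML using PHP's bundled DOM extension.
function extractBasics(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // string(...) evaluates to '' when the node is missing.
    $title = trim($xpath->evaluate('string(//title)'));
    $description = trim($xpath->evaluate('string(//meta[@name="description"]/@content)'));
    $rawKeywords = $xpath->evaluate('string(//meta[@name="keywords"]/@content)');

    // Split the comma-separated keywords and drop empty entries.
    $keywords = array_values(
        array_filter(array_map('trim', explode(',', $rawKeywords)), 'strlen')
    );

    return [
        'title'       => $title,
        'description' => $description,
        'keywords'    => $keywords,
    ];
}

$html = '<html><head><title>Example page</title>'
      . '<meta name="keywords" content="php, crawler, media">'
      . '<meta name="description" content="A sample page."></head>'
      . '<body><p>Hello</p></body></html>';

$basics = extractBasics($html);
print_r($basics);
```

The sketch returns the keywords as an array, mirroring the "(array)" note on `keywords()` above.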
finding media
$grawler->images('.content img');
$grawler->videos('iframe');
$grawler->audio('.audio iframe');

Resolving media attributes

To resolve media attributes, you need to load the providers' configuration (see "Grawler config").

videos

Current video resolvers: YouTube, Vimeo.

// resolve all videos at once 
$videos = $grawler->videos('iframe')->resolve();

You can then access the video attributes as follows:

foreach($videos as $video)
{
  $video->id; // the video provider id
  $video->title;
  $video->description;
  $video->url;
  $video->embedUrl;
  $video->images; // Collection of Image instances
  $video->author;
  $video->authorId;
  $video->duration;
  $video->provider; //video source
}
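
As an illustration of what the `provider` and `id` attributes refer to, the following sketch guesses both values from an iframe `src` URL. It is an assumption-laden example, not Grawler's resolver logic: the `guessVideoProvider` function, the URL patterns it checks, and the sample URLs are all hypothetical.

```php
<?php
// Hypothetical helper (not Grawler's code): guesses the provider and the
// provider-side video id from an iframe "src" URL.
function guessVideoProvider(string $embedUrl): ?array
{
    // Give protocol-relative URLs ("//player.vimeo.com/...") an explicit scheme.
    if (strpos($embedUrl, '//') === 0) {
        $embedUrl = 'https:' . $embedUrl;
    }

    $host = strtolower((string) parse_url($embedUrl, PHP_URL_HOST));
    $path = (string) parse_url($embedUrl, PHP_URL_PATH);

    // Assumed YouTube embed shape: https://www.youtube.com/embed/<id>
    if (preg_match('~(^|\.)youtube(-nocookie)?\.com$~', $host)
        && preg_match('~^/embed/([A-Za-z0-9_-]+)~', $path, $m)) {
        return ['provider' => 'youtube', 'id' => $m[1]];
    }

    // Assumed Vimeo embed shape: https://player.vimeo.com/video/<id>
    if ($host === 'player.vimeo.com'
        && preg_match('~^/video/(\d+)~', $path, $m)) {
        return ['provider' => 'vimeo', 'id' => $m[1]];
    }

    return null; // unknown provider
}

$youtube = guessVideoProvider('https://www.youtube.com/embed/abcDEF12345?rel=0');
$vimeo   = guessVideoProvider('//player.vimeo.com/video/123456789');
$other   = guessVideoProvider('https://example.com/player/42');

var_dump($youtube, $vimeo, $other);
```

The remaining attributes (title, description, duration, and so on) cannot be derived from the URL alone, which is presumably why resolving requires the provider keys described under "Grawler config".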

You can also resolve videos individually:

$videos = $grawler->videos('iframe');

foreach($videos as $video)
{
  $video->resolve();
  $video->title;
  //...
}

audio

Current audio resolvers: SoundCloud.

// resolve all audio at once 
$audio = $grawler->audio('.audio iframe')->resolve();

You can then access the audio attributes as follows:

foreach($audio as $track)
{
  $track->id; // the audio provider id
  $track->title;
  $track->description;
  $track->url;
  $track->embedUrl;
  $track->images; // Collection of cover photo instances
  $track->author;
  $track->authorId;
  $track->duration;
  $track->provider; // audio source
}

You can also resolve audio individually:

$audio = $grawler->audio('.audio iframe');

foreach($audio as $track)
{
  $track->resolve();
  $track->title;
  //...
}

Resolving page urls

$links = $grawler->links('.main thumb a');

foreach($links as $link)
{
  print $link;
  //or
  print $link->uri;
  //or
  print $link->getUri();
}
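
The three `print` forms above can all produce the same output if a link object exposes a public property, a getter, and a `__toString()` method. The sketch below shows that pattern with a hypothetical `DemoLink` class. Whether Grawler's link class is built this way is an assumption.

```php
<?php
// Hypothetical stand-in for a link object (not Grawler's actual class).
final class DemoLink
{
    /** @var string */
    public $uri;

    public function __construct(string $uri)
    {
        $this->uri = $uri;
    }

    public function getUri(): string
    {
        return $this->uri;
    }

    // Lets the object be used wherever a string is expected, e.g. with print.
    public function __toString(): string
    {
        return $this->uri;
    }
}

$link = new DemoLink('http://example.com/articles/1');

print $link . PHP_EOL;           // via __toString()
print $link->uri . PHP_EOL;      // via the public property
print $link->getUri() . PHP_EOL; // via the getter
```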

Configuration

Client Config

Set user agent
$client->agent('Googlebot/2.1')->download('http://example.com');

Recommended reading: http://webmasters.stackexchange.com/questions/6205/what-user-agent-should-i-set

Set request auth
$client->auth('me', '**');

You can change the auth type as follows:

$client->auth('me', '**', $type = 'basic');
Set request method
$client->method('post');

Grawler config

By default, Grawler tries to read the following environment variables:

GRAWLER_YOUTUBE_KEY

GRAWLER_VIMEO_KEY
GRAWLER_VIMEO_SECRET

GRAWLER_SOUNDCLOUD_KEY
GRAWLER_SOUNDCLOUD_SECRET

If you don't use environment variables, you can load the configuration as follows:

$config = [
  'youtubeKey' => '',

  'vimeoKey'    => '',
  'vimeoSecret' => '',

  'soundcloudKey'    => '',
  'soundcloudSecret' => '',
];

$grawler->loadConfig($config);
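
If some keys live in the environment and others come from elsewhere, the config array can be assembled in one place. The sketch below is one possible approach, not something the library provides: the `buildGrawlerConfig` helper, the variable-to-key mapping, and the demo values are assumptions, while the variable names and config keys are the ones listed above.

```php
<?php
// Assumed mapping from the environment variable names listed above
// to the config keys used with loadConfig().
$envToConfig = [
    'GRAWLER_YOUTUBE_KEY'       => 'youtubeKey',
    'GRAWLER_VIMEO_KEY'         => 'vimeoKey',
    'GRAWLER_VIMEO_SECRET'      => 'vimeoSecret',
    'GRAWLER_SOUNDCLOUD_KEY'    => 'soundcloudKey',
    'GRAWLER_SOUNDCLOUD_SECRET' => 'soundcloudSecret',
];

// Hypothetical helper: builds the config array, letting explicit values
// override whatever is found in the environment.
function buildGrawlerConfig(array $envToConfig, array $overrides = []): array
{
    $config = [];
    foreach ($envToConfig as $envName => $configKey) {
        $value = getenv($envName); // false when the variable is unset
        $config[$configKey] = $value === false ? '' : $value;
    }

    return array_merge($config, $overrides);
}

putenv('GRAWLER_YOUTUBE_KEY=demo-youtube-key'); // demo value only
$config = buildGrawlerConfig($envToConfig, ['vimeoKey' => 'demo-vimeo-key']);
print_r($config);

// With a downloaded page at hand, the result would then be passed on:
// $grawler->loadConfig($config);
```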

Testing

``` bash
$ phpunit --testsuite unit
```

``` bash
$ phpunit --testsuite integration
```

NB: you must set your provider keys (YouTube, Vimeo, SoundCloud...) to run the integration tests.

Contributing

Please see CONTRIBUTING.

Security

If you discover any security-related issues, please email sleiman@bowtie.land instead of using the issue tracker.

License

The MIT License (MIT). Please see License File for more information.

The Versions