library imgscrape

Simple image scraper from remote URL to get the largest image

boris/imgscrape

  • Tuesday, February 24, 2015
  • by borispavlov0
  • Repository
  • 2 Watchers
  • 3 Stars
  • 25 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 1 Forks
  • 0 Open issues
  • 13 Versions
  • 0 % Grown

The README.md

Installation

Install via Composer:

"require": {
    "boris/imgscrape": "0.*"
}

Usage

Please check the supplied index.php file for a working example.

In essence, you initialize the scraper like this:

$scraper = new Boris\ImgScrape\Scraper($client, $logger, $configArray);

Here $client is a Guzzle client instance, and $logger is an instance of the Logger class, which is located in the same namespace as the scraper.

To get the source of the largest image on any URL:

$scraper->getLargestImageUrl($url);

The script issues a HEAD request first. If the 'imageLinksOnly' parameter is set to true and the response either has no 'Content-Type' header or has one that is not an image type, the method returns null. Otherwise, it returns the same URL unchanged. This is useful if you have a huge array of URLs and want to keep only the direct image URLs.
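The Content-Type check described above can be sketched in plain PHP. This is a minimal illustration under stated assumptions, not the library's actual code: the function names isAcceptedImageType and directImageUrlOrNull are hypothetical, the header value is passed in as a string instead of being read from a Guzzle response, and matching the MIME subtype against 'acceptedTypes' is an assumption about how that list is used.

```php
<?php
// Hypothetical helpers for illustration; they are not part of boris/imgscrape.

/**
 * Returns true when a Content-Type header value names an accepted image type.
 */
function isAcceptedImageType(?string $contentType, array $acceptedTypes): bool
{
    if ($contentType === null || $contentType === '') {
        return false; // the HEAD response carried no Content-Type header
    }
    // Drop parameters such as "; charset=binary" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    if (strpos($mime, 'image/') !== 0) {
        return false; // not an image type at all
    }
    // Compare the subtype (e.g. "png") against the accepted list.
    return in_array(substr($mime, strlen('image/')), $acceptedTypes, true);
}

/**
 * Assumed 'imageLinksOnly' behaviour: the URL itself for a direct image
 * link, null for anything else.
 */
function directImageUrlOrNull(string $url, ?string $contentType, array $acceptedTypes): ?string
{
    return isAcceptedImageType($contentType, $acceptedTypes) ? $url : null;
}

$accepted = ['jpeg', 'jpg', 'gif', 'png'];
var_dump(directImageUrlOrNull('http://example.com/cat.png', 'image/png', $accepted));
var_dump(directImageUrlOrNull('http://example.com/', 'text/html; charset=UTF-8', $accepted));
```

Keeping the decision in a pure function like this makes it easy to filter a large list of URLs once the HEAD responses are available.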

Symfony

To use this component in Symfony, register it as a service:

parameters:
    boris.scraper: ~
    boris.logger: ~
    guzzle.params:
        base_url: http://www.reddit.com

services:
    boris.logger:
        class: Boris\ImgScrape\Logger
        arguments: ['%boris.logger%']

    boris.imgscrape:
        class: Boris\ImgScrape\Scraper
        arguments: ['@guzzle.client', '@boris.logger', '%boris.scraper%']

    guzzle.client:
        class: GuzzleHttp\Client
        arguments: ['%guzzle.params%']

You can then fetch it from the container:

$this->container->get('boris.imgscrape');

Parameter Reference

There is a default set of parameters that can be overridden when initializing the scraper and logger combo:

$config = [
    'imageLinksOnly' => false,
    'acceptedTypes' => [
        'jpeg',
        'jpg',
        'gif',
        'png',
    ],
    'blacklist' => [
        'www.reddit.com'
    ],
    'user-agent' => 'Boris-ImgScrape/0.2 (amateur script, contact: my at email dot com)'
];

$configLogger = [
    'enabled' => true,
    'handlers' => [
        [
            'dir' => __DIR__ . '/../../../../log/debug.log',
            'level' => 'debug'
        ],
        [
            'dir' => __DIR__ . '/../../../../log/main.log',
            'level' => 'info'
        ],
    ]
];

These can be used as the values of your %boris.scraper% and %boris.logger% parameters, and you only need to override the keys you want to change. Here is a reference on what each parameter means:

scraper:
    imageLinksOnly: only returns the URL if the supplied URL is for an image
    acceptedTypes: accepted image MIME types
    blacklist: which hostnames to ignore
    user-agent: your User-Agent string

logger:
    enabled: whether or not to enable the logger
    handlers: an array with one entry per logger handler. Supply the dir and the level for each handler (this component uses Monolog, so you can check the Monolog documentation for the available levels)
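As a rough sketch of how overriding only some keys might look, the snippet below merges a partial override array into the defaults listed above and then checks a URL's hostname against the resulting blacklist. The $overrides values and the isBlacklisted helper are hypothetical, and whether the library performs such a merge internally is an assumption; the README only says that you override what you need.

```php
<?php
// Defaults copied from the scraper parameter listing in this README.
$defaults = [
    'imageLinksOnly' => false,
    'acceptedTypes' => ['jpeg', 'jpg', 'gif', 'png'],
    'blacklist' => ['www.reddit.com'],
    'user-agent' => 'Boris-ImgScrape/0.2 (amateur script, contact: my at email dot com)',
];

// Hypothetical overrides: only the keys that should differ from the defaults.
$overrides = [
    'imageLinksOnly' => true,
    'blacklist' => ['www.reddit.com', 'example.org'],
];

// array_replace() swaps whole top-level values, so a list such as
// 'blacklist' is replaced entirely rather than appended to.
$config = array_replace($defaults, $overrides);

/**
 * Hypothetical helper: true when the URL's hostname is on the blacklist.
 */
function isBlacklisted(string $url, array $blacklist): bool
{
    $host = parse_url($url, PHP_URL_HOST); // null or false when no host can be parsed
    return is_string($host) && in_array(strtolower($host), $blacklist, true);
}

var_dump($config['imageLinksOnly']);
var_dump(isBlacklisted('http://example.org/gallery', $config['blacklist']));
```

array_replace is used here instead of array_merge_recursive because the latter would append to the nested lists and could duplicate entries such as 'www.reddit.com'.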

Tests

To run the tests, add the following development dependencies to your project so that Composer installs them:

"require-dev": {
    "mockery/mockery": "0.9.*@dev",
    "phpunit/phpunit": "4.7.*@dev"
}

To run the tests, navigate to the root directory of the project and run:

phpunit --group=BorisImgScrape

Logs

By default, Monolog creates a log file with the level specified in the 'handlers' parameter of the logger config. You can use DEBUG, but keep in mind that the logs get quite big.

The Versions

All versions are by Boris Pavlov. Dates are given as DD/MM YYYY; the normalized version string follows in parentheses.

  • 24/02 2015: dev-master (9999999-dev), MIT
  • 24/02 2015: dev-develop (dev-develop), MIT
  • 24/02 2015: 0.3.4 (0.3.4.0), MIT
  • 24/02 2015: 0.3.3 (0.3.3.0), MIT
  • 12/02 2015: dev-hotfix/0.3.2 (dev-hotfix/0.3.2), MIT
  • 12/02 2015: 0.3.2 (0.3.2.0), MIT
  • 12/02 2015: 0.3.1 (0.3.1.0), MIT
  • 12/02 2015: 0.3 (0.3.0.0), MIT
  • 07/02 2015: 0.2.1 (0.2.1.0), MIT
  • 06/02 2015: 0.1.4 (0.1.4.0), MIT
  • 06/02 2015: 0.2 (0.2.0.0), MIT
  • 04/02 2015: 0.1.3 (0.1.3.0), MIT
  • 04/02 2015: 0.1.2 (0.1.2.0), no license listed