library php-url-extractor

Extract URLs from HTML content.

chrisullyott/php-url-extractor

  • Tuesday, May 1, 2018
  • by chrisullyott
  • Repository
  • 0 Watchers
  • 0 Stars
  • 6 Installations
  • HTML
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 4 Versions
  • 0 % Grown

The README.md

php-url-extractor

Extract URLs from HTML content, applying optional filters.

Installation

With Composer:

$ composer require chrisullyott/php-url-extractor

Usage

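// Load Composer's autoloader (path assumes the default vendor/ directory).
require 'vendor/autoload.php';

// Read the HTML document to scan.
$html = file_get_contents('about-us.html');

$extractor = new UrlExtractor($html);
$extractor->setHomeUrl('http://www.site.com'); // keep local URLs, build absolute ones
$extractor->setFilesOnly(true);                // keep only URLs with a file extension

$urls = $extractor->getUrls();
print_r($urls);

Example output:

Array
(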
    [0] => stdClass Object
        (
            [attribute] => href
            [value] => /_assets/img/icons/favicon-96.png
            [url] => https://www.site.com/_assets/img/icons/favicon-96.png
        )
    ...
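
Each returned item carries the attribute, value, and url properties shown above. As a minimal sketch of consuming the result, the following collects the distinct absolute URLs. To stay runnable without the library, it uses a hand-built stand-in array: $urls here is an assumption shaped like the printed output, whereas in real use it would come from #getUrls.

```php
<?php
// Stand-in data shaped like the objects shown above, so the sketch runs
// without the library; in real use $urls would come from $extractor->getUrls().
$urls = [
    (object) [
        'attribute' => 'href',
        'value'     => '/_assets/img/icons/favicon-96.png',
        'url'       => 'https://www.site.com/_assets/img/icons/favicon-96.png',
    ],
];

// Collect the distinct absolute URLs, skipping items without a url property.
$absolute = [];
foreach ($urls as $item) {
    if (isset($item->url)) {
        $absolute[$item->url] = true;
    }
}
$absolute = array_keys($absolute);

print_r($absolute);
```

The isset check is there because the url property is described as being built only when a home URL has been set.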

Options

setAttributeFilter (array)

The #getUrls method loads the HTML into a DOMDocument and checks a set of element attributes, such as src and href, for URLs. Use #setAttributeFilter to replace the default set of attributes with your own.
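
To illustrate the kind of attribute scan described above, here is a minimal sketch in plain PHP (7.0 or later). It is not the library's implementation: collectAttributeValues is a hypothetical helper, and the attribute list passed to it stands in for whatever filter you would give #setAttributeFilter.

```php
<?php
// Hypothetical helper, not part of php-url-extractor: collect the values of
// the chosen attributes from every element in an HTML string.
function collectAttributeValues(string $html, array $attributes): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them, since
    // real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $found = [];
    // '*' matches every element in the document, in document order.
    foreach ($doc->getElementsByTagName('*') as $element) {
        foreach ($attributes as $attribute) {
            if ($element->hasAttribute($attribute)) {
                $found[] = [
                    'attribute' => $attribute,
                    'value'     => $element->getAttribute($attribute),
                ];
            }
        }
    }
    return $found;
}

$html = '<html><body><a href="/about">About</a>'
      . '<img src="/logo.png" alt="Logo"></body></html>';

// Only src and href are inspected; the alt attribute is skipped.
print_r(collectAttributeValues($html, ['src', 'href']));
```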

setHomeUrl (string)

Providing a home URL restricts the results to URLs local to that domain. A relative URL that begins with a single slash (/), rather than two (//), is also considered local. Setting a home URL also populates the url property, an absolute URL, on each object returned by #getUrls.
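
The locality rule can be sketched in a few lines of plain PHP (7.0 or later). The helpers below (isRootRelative, isLocal, toAbsolute) are hypothetical names chosen for this illustration, and the host comparison is an assumption about what "local to the domain" means; the library's own logic may differ.

```php
<?php
// Hypothetical helpers illustrating the rule; not the library's own code.

// True when $value is root-relative ("/path") but not protocol-relative ("//host/path").
function isRootRelative(string $value): bool
{
    return strncmp($value, '/', 1) === 0 && strncmp($value, '//', 2) !== 0;
}

// True when $value is root-relative, or its host equals the home URL's host.
function isLocal(string $value, string $homeUrl): bool
{
    if (isRootRelative($value)) {
        return true;
    }
    $host     = parse_url($value, PHP_URL_HOST);
    $homeHost = parse_url($homeUrl, PHP_URL_HOST);

    return is_string($host) && is_string($homeHost)
        && strcasecmp($host, $homeHost) === 0;
}

// Build an absolute URL for a root-relative value; other values pass through unchanged.
function toAbsolute(string $value, string $homeUrl): string
{
    return isRootRelative($value) ? rtrim($homeUrl, '/') . $value : $value;
}

$home = 'https://www.site.com';

var_dump(isLocal('/_assets/img/icons/favicon-96.png', $home)); // bool(true)
var_dump(isLocal('//cdn.example.com/app.js', $home));          // bool(false)
echo toAbsolute('/_assets/img/icons/favicon-96.png', $home), "\n";
```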

setAlternateDomains (array)

Used with #setHomeUrl. If set, the returned URLs also include those whose domain matches an entry in the array. Entries may be plain strings, like media.site.com, or regular expressions, like /.*\.site\.com/.
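
As a rough sketch of how a mixed list of strings and regular expressions might be matched against a host, consider the plain-PHP helper below. matchesAlternateDomain is a hypothetical name, and treating any entry wrapped in slashes as a regular expression is an assumption made for this example rather than documented library behavior.

```php
<?php
// Hypothetical helper: does $host match any entry in $domains?
// Assumption for this sketch: an entry wrapped in slashes is a regular
// expression; anything else is compared as a plain host name.
function matchesAlternateDomain(string $host, array $domains): bool
{
    foreach ($domains as $domain) {
        $isRegex = strlen($domain) > 2
            && $domain[0] === '/'
            && substr($domain, -1) === '/';

        if ($isRegex) {
            if (preg_match($domain, $host) === 1) {
                return true;
            }
        } elseif (strcasecmp($domain, $host) === 0) {
            return true;
        }
    }
    return false;
}

// The pattern is anchored with ^ and $ so that a host such as
// "x.site.com.example.net" does not match by accident.
$domains = ['media.site.com', '/^.*\.site\.com$/'];

var_dump(matchesAlternateDomain('media.site.com', $domains));  // bool(true)
var_dump(matchesAlternateDomain('cdn.site.com', $domains));    // bool(true)
var_dump(matchesAlternateDomain('www.example.com', $domains)); // bool(false)
```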

setFilesOnly (boolean)

When true, only URLs with a file extension are returned.

setIgnoredExtensions (array)

Used with #setFilesOnly. Excludes URLs whose file extension is found in the array.
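
The combined effect of the two file options can be approximated in plain PHP (7.0 or later). This is only a sketch: filterFileUrls is a hypothetical helper, and comparing extensions case-insensitively is an assumption rather than confirmed library behavior.

```php
<?php
// Hypothetical helper: keep URLs that have a file extension which is not
// listed in $ignored. Extensions are compared case-insensitively (an
// assumption made for this sketch).
function filterFileUrls(array $urls, array $ignored = []): array
{
    $ignored = array_map('strtolower', $ignored);
    $kept = [];

    foreach ($urls as $url) {
        // Look only at the path, so query strings and fragments are ignored.
        $path = parse_url($url, PHP_URL_PATH);
        $ext  = is_string($path)
            ? strtolower(pathinfo($path, PATHINFO_EXTENSION))
            : '';

        if ($ext !== '' && !in_array($ext, $ignored, true)) {
            $kept[] = $url;
        }
    }
    return $kept;
}

$urls = [
    '/_assets/img/icons/favicon-96.png',
    '/about-us',
    '/contact.php',
    '/files/report.PDF?download=1',
];

// "/about-us" has no extension and "/contact.php" is ignored.
print_r(filterFileUrls($urls, ['php']));
```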

The Versions

All versions are published from https://github.com/chrisullyott/php-url-extractor under the MIT license and require php >=5.3.0.

  • dev-master (9999999-dev): 1 May 2018
  • v0.2.0 (0.2.0.0): 1 May 2018
  • v0.1.0 (0.1.0.0): 29 January 2018
  • v0.0.1 (0.0.1.0): 18 January 2018