zegnat/linkextractor

A package that tries to figure out what resources an HTML document links to.

Sunday, November 19, 2017
by Zegnat

PHP

# LinkExtractor

The LinkExtractor tries to figure out what resources an HTML document links to.

This tries to answer two questions:

1. what constitutes a link between an HTML document and another resource, and
2. how to supply those links in a useful (resolved) format.

The class is able to get (and resolve) every URL from a pre-parsed DOM in accordance with where the HTML5 specification allows URLs to be defined. The choice to follow the HTML5 specification came out of a GitHub discussion about what a link is, and further in-person discussion at IWC Berlin 2017.
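To make the "get and resolve" idea concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. It is not the package's implementation: the attribute map is a small illustrative subset of the places HTML5 allows URLs, and the names `URL_ATTRIBUTES`, `collectRawUrls`, and `resolveAgainst` are hypothetical. The resolver is deliberately simplified and assumes an absolute HTTP(S) base URL; it ignores dot segments, query-only and fragment-only references, empty values, and the `<base>` element.

``` php
<?php
declare(strict_types=1);

// Illustrative subset of element => attribute pairs that may carry URLs.
// The HTML5 specification defines more; this list is an assumption for the sketch.
const URL_ATTRIBUTES = [
    'a' => 'href',
    'area' => 'href',
    'link' => 'href',
    'img' => 'src',
    'script' => 'src',
    'iframe' => 'src',
    'form' => 'action',
    'blockquote' => 'cite',
    'q' => 'cite',
];

// Collect the raw (unresolved) URL attribute values found in the document.
function collectRawUrls(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    $found = [];
    foreach (URL_ATTRIBUTES as $element => $attribute) {
        foreach ($xpath->query("//{$element}[@{$attribute}]") as $node) {
            $found[] = trim($node->getAttribute($attribute));
        }
    }
    return array_values(array_unique($found));
}

// Simplified resolver: absolute, scheme-relative, root-relative and plain
// relative references only.
function resolveAgainst(string $base, string $url): string
{
    if (is_string(parse_url($url, PHP_URL_SCHEME))) {
        return $url; // already absolute
    }
    $parts = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (strncmp($url, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $url; // scheme-relative
    }
    if (strncmp($url, '/', 1) === 0) {
        return $origin . $url; // root-relative
    }
    $path = $parts['path'] ?? '/';
    $directory = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $directory . $url; // relative to the base directory
}

$dom = new DOMDocument();
// The @ silences libxml warnings about markup it does not recognise.
@$dom->loadHTML('<p><a href="/about">About</a> <img src="logo.png" alt=""> <a href="https://github.com/">GitHub</a></p>');

$resolved = array_map(
    fn (string $url): string => resolveAgainst('http://example.com/docs/index.html', $url),
    collectRawUrls($dom)
);
```

Results are grouped by element type in the order of the map rather than in document order. For anything beyond illustration, a full RFC 3986 resolver is the safer choice.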

## Using versions < 1.0

The current function prototypes are the same as those that will be in 1.0.0. This means you can write the method calls in your code right now and update to 1.0.0 later without breaking any of those calls.

function extract(): array
function linksTo(string $url): bool

If you rely on specific output or exceptions from these methods, however, there might be breaking changes before version 1.0.0. Take a look at the roadmap to version 1.0.0 to get an idea of what is still likely to change.
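One way to stay insulated from such changes is to keep calling code dependent only on the `linksTo(string $url): bool` signature, and to avoid assuming particular exception types. The sketch below is only an illustration of that idea: the helper `mentions` and the stand-in class `FixedLinks` are hypothetical and not part of the package. `FixedLinks` exists only so the example runs without the library installed.

``` php
<?php
declare(strict_types=1);

// Hypothetical stand-in exposing the same two prototypes as the extractor.
final class FixedLinks
{
    private array $urls;

    public function __construct(array $urls)
    {
        $this->urls = array_values($urls);
    }

    public function extract(): array
    {
        return $this->urls;
    }

    public function linksTo(string $url): bool
    {
        return in_array($url, $this->urls, true);
    }
}

// Relies only on the linksTo() signature. Any failure is treated as
// "no link found" instead of matching on a specific exception class.
function mentions(object $source, string $url): bool
{
    try {
        return $source->linksTo($url);
    } catch (Throwable $e) {
        return false;
    }
}

$links = new FixedLinks(['https://github.com/']);
$found = mentions($links, 'https://github.com/');
```

Whether swallowing every failure is acceptable depends on the application; logging the caught error before returning `false` may be the better default.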

Don’t wait for 1.0.0 before using this library if you need it! Instead, start using it and leave the ever-important feedback.

## Install

Via Composer

``` bash
$ composer require zegnat/linkextractor
```


## Usage

``` php
// Parse an HTML file into a DOMDocument somehow, e.g.:
$dom = new \Zegnat\Html\DOMDocument();
$dom->loadHTML(file_get_contents('http://example.com/index.html'));
// Now initiate the extractor:
$extractor = new \Zegnat\LinkExtractor\LinkExtractor($dom, 'http://example.com/index.html');
var_dump(
    $extractor->linksTo('https://github.com/'),
    $extractor->linksTo('http://www.iana.org/domains/example')
);
/*
bool(false)
bool(true)
*/
```

## License

The BSD Zero Clause License (0BSD). Please see the LICENSE file for more information.

## Versions

- dev-develop (19 November 2017): license 0BSD; requires league/uri ^5.0; development requirements not listed; by Martijn van der Ven
- dev-master, normalized as 9999999-dev (14 November 2017): license 0BSD; requires league/uri ^5.0; by Martijn van der Ven
- 0.1.0, normalized as 0.1.0.0 (14 November 2017): license 0BSD; requires league/uri ^5.0; by Martijn van der Ven
