The LinkExtractor tries to determine which resources an HTML document links
to.

It tries to answer two questions:
- what constitutes a link between an HTML document and another resource, and
- how to supply them in a useful (resolved) format.
The class can get (and resolve) every URL from a pre-parsed DOM, following
the places where the HTML5 specification allows URLs to be defined. The
HTML5 specification was chosen as the reference after a GitHub discussion
about what a link is, and further in-person discussion at IWC Berlin
2017.
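
To make the idea of "where URLs may be defined" more concrete, here is a minimal sketch of collecting raw URL attributes from a parsed DOM using only PHP's built-in `DOMDocument` and `DOMXPath`. The function `collectRawUrls()` is hypothetical and is not part of LinkExtractor. The attribute list is a small illustrative subset rather than the full set from the HTML5 specification, and resolving the URLs against a base URL is left out entirely.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, NOT part of LinkExtractor: collect the raw (unresolved)
 * values of a few URL-carrying attributes from a parsed document.
 */
function collectRawUrls(\DOMDocument $dom): array
{
    $xpath = new \DOMXPath($dom);

    // Illustrative subset only; the HTML5 specification defines many more.
    $queries = [
        '//a/@href',
        '//area/@href',
        '//link/@href',
        '//img/@src',
        '//script/@src',
        '//form/@action',
    ];

    $urls = [];
    foreach ($queries as $query) {
        foreach ($xpath->query($query) as $attribute) {
            // Surrounding whitespace is not part of the URL itself.
            $urls[] = trim($attribute->nodeValue);
        }
    }

    // Drop duplicates and reindex the result from zero.
    return array_values(array_unique($urls));
}

$dom = new \DOMDocument();
$dom->loadHTML(
    '<!DOCTYPE html><html><body>'
    . '<a href="/about">About</a>'
    . '<img src="logo.png" alt="">'
    . '<a href="/about">About again</a>'
    . '</body></html>'
);

$found = collectRawUrls($dom);
var_export($found); // contains '/about' and 'logo.png'
```

The values returned here are still relative. Turning them into absolute URLs is the "resolved" half of the problem, which this sketch does not attempt.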
## Using versions < 1.0
The current function prototypes are the same as those that will be in 1.0.0.
This means you can write the method calls in your code right now and update
to 1.0.0 without breaking any of your calls later.
function extract(): array
function linksTo(string $url): bool
If you rely on specific output or exceptions from these methods, however,
there might be breaking changes before version 1.0.0. Take a look at the
roadmap to version 1.0.0 to get an idea of what is still likely to
change.

Don’t wait for 1.0.0 before using this library if you need it! Instead, start
using it now and leave the ever-important feedback.
## Install

Via Composer

``` bash
$ composer require zegnat/linkextractor
```

## Usage
``` php
// Parse an HTML file into a DOMDocument somehow, e.g.:
$dom = new \Zegnat\Html\DOMDocument();
$dom->loadHTML(file_get_contents('http://example.com/index.html'));

// Now initiate the extractor:
$extractor = new \Zegnat\LinkExtractor\LinkExtractor($dom, 'http://example.com/index.html');
var_dump(
    $extractor->linksTo('https://github.com/'),
    $extractor->linksTo('http://www.iana.org/domains/example')
);
/*
bool(false)
bool(true)
*/
```

## License

The BSD Zero Clause License (0BSD). Please see the LICENSE file for more
information.