HTML parser and DOM implementation
This is an HTML parser with a minimal DOM implementation. It doesn't depend on PHP's bundled libxml and DOM extensions, and it handles some of the broken markup encountered in the wild.
<?php
use \gaswelder\htmlparser\Parser;
use \gaswelder\htmlparser\ParsingException;

try {
    $doc = Parser::parse($html);
} catch (ParsingException $e) {
    // ...
    return;
}

$images = $doc->querySelectorAll('#posts .post img');
foreach ($images as $img) {
    $src = $img->getAttribute('src');
    echo $src, "\n";
}
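If the same "select nodes, read one attribute" pattern is needed in several places, it can be wrapped in a small function. This is only a sketch: collectAttribute is a hypothetical helper, not part of the library, and it relies solely on the querySelectorAll and getAttribute calls shown above.

```php
<?php
// Hypothetical helper, not part of the library: collects one attribute
// from every node matching a selector. $container is anything that has
// querySelectorAll(), such as the document returned by Parser::parse().
function collectAttribute($container, string $selector, string $name): array
{
    $values = [];
    foreach ($container->querySelectorAll($selector) as $node) {
        // Whatever getAttribute() returns is stored as-is.
        $values[] = $node->getAttribute($name);
    }
    return $values;
}
```

With a parsed $doc, collectAttribute($doc, '#posts .post img', 'src') would return the same src values that the loop above prints.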
All container nodes (DocumentNode and ElementNode) have the querySelector and
querySelectorAll methods, which support a limited subset of CSS selectors:
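- type selectors (div)
- attribute selectors ([checked], [attr="val"], [attr$="val"], [attr^="val"])
- class selectors (.active)
- ID selectors (#main)

They also support these combinators:

- descendant combinator (ul li)
- child combinator (ul > li)
- adjacent sibling combinator (li + li)

The nodes can be printed to the console, and the output will be similar to Firefox's console: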
<?php
$list = $doc->getElementsByTagName('a');
echo $list, PHP_EOL;
This might produce the following output:
NodeList [ <a>, <a#top>, <a.first>, <a>, <a> ]
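Because nodes can be printed, a quick way to check what one of the selectors or combinators listed earlier actually matches is to dump every match on its own line. The sketch below assumes that individual nodes convert to strings the same way the NodeList does (the example above only shows this for a list); dumpMatches is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper: returns one line per node matched by $selector.
// Assumes each node can be cast to a string, like the NodeList above.
function dumpMatches($container, string $selector): string
{
    $lines = [];
    foreach ($container->querySelectorAll($selector) as $node) {
        $lines[] = (string) $node;
    }
    return implode(PHP_EOL, $lines);
}
```

For example, echo dumpMatches($doc, 'ul > li'), PHP_EOL; would list every li that is a direct child of a ul, one per line.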
Composer dudes do this in the console:
composer require gaswelder/htmlparser
Old-school dudes (if still alive) may download the library to whatever $libdir they have and do this:
require "$libdir/htmlparser/init.php";