library diavazo

PHP 7 HTML Parser

gm314/diavazo

  • Tuesday, May 16, 2017
  • by gperler
  • Repository
  • 0 Watchers
  • 0 Stars
  • 3 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 3 Versions
  • 0 % Grown

The README.md

Diavazo PHP7 HTML Parser

Diavazo is a wrapper around \DOMDocument and \DOMElement. It adds functionality for searching within descendants and querying by class. The HTMLDocument class can load HTML from a string, a file, or a URL, and provides some basic search methods.

For example, the method getElement("p .spanClass b.bClass") searches for elements, classes, and combinations of both. This example finds all <p> elements, all elements with the class spanClass, and all <b class="bClass"> elements.

The result of such a search is an array of HTMLElement objects. These can be queried in turn, with the difference that the search is restricted to the descendants of that element.
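
To make the selector semantics concrete, here is a minimal sketch that expresses the same kind of search with PHP's built-in DOMDocument and DOMXPath. It is not Diavazo's implementation: the helper selectorToXPath and the sample markup are assumptions made for illustration only.

```php
<?php
// Hypothetical helper, not part of Diavazo: turns "p .spanClass b.bClass"
// into an XPath union, one branch per space-separated token.
function selectorToXPath(string $selector): string
{
    $parts = [];
    foreach (preg_split('/\s+/', trim($selector)) as $token) {
        // Split "tag.class" into an optional tag and an optional class.
        [$tag, $class] = array_pad(explode('.', $token, 2), 2, '');
        $path = '//' . ($tag !== '' ? $tag : '*');
        if ($class !== '') {
            // Match $class as one whole word inside the class attribute.
            $path .= "[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
        }
        $parts[] = $path;
    }
    return implode(' | ', $parts);
}

// Sample markup (an assumption for this sketch).
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>one</p><span class="spanClass x">two</span>'
    . '<b class="bClass">three</b><b>four</b></body></html>');

$xpath = new DOMXPath($dom);
$matches = [];
foreach ($xpath->query(selectorToXPath('p .spanClass b.bClass')) as $node) {
    $matches[] = $node->textContent;
}
// $matches holds the text of the <p>, the .spanClass element and the b.bClass
// element; the plain <b> is not matched.
```

Each token becomes its own branch of the union, which mirrors the description above: the tokens are alternatives, not a nested descendant chain as in CSS.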

Installation

composer require gm314/diavazo

Usage

use Diavazo\HTMLDocument;
$document = new HTMLDocument();

// load file
$document->loadFile("local.html");
$document->loadFile("http://mypage.com/test.html");

// load from string
$document->loadString("<html></html>");

HTMLDocument methods

$document = new HTMLDocument();
$document->loadFile(__DIR__ . "/assets/TableToArrayTest.html");

// get element by id
$table = $document->getElementById("associateArrayTest");

// get element by tag name
$elementList = $document->getElementByTagName("div");

// find all <p> elements, all elements with the class 'spanClass' and all <b class="bClass">
$elementList = $document->getElement("p .spanClass b.bClass");

// xpath query
$title = $document->query("/html/head/title");

// get root (<html>)
$root = $document->getRootElement();

HTMLElement descendants methods

An HTMLElement is the result of queries such as getElementById. Further search methods can be applied to the element; they search within all of its descendants.

The method getDescendantByName("td th") searches for several tags at once.

$document = new HTMLDocument();
$document->loadFile(__DIR__ . "/assets/TableToArrayTest.html");

$table = $document->getElementById("table");

// will return the first tr (Breadth-first search)
$table->getFirstDescendantByName("tr");

// will return all td and th elements
$tdList = $table->getDescendantByName("td th");

// will find all elements that have the class 'active'
$root = $document->getRootElement();
$elementsWithClass = $root->getDescendantWithClassName("active");

// will find all elements that have the class 'myClass' and are td or th elements
$elementsWithClass = $root->getDescendantWithClassName("myClass", "td th");

// will find all elements having only the class 'testClass'
$elementsWithExactClass = $root->getDescendantWithClassNameStrict("testClass");

// will find all elements having only the class 'testClass' and are td or th elements
$elementsWithExactClass = $root->getDescendantWithClassNameStrict("testClass", "td th");

// find all <p> elements, all elements with the class 'spanClass' and all <b class="bClass"> that are descendants of #myId
$anyElement = $document->getElementById("myId");
$elementList = $anyElement->getElement("p .spanClass b.bClass");
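
The two ideas used above, a breadth-first search for the first matching descendant and the difference between a loose and a strict class match, can be sketched with PHP's built-in DOM extension. The functions firstDescendantByName and hasClass are hypothetical helpers written for this illustration; they are not Diavazo's code and may differ from it in detail.

```php
<?php
// Breadth-first search: returns the first descendant element with the given
// tag name, visiting shallower levels before deeper ones.
function firstDescendantByName(DOMNode $start, string $name): ?DOMElement
{
    $queue = [];
    foreach ($start->childNodes as $child) {
        $queue[] = $child;
    }
    while ($queue) {
        $node = array_shift($queue);
        if (!$node instanceof DOMElement) {
            continue; // skip text and comment nodes
        }
        if ($node->nodeName === $name) {
            return $node;
        }
        foreach ($node->childNodes as $child) {
            $queue[] = $child;
        }
    }
    return null;
}

// Loose match: the class list contains $class.
// Strict match: $class is the only class on the element.
function hasClass(DOMElement $element, string $class, bool $strict = false): bool
{
    $classes = preg_split('/\s+/', trim($element->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    return $strict ? $classes === [$class] : in_array($class, $classes, true);
}

// Sample markup (an assumption for this sketch).
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div><div><p id="deep" class="active big">deep</p></div></div>'
    . '<p id="shallow" class="active">shallow</p></body></html>');

$first = firstDescendantByName($dom->documentElement, 'p');
$deep = $dom->getElementsByTagName('p')->item(0); // first <p> in document order
```

In this sample the breadth-first search returns the shallow <p> even though the deeply nested one comes first in document order, and only the shallow <p> passes the strict class check.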

HTMLElement attribute methods

$document = new HTMLDocument();
$document->loadFile("myFile.html");

$table = $document->getElementById("myTable");

// returns null if the attribute does not exist, otherwise its value as a string
$table->getAttributeValue("align");
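
For comparison, PHP's own DOMElement::getAttribute returns an empty string for a missing attribute, so a null-returning accessor needs an explicit hasAttribute check. The sketch below shows one way to get that behaviour with the built-in DOM extension; attributeValueOrNull is a hypothetical name and not necessarily how Diavazo does it.

```php
<?php
// Hypothetical helper: null for a missing attribute, its string value otherwise.
function attributeValueOrNull(DOMElement $element, string $name): ?string
{
    return $element->hasAttribute($name) ? $element->getAttribute($name) : null;
}

// Sample markup (an assumption for this sketch).
$dom = new DOMDocument();
$dom->loadHTML('<table id="myTable" align="center"><tr><td>x</td></tr></table>');
$table = $dom->getElementsByTagName('table')->item(0);

$align = attributeValueOrNull($table, 'align');   // attribute is present
$border = attributeValueOrNull($table, 'border'); // attribute is missing
```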

Table to Array Converter

Diavazo can convert a table to an associative or an index-based array. The associative variant uses the cells of the first row as keys.

$document = new HTMLDocument();
$document->loadFile("tabletest.html");

$table = $document->getElementById("myTableID");

$arrayConverter = new TableToArrayConverter($table);
$array = $arrayConverter->getAsAssociativeArray();

A table such as

...
Key1     Key2
Value 1  Value 2

will result in:

$array = [ [ "Key1" => "Value 1", "Key2" => "Value 2" ], ... ]
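
The mapping from rows to an associative array can be sketched with PHP's built-in DOM extension alone. This is an illustration of the idea, not the converter's actual code: tableToAssociativeArray is a hypothetical function, and it assumes a simple table without nested tables.

```php
<?php
// Hypothetical stand-in for the conversion: the first row supplies the keys,
// every following row becomes one associative entry.
function tableToAssociativeArray(DOMElement $table): array
{
    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement && in_array($cell->nodeName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }

    $keys = array_shift($rows) ?? [];
    $result = [];
    foreach ($rows as $cells) {
        // Rows whose cell count differs from the header row are skipped.
        if (count($cells) === count($keys)) {
            $result[] = array_combine($keys, $cells);
        }
    }
    return $result;
}

// Sample markup (an assumption for this sketch).
$dom = new DOMDocument();
$dom->loadHTML('<table id="myTableID"><tr><th>Key1</th><th>Key2</th></tr>'
    . '<tr><td>Value 1</td><td>Value 2</td></tr></table>');

$array = tableToAssociativeArray($dom->getElementsByTagName('table')->item(0));
```

Skipping rows of the wrong length is a choice made for this sketch; how the library treats such rows is not stated in the README.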

Table 2 Array using an extractor

The following example shows how to register an extractor. The closure is invoked with the table data cell (<td>) and is expected to return the value that will be added to the array. Here it gets the first <a> element and extracts its href attribute.

$document = $this->getDocument();
$table = $document->getElementById("extractorTest");

$arrayConverter = new TableToArrayConverter($table);
$arrayConverter->registerExtractor("columnName", function (HTMLElement $td) {
    $a = $td->getFirstDescendantByName("a");
    return $a->getAttributeValue("href");
});
$array = $arrayConverter->getAsAssociativeArray();
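
The extractor idea itself, a callable per column that replaces the default text value of a cell, can be illustrated without the library. The sketch below uses only PHP's DOM extension; rowToArray, the column names and the markup are assumptions for illustration and do not reflect the converter's internals.

```php
<?php
// Hypothetical sketch: builds one row, using an extractor for a column when
// one is registered and the trimmed cell text otherwise.
function rowToArray(DOMElement $tr, array $keys, array $extractors): array
{
    $row = [];
    $index = 0;
    foreach ($tr->childNodes as $cell) {
        if (!$cell instanceof DOMElement || $cell->nodeName !== 'td') {
            continue;
        }
        $key = $keys[$index++] ?? null;
        if ($key === null) {
            break; // more cells than column names
        }
        $row[$key] = isset($extractors[$key])
            ? $extractors[$key]($cell)
            : trim($cell->textContent);
    }
    return $row;
}

// Sample markup (an assumption for this sketch).
$dom = new DOMDocument();
$dom->loadHTML('<table><tr><td>Docs</td><td><a href="/docs">open</a></td></tr></table>');
$tr = $dom->getElementsByTagName('tr')->item(0);

$extractors = [
    // Returns the href of the first <a> in the cell, or null if there is none.
    'link' => function (DOMElement $td): ?string {
        $a = $td->getElementsByTagName('a')->item(0);
        return $a !== null ? $a->getAttribute('href') : null;
    },
];

$row = rowToArray($tr, ['name', 'link'], $extractors);
```

The null check inside the closure guards against cells without a link.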

The Versions

  • dev-master (9999999-dev), 16/05 2017, MIT, by Gregor Müller
  • 0.2.1 (0.2.1.0), 06/05 2017, MIT, by Gregor Müller
  • 0.2.0 (0.2.0.0), 06/05 2017, MIT, by Gregor Müller