PHP Fast Simple HTML DOM Parser
, (*1)
PHP Fast Simple HTML DOM Parser - fast and low mamory usage HTML DOM Parser with syntax like PHP Simple HTML DOM Parser, (*2)
Установка
Для установки выполните команду:, (*3)
composer require dimabdc/php-fast-simple-html-dom-parser
Быстрый старт
require_once "vendor/autoload.php";
use FastSimpleHTMLDom\Document;
// Create DOM from URL
$html = Document::file_get_html('https://habrahabr.ru/interesting/');
// Find all post blocks
$post = [];
foreach($html->find('div.post') as $post) {
$item['title'] = $post->find('h1.title', 0)->plaintext;
$item['hubs'] = $post->find('div.hubs', 0)->plaintext;
$item['content'] = $post->find('div.content', 0)->plaintext;
$post[] = $item;
}
print_r($post);
Как создать HTML DOM объект
// Create a DOM object from a string
$html = new Document('<html><body>Hello!</body></html>');
// Create a DOM object from a string
$html = new Document();
$html->loadHtml('<html><body>Hello!</body></html>');
// Create a DOM object from a HTML file
$html = new Document();
$html->loadHtmlFile('test.htm');
// Create a DOM object from a URL
$html = new Document(file_get_contents('https://habrahabr.ru/interesting/'));
Как искать HTML DOM элементы?
Основа
// Find all anchors, returns a array of element objects
$ret = $html->find('a');
// Find (N)th anchor, returns element object or null if not found (zero based)
$ret = $html->find('a', 0);
// Find lastest anchor, returns element object or null if not found (zero based)
$ret = $html->find('a', -1);
// Find all
<
div> with the id attribute
$ret = $html->find('div[id]');
// Find all
<
div> which attribute id=foo
$ret = $html->find('div[id=foo]');
Часто используемое
// Find all element which id=foo
$ret = $html->find('#foo');
// Find all element which class=foo
$ret = $html->find('.foo');
// Find all element has attribute id
$ret = $html->find('*[id]');
// Find all anchors and images
$ret = $html->find('a, img');
// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');
Слекторы потомков
// Find all <li> in
<
ul>
$es = $html->find('ul li');
// Find Nested
<
div> tags
$es = $html->find('div div div');
// Find all <td> in
<
table> which class=hello
$es = $html->find('table.hello td');
// Find all td tags with attribite align=center in table tags
$es = $html->find('table td[align=center]');
Вложенные селекторы
// Find all <li> in
<
ul>
foreach($html->find('ul') as $ul)
{
foreach($ul->find('li') as $li)
{
// do something...
}
}
// Find first <li> in first
<
ul>
$e = $html->find('ul', 0)->find('li', 0);
Фильтр атрибутов
Filter |
Description |
[attribute] |
Matches elements that have the specified attribute. |
[!attribute] |
Matches elements that don't have the specified attribute. |
[attribute=value] |
Matches elements that have the specified attribute with a certain value. |
[attribute!=value] |
Matches elements that don't have the specified attribute with a certain value. |
[attribute^=value] |
Matches elements that have the specified attribute and it starts with a certain value. |
[attribute$=value] |
Matches elements that have the specified attribute and it ends with a certain value. |
[attribute*=value] |
Matches elements that have the specified attribute and it contains a certain value. |
Текст, комментарии
// Find all text blocks
$es = $html->find('text');
// Find all comment () blocks
$es = $html->find('comment');
Доступ к атрибутам
Получение, установка и удаление атрибутов
// Get a attribute ( If the attribute is non-value attribute (eg. checked, selected...), it will returns true or false)
$value = $e->href;
// Set a attribute(If the attribute is non-value attribute (eg. checked, selected...), set it's value as true or false)
$e->href = 'my link';
// Remove a attribute, set it's value as null!
$e->href = null;
// Determine whether a attribute exist?
if(isset($e->href))
echo 'href exist!';
"Магические" атрибуты
// Example
$html = str_get_html('
foo bar
');
$e = $html->find('div', 0);
echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "
foo bar
"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
Attribute Name |
Usage |
$e->tag |
Read or write the tag name of element. |
$e->outertext |
Read or write the outer HTML text of element. |
$e->innertext |
Read or write the inner HTML text of element. |
$e->plaintext |
Read or write the plain text of element. |
Трюки
// Extract contents from HTML
echo $html->plaintext;
// Wrap a element
$e->outertext = '
<
div class="wrap">' . $e->outertext . '
<
div>';
// Remove a element, set it's outertext as an empty string
$e->outertext = '';
// Append a element
$e->outertext = $e->outertext . '
<
div>foo
<
div>';
// Insert a element
$e->outertext = '
<
div>foo
<
div>' . $e->outertext;
Прогон по DOM-дереву
// If you are not so familiar with HTML DOM, check this link to learn more...
// Example
echo $html->find('#div1', 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById('div1')->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
Method |
Description |
mixed $e->children([int $index]) |
Returns the Nth child object if index is set, otherwise return an array of children. |
Element $e->parent() |
Returns the parent of element. |
Element $e->first_child() |
Returns the first child of element, or null if not found. |
Element $e->last_child() |
Returns the last child of element, or null if not found. |
Element $e->next_sibling() |
Returns the next sibling of element, or null if not found. |
Element $e->prev_sibling() |
Returns the previous sibling of element, or null if not found. |
API-справочник
Методы и свойства DOM
Name |
Description |
void __construct([string |
Element $html]) |Constructor $html is text or Element. |
string plaintext |
Returns the contents extracted from HTML. |
mixed find (string $selector [, int $index]) |
Find elements by the CSS selector. Returns the Nth element object if index is set, otherwise return an array of object. |
Методы и свойства элементов
Name |
Description |
string [attribute] |
Read or write element's attribure value. |
string tag |
Read or write the tag name of element. |
string outertext |
Read or write the outer HTML text of element. |
string innertext |
Read or write the inner HTML text of element. |
string plaintext |
Read or write the plain text of element. |
mixed find (string $selector [, int $index]) |
Find children by the CSS selector. Returns the Nth element object if index is set, otherwise, return an array of object. |
Прогон по дереву DOM
Name |
Description |
mixed $e->children([int $index]) |
Returns the Nth child object if index is set, otherwise return an array of children. |
element $e->parent() |
Returns the parent of element. |
element $e->first_child() |
Returns the first child of element, or null if not found. |
element $e->last_child() |
Returns the last child of element, or null if not found. |
element $e->next_sibling() |
Returns the next sibling of element, or null if not found. |
element $e->prev_sibling() |
Returns the previous sibling of element, or null if not found. |
camelCase эквиваленты
string $e->getAttribute($name)
string $e->attribute
void $e->setAttribute($name, $value)
void $value = $e->attribute
bool $e->hasAttribute($name)
bool isset($e->attribute)
void $e->removeAttribute($name)
void $e->attribute = null
element $e->getElementById($id)
mixed $e->find("#$id", 0)
mixed $e->getElementsById($id [,$index])
mixed $e->find("#$id" [, int $index])
element $e->getElementByTagName($name)
mixed $e->find($name, 0)
mixed $e->getElementsByTagName($name [, $index])
mixed $e->find($name [, int $index])
element $e->parentNode()
element $e->parent()
mixed $e->childNodes([$index])
mixed $e->children([int $index])
element $e->firstChild()
element $e->first_child()
element $e->lastChild()
element $e->last_child()
element $e->nextSibling()
element $e->next_sibling()
element $e->previousSibling()
element $e->prev_sibling()