Berlioz HTML Selector
Berlioz HTML Selector is a PHP library for querying HTML documents with CSS selectors, much as jQuery does on the DOM.
Installation
Composer
You can install Berlioz HTML Selector with Composer, which is the recommended method.
$ composer require berlioz/html-selector
Dependencies
- PHP ^8.0
- PHP libraries:
  - dom
  - libxml
  - mbstring
  - simplexml
Usage
Load HTML
You can load an HTML string or file with the HtmlSelector::query() method. To load a file, pass its path and set the second parameter, contentsIsFile, to true.
$htmlSelector = new \Berlioz\HtmlSelector\HtmlSelector();
$query = $htmlSelector->query('<html><body>...</body></html>');
$query = $htmlSelector->query('path-of-my-file/file.html', contentsIsFile: true);
$query = $htmlSelector->query(new SimpleXMLElement(/*...*/));
Load from ResponseInterface
HtmlSelector::queryFromResponse() loads the HTML contained in the body of a PSR-7 response.
$htmlSelector = new \Berlioz\HtmlSelector\HtmlSelector();
/** @var \Psr\Http\Message\ResponseInterface $response */
$query = $htmlSelector->queryFromResponse($response);
Do a query
Once the HTML is loaded, you can query it with selectors, much as you would with jQuery.
/** @var \Berlioz\HtmlSelector\Query\Query $query */
$query = $query->find('body > .wrapper h2');
$query = $query->filter(':first');
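The two calls above can also be chained, and the result read with the functions listed later in this document. The following is a minimal sketch, assuming a Query object obtained from HtmlSelector::query(); the helper name firstHeadingSummary is hypothetical, and the return types of count() and text() (an integer and a string) are assumptions.

```php
<?php

/**
 * Hypothetical helper (not part of the library): summarises the <h2>
 * headings found inside `.wrapper`.
 *
 * @param \Berlioz\HtmlSelector\Query\Query $query Query returned by HtmlSelector::query()
 */
function firstHeadingSummary($query): array
{
    // find() narrows the result to the matching descendants.
    $headings = $query->find('body > .wrapper h2');

    return [
        // count() gives the number of matched elements (assumed to be an int).
        'total' => $headings->count(),
        // filter(':first') keeps only the first match; text() reads its text.
        'first' => $headings->filter(':first')->text(),
    ];
}
```

With the library installed, the helper could be called as firstHeadingSummary($htmlSelector->query($html)).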
Selectors
CSS Simple selectors
- type: selects elements by their type.
- #id: selects the element with the given ID.
- .class: selects elements by their class.
- Attribute selectors:
  - [attribute]: element has the attribute 'attribute'.
  - [attribute=foo]: value of attribute equals 'foo'.
  - [attribute^=foo]: value of attribute starts with 'foo'.
  - [attribute$=foo]: value of attribute ends with 'foo'.
  - [attribute*=foo]: value of attribute contains 'foo'.
  - [attribute!=foo]: value of attribute is different from 'foo'.
  - [attribute~=foo]: value of attribute contains the word 'foo'.
  - [attribute|=foo]: value of attribute has the prefix 'foo' (equals 'foo' or starts with 'foo-').
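The difference between the attribute operators is easiest to see on concrete values. The sketch below is an illustration only, written with plain PHP 8 string functions; it is not the library's implementation, and the function name attributeMatches is hypothetical. It assumes standard CSS semantics for ~= and |=, and jQuery semantics for != (which also matches elements that lack the attribute).

```php
<?php

/**
 * Illustrative only: evaluates one attribute operator against a value.
 * $value is null when the element does not carry the attribute.
 * Empty $expected values are not given any special treatment here.
 */
function attributeMatches(?string $value, string $operator, string $expected): bool
{
    if ($value === null) {
        // Assumption (jQuery behaviour): '!=' also matches a missing attribute.
        return $operator === '!=';
    }

    switch ($operator) {
        case '=':
            return $value === $expected;
        case '!=':
            return $value !== $expected;
        case '^=':
            return str_starts_with($value, $expected);
        case '$=':
            return str_ends_with($value, $expected);
        case '*=':
            return str_contains($value, $expected);
        case '~=':
            // The value is a whitespace-separated list that contains the word.
            return in_array($expected, preg_split('/\s+/', trim($value)), true);
        case '|=':
            // Exact value, or the value immediately followed by a hyphen.
            return $value === $expected || str_starts_with($value, $expected . '-');
        default:
            throw new InvalidArgumentException("Unknown operator: $operator");
    }
}

var_dump(attributeMatches('btn btn-primary', '~=', 'btn')); // bool(true)
var_dump(attributeMatches('btn-primary', '~=', 'btn'));     // bool(false)
var_dump(attributeMatches('en-US', '|=', 'en'));            // bool(true)
```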
CSS Ascendants, descendants, multiples
- selector selector or selector >> selector: descendant selector (all descendants).
- selector > selector: direct descendant selector (only children).
- selector ~ selector: sibling selector.
- selector, selector: multiple selectors.
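To make the combinators concrete, the sketch below evaluates roughly equivalent XPath expressions with PHP's built-in DOM extension. It is an illustration of standard CSS semantics (in particular, ~ is taken to mean siblings that follow the first element), not a description of how the library resolves selectors internally. The sample markup and the helper texts() are made up for the example.

```php
<?php

$html = '<html><body>'
    . '<div id="box"><h2>Title</h2><p>one</p><div><p>two</p></div><p>three</p></div>'
    . '<p>four</p>'
    . '</body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Returns the text content of every node matched by an XPath expression.
function texts(DOMXPath $xpath, string $expression): array
{
    $result = [];
    foreach ($xpath->query($expression) as $node) {
        $result[] = $node->textContent;
    }

    return $result;
}

// "#box p": every <p> below #box, at any depth.
$descendants = texts($xpath, '//div[@id="box"]//p');          // ['one', 'two', 'three']
// "#box > p": only <p> elements that are children of #box.
$children = texts($xpath, '//div[@id="box"]/p');              // ['one', 'three']
// "h2 ~ p": <p> elements that follow the <h2> under the same parent.
$siblings = texts($xpath, '//h2/following-sibling::p');       // ['one', 'three']
// "h2, #box > div": union of two independent selectors.
$multiple = texts($xpath, '//h2 | //div[@id="box"]/div');     // ['Title', 'two']
```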
CSS Pseudo Classes
- :any(selector, selector): only elements matching one of the selectors given as arguments.
- :any-link: only elements of type <a>, <area> and <link> with an [href] attribute.
- :blank: only elements with no children and no text (whitespace excepted).
- :checked: only elements with the [checked] attribute.
- :dir: only elements with the given text direction (default: ltr).
- :disabled: only elements of type <button>, <input>, <optgroup>, <select> or <textarea> with the [disabled] attribute.
- :empty: only elements without children.
- :enabled: only elements of type <button>, <input>, <optgroup>, <option>, <select>, <textarea>, <menuitem> or <fieldset> without the [disabled] attribute.
- :first: only the first result of the complete selection.
- :first-child: only elements that are the first child of their parent.
- :first-of-type: only elements that are the first of their type in their parent.
- :has(selector, selector): only elements that contain a match for one of the given selectors.
- :lang(x): only elements whose [lang] attribute equals the given value or is prefixed by it.
- :last-child: only elements that are the last child of their parent.
- :last-of-type: only elements that are the last of their type in their parent.
- :not(selector, selector): only elements that do not match the given selectors.
- :nth-child(): n elements in selector result.
- :nth-last-child(): n elements in selector result, counting from the end of the list.
- :nth-of-type(): n elements of the given type in selector result.
- :nth-last-of-type(): n elements of the given type in selector result, counting from the end of the list.
- :only-child: only elements that are the only child of their parent.
- :only-of-type: only elements that are the only child of their type in their parent.
- :optional(): only input elements without the [required] attribute.
- :read-only(): only elements that the user cannot edit.
- :read-write(): only elements that the user can edit.
- :required(): only elements with the [required] attribute.
- :root(): get the root element.
Additional CSS Pseudo Classes (not in CSS specifications) from jQuery library
- :button: only <button> elements without [type=submit], or <input type="button"> elements.
- :checkbox: only elements with the attribute [type=checkbox].
- :contains(x): only elements that contain the given text.
- :eq(x): only the result at the given index (index starts at 0).
- :even: only even results in the selection.
- :file: only elements with the attribute [type=file].
- :gt(x): only results with an index greater than the given index (index starts at 0).
- :gte(x): only results with an index greater than or equal to the given index (index starts at 0).
- :header: only heading elements, such as <h1>, <h2>, ...
- :image: only elements with the attribute [type=image].
- :input: only elements of type <input>, <textarea>, <select> or <button>.
- :last: only the last result of the complete selection.
- :lt(x): only results with an index less than the given index (index starts at 0).
- :lte(x): only results with an index less than or equal to the given index (index starts at 0).
- :odd: only odd results in the selection.
- :parent: only elements with one child or more.
- :password: only elements with the attribute [type=password].
- :radio: only elements with the attribute [type=radio].
- :reset: only elements with the attribute [type=reset].
- :selected: only elements of type <option> with the attribute [selected].
- :submit: only elements of type <button> or <input> with the attribute [type=submit].
- :text: only elements of type <input> with the attribute [type=text] or without a [type] attribute.
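The positional pseudo-classes in this list (:eq, :gt, :gte, :lt, :lte, :even, :odd, together with :first and :last) all work on the zero-based index of each result. The sketch below models that on a plain PHP array. It is an illustration only: the function name filterByPosition is hypothetical, and it assumes jQuery's convention that :even keeps indexes 0, 2, 4 and so on, which this document does not spell out.

```php
<?php

/**
 * Illustrative only: applies a jQuery-style positional pseudo-class
 * to an ordered list of results, using zero-based indexes.
 */
function filterByPosition(array $results, string $pseudo, ?int $x = null): array
{
    $results = array_values($results);
    $last = count($results) - 1;

    // These pseudo-classes need an index argument.
    if ($x === null && in_array($pseudo, ['eq', 'gt', 'gte', 'lt', 'lte'], true)) {
        throw new InvalidArgumentException(":$pseudo requires an index");
    }

    $keep = match ($pseudo) {
        'first' => fn (int $i): bool => $i === 0,
        'last'  => fn (int $i): bool => $i === $last,
        'eq'    => fn (int $i): bool => $i === $x,
        'gt'    => fn (int $i): bool => $i > $x,
        'gte'   => fn (int $i): bool => $i >= $x,
        'lt'    => fn (int $i): bool => $i < $x,
        'lte'   => fn (int $i): bool => $i <= $x,
        'even'  => fn (int $i): bool => $i % 2 === 0, // assumption: zero-based, as in jQuery
        'odd'   => fn (int $i): bool => $i % 2 === 1,
        default => throw new InvalidArgumentException("Unknown pseudo-class: $pseudo"),
    };

    // Filter on the index (array key), then reindex the survivors.
    return array_values(array_filter($results, $keep, ARRAY_FILTER_USE_KEY));
}

$items = ['a', 'b', 'c', 'd', 'e'];
$even = filterByPosition($items, 'even');   // ['a', 'c', 'e']
$tail = filterByPosition($items, 'gt', 2);  // ['d', 'e']
```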
Additional CSS Pseudo Classes (not in CSS specifications)
- :count(x): only elements that are x children in the parent; used in the :has(selector) pseudo-class.
Full example of selectors
select > option:selected
div#myId.class1.class2[name1=value1][name2=value2]:even:first
Functions
Default functions
The Query object provides default functions for working with results. They are intended to behave like their jQuery counterparts.
- attr(name): get attribute value.
- attr(name, value): set attribute value.
- children(): get children of elements in result.
- count(): count the number of elements in query result.
- data(nameOfData): get data value (the name uses camelCase syntax, without the 'data-' prefix).
- filter(selector): filter elements in result.
- find(selector): find selector in elements in result.
- get(i): get DOM element in result.
- hasClass(class_name): check whether at least one element in result has the given classes.
- html(): get HTML of first element in result.
- index(selector): get the index of given selector in result elements.
- is(selector): check whether at least one element in result matches the selector.
- isset(i): return a boolean indicating whether an element key exists in result.
- next(selector): get the next element after each element in result.
- nextAll(selector): get all next elements after each element in result.
- not(selector): remove elements matching the selector from result.
- parent(): get the direct parent of each element in the current result.
- parents(selector): get all parents of each element in the current result.
- prev(selector): get the previous element before each element in result.
- prevAll(selector): get all previous elements before each element in result.
- prop(name): get the boolean property value of an attribute, used for example for the disabled attribute.
- prop(name, value): set the boolean property value of an attribute, used for example for the disabled attribute.
- serialize(): serialize input values of a form. Returns a string.
- serializeArray(): serialize input values of a form. Returns an array.
- text(): get text of each element concatenated.
- val(): get value of a form element.
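Several of the functions above can be combined to inspect a form. The sketch below is a hedged example rather than documented library usage: the helper name describeForm, the class name 'ajax' and the array keys are hypothetical, and it assumes that hasClass() returns a boolean and count() an integer.

```php
<?php

/**
 * Hypothetical helper (not part of the library): collects a few facts
 * about the first <form> found in a document.
 *
 * @param \Berlioz\HtmlSelector\Query\Query $query Query returned by HtmlSelector::query()
 */
function describeForm($query): array
{
    // Keep only the first <form> of the document.
    $form = $query->find('form')->filter(':first');

    return [
        // attr() reads an attribute value.
        'action' => $form->attr('action'),
        // hasClass() tells whether the class is present (assumed boolean).
        'isAjax' => $form->hasClass('ajax'),
        // :input matches <input>, <textarea>, <select> and <button>.
        'fields' => $form->find(':input')->count(),
        // serialize() returns the form values as a string.
        'payload' => $form->serialize(),
    ];
}
```

With the library installed, the helper could be called as describeForm($htmlSelector->query($html)).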