PHP PowerTools HTML5 Component
by John Slegers
Repository: https://github.com/PHPPowertools/HTML5PHP (dev-master, 9999999-dev)
License: MIT
Requires: php >=5.4.0
PHPPowertools is a web application framework for PHP >= 5.4.
PHPPowertools/HTML5 is the fourth component of the PHPPowertools framework to be released to the public.
This project provides an HTML5 parser for PHP. It originated as a fork of Masterminds/html5-php, which itself started out as a fork of html5lib/html5lib-php.
```php
<?php
namespace App;

use PowerTools\HTML5;

// An example HTML document:
$html = <<<'HERE'
<html>
  <head>
    <title>TEST</title>
  </head>
  <body id='foo'>
    <h1>Hello World</h1>
    <p>This is a test of the HTML5 parser.</p>
  </body>
</html>
HERE;

// Parse the document. $dom is a DOMDocument.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// Render it as HTML5:
print $html5->saveHTML($dom);

// Or save it to a file:
$html5->save($dom, 'out.html');
```
The $dom created by the parser is a full DOMDocument object, and the save() and saveHTML() methods accept any DOMDocument.
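Because the result is an ordinary DOMDocument, PHP's built-in DOM and XPath APIs work on it directly. One detail worth knowing: the disable_html_ns option described in this document implies that, by default, the parser places elements in the HTML namespace (http://www.w3.org/1999/xhtml), so a plain XPath query such as //h1 will not match them. The sketch below assumes that default. So that it runs without the library, it builds a small namespaced document by hand with ext-dom as a stand-in for loadHTML() output; the function name headingTexts() is hypothetical.

```php
<?php
// Sketch: querying a DOMDocument whose elements live in the HTML (XHTML) namespace.
// Assumption: the parser's default output puts elements in this namespace
// unless disable_html_ns is set. The document is built by hand here.

const XHTML_NS = 'http://www.w3.org/1999/xhtml';

/**
 * Hypothetical helper: returns the text of every <h1> in $dom,
 * whether the elements are namespaced or not. Works on any DOMDocument.
 */
function headingTexts(DOMDocument $dom): array
{
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('h', XHTML_NS);
    // Match h1 in the XHTML namespace or in no namespace.
    $texts = [];
    foreach ($xpath->query('//h:h1 | //h1') as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

// Stand-in for parser output: <html><body><h1>Hello World</h1></body></html>
$dom  = new DOMDocument();
$html = $dom->createElementNS(XHTML_NS, 'html');
$body = $dom->createElementNS(XHTML_NS, 'body');
$h1   = $dom->createElementNS(XHTML_NS, 'h1', 'Hello World');
$dom->appendChild($html);
$html->appendChild($body);
$body->appendChild($h1);

// A plain '//h1' query finds nothing because the elements are namespaced.
$plain = (new DOMXPath($dom))->query('//h1')->length;

echo $plain, PHP_EOL;                             // 0
echo implode(', ', headingTexts($dom)), PHP_EOL;  // Hello World
```

Registering a prefix and querying both forms keeps the helper usable whether or not disable_html_ns was set.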
You can pass an array of configuration options to the HTML5 constructor when loading an HTML5 document.
```php
// An associative array of options
$options = array(
    'option_name' => 'option_value',
);

// Provide the options to the constructor
$html5 = new HTML5($options);
$dom = $html5->loadHTML($html);
```
The following options are supported:

- encode_entities (boolean): Indicates that the serializer should aggressively encode characters as entities. Without this, it only encodes the bare minimum.
- disable_html_ns (boolean): Prevents the parser from automatically assigning the HTML5 namespace to the DOM document. This is for non-namespace-aware DOM tools.
- target_document (\DOMDocument): A DOM document that will be used as the destination for the parsed nodes.
- implicit_namespaces (array): An associative array of namespaces that should be used by the parser. Each key is a tag prefix and each value is a namespace URI.

This library provides the following low-level APIs that you can use to create more customized HTML5 tools:
- An InputStream abstraction that can work with different kinds of input source (not just files and strings).

The parser is designed as follows:

- The InputStream portion handles direct I/O.
- The Scanner handles scanning on behalf of the parser.
- The Tokenizer requests data off of the scanner, parses it, classifies it, and sends it to an EventHandler. It is a recursive descent parser.
- The EventHandler receives notifications and data for each specific semantic event that occurs during tokenization.
- The DOMBuilder is an EventHandler that listens for tokenizing events and builds a document tree (DOMDocument) based on the events.

The serializer takes a data structure (the DOMDocument) and transforms it into a character representation: an HTML5 document.
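The parser pipeline described above (scanner, tokenizer, event handler, DOM builder) can be illustrated with a deliberately tiny sketch. All class and method names below (SketchScanner, SketchTokenizer, SketchEventHandler, SketchDomBuilder) are hypothetical and do not mirror the library's actual signatures. The sketch handles only attribute-free start tags, end tags and text inside a single root element, performs no error recovery, and uses a simple loop rather than the recursive descent approach of the real Tokenizer. It is written for PHP 7.1+.

```php
<?php
// Hypothetical event interface: one method per semantic event.
interface SketchEventHandler
{
    public function startTag(string $name): void;
    public function endTag(string $name): void;
    public function text(string $data): void;
}

// Minimal scanner: a cursor over an in-memory string (stands in for InputStream + Scanner).
class SketchScanner
{
    private $data;
    private $pos = 0;

    public function __construct(string $data) { $this->data = $data; }
    public function eof(): bool { return $this->pos >= strlen($this->data); }
    public function peek(): string { return $this->data[$this->pos]; }
    public function skip(int $n = 1): void { $this->pos += $n; }

    // Consume characters up to (not including) $stop, or to the end of input.
    public function consumeUntil(string $stop): string
    {
        $end = strpos($this->data, $stop, $this->pos);
        if ($end === false) {
            $end = strlen($this->data);
        }
        $chunk = substr($this->data, $this->pos, $end - $this->pos);
        $this->pos = $end;
        return $chunk;
    }
}

// Pulls data off the scanner, classifies it and emits events to the handler.
class SketchTokenizer
{
    public function parse(SketchScanner $scanner, SketchEventHandler $handler): void
    {
        while (!$scanner->eof()) {
            if ($scanner->peek() === '<') {
                $scanner->skip();                   // consume '<'
                $tag = $scanner->consumeUntil('>');
                $scanner->skip();                   // consume '>'
                if ($tag !== '' && $tag[0] === '/') {
                    $handler->endTag(substr($tag, 1));
                } else {
                    $handler->startTag($tag);
                }
            } else {
                $handler->text($scanner->consumeUntil('<'));
            }
        }
    }
}

// Event handler that builds a DOMDocument from the events it receives.
class SketchDomBuilder implements SketchEventHandler
{
    public $document;
    private $current;

    public function __construct()
    {
        $this->document = new DOMDocument();
        $this->current = $this->document;
    }

    public function startTag(string $name): void
    {
        $element = $this->document->createElement($name);
        $this->current->appendChild($element);
        $this->current = $element;                  // descend into the new element
    }

    public function endTag(string $name): void
    {
        // No error recovery: assumes tags are properly nested.
        $this->current = $this->current->parentNode;
    }

    public function text(string $data): void
    {
        $this->current->appendChild($this->document->createTextNode($data));
    }
}

$builder = new SketchDomBuilder();
(new SketchTokenizer())->parse(new SketchScanner('<p>Hello <b>World</b></p>'), $builder);
echo $builder->document->saveXML($builder->document->documentElement), PHP_EOL;
// <p>Hello <b>World</b></p>
```

Keeping the tokenizer ignorant of the DOM is the point of the design: a different handler could count tags or stream events without ever building a tree.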
The serializer is broken into three parts:

- The OutputRules contain the rules to turn DOM elements into strings. The rules are an implementation of the interface RulesInterface, allowing for different rule sets to be used.
- The Traverser is a special-purpose tree walker. It visits each node in the tree and uses the OutputRules to transform the node into a string.
- HTML5 manages the Traverser and stores the resultant data in the correct place.

The serializer (save(), saveHTML()) follows section 8.9 of the HTML 5.0 spec, so tags are serialized according to the rules defined there.
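The split between rules and traverser can be sketched roughly as follows: a rules object knows how to render one node, and a traverser walks the tree and concatenates what the rules return. SketchRules, SketchHtml5Rules and SketchTraverser are hypothetical names, not the library's OutputRules, RulesInterface or Traverser. The sketch covers only elements, attributes and text; it omits end tags for a subset of HTML void elements, approximates the spec's escaping with htmlspecialchars(), skips comments and other node types, and does not treat raw-text elements such as script and style specially.

```php
<?php
// Hypothetical rules interface: turns individual nodes into strings.
interface SketchRules
{
    public function openTag(DOMElement $el): string;
    public function closeTag(DOMElement $el): string;
    public function text(DOMText $text): string;
}

class SketchHtml5Rules implements SketchRules
{
    // Void elements never get an end tag (a subset of the HTML list).
    const VOID = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
                  'link', 'meta', 'source', 'track', 'wbr'];

    public function openTag(DOMElement $el): string
    {
        $out = '<' . $el->tagName;
        foreach ($el->attributes as $attr) {
            $out .= ' ' . $attr->name . '="' . htmlspecialchars($attr->value, ENT_COMPAT, 'UTF-8') . '"';
        }
        return $out . '>';
    }

    public function closeTag(DOMElement $el): string
    {
        return in_array($el->tagName, self::VOID, true) ? '' : '</' . $el->tagName . '>';
    }

    public function text(DOMText $text): string
    {
        return htmlspecialchars($text->data, ENT_NOQUOTES, 'UTF-8');
    }
}

// Walks the tree depth-first and asks the rules for each node's output.
class SketchTraverser
{
    private $rules;

    public function __construct(SketchRules $rules) { $this->rules = $rules; }

    public function walk(DOMNode $node): string
    {
        if ($node instanceof DOMText) {
            return $this->rules->text($node);
        }
        if (!$node instanceof DOMElement) {
            return '';  // comments, processing instructions, etc. are skipped in this sketch
        }
        $out = $this->rules->openTag($node);
        foreach ($node->childNodes as $child) {
            $out .= $this->walk($child);
        }
        return $out . $this->rules->closeTag($node);
    }
}

$dom = new DOMDocument();
$p = $dom->createElement('p');
$p->setAttribute('class', 'note');
$p->appendChild($dom->createTextNode('a < b'));
$p->appendChild($dom->createElement('br'));
$dom->appendChild($p);

echo (new SketchTraverser(new SketchHtml5Rules()))->walk($dom->documentElement), PHP_EOL;
// <p class="note">a &lt; b<br></p>
```

Because the traverser only depends on the interface, a different rules class (for example, one that always closes tags XML-style) could be swapped in without touching the tree walk.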
A colon (:) has no special meaning in HTML5. By default the parser does not support XML-style namespaces via ":". To use XML-style namespaces, you have to enable them when configuring the main HTML5 instance:
```php
use PowerTools\HTML5;

$html5 = new HTML5(array(
    "xmlNamespaces" => true
));

$dom = $html5->loadHTML('<t:tag xmlns:t="http://www.example.com"/>');

echo $dom->documentElement->namespaceURI; // http://www.example.com
```
You can also define default prefixes that do not require a namespace declaration; elements using them will still be namespaced.
```php
use PowerTools\HTML5;

$html5 = new HTML5(array(
    "implicitNamespaces" => array(
        "t" => "http://www.example.com"
    )
));

$dom = $html5->loadHTML('<t:tag/>');

echo $dom->documentElement->namespaceURI; // http://www.example.com
```
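The difference between the two namespace configurations comes down to where a prefix's namespace URI comes from: an xmlns:prefix declaration in the markup, or a mapping configured up front. The hypothetical function resolveNamespace() below sketches that lookup under one stated assumption, namely that an explicit declaration takes precedence over an implicit mapping. It is not the library's code, only an illustration of the idea; written for PHP 7.1+.

```php
<?php
// Hypothetical sketch: resolve the namespace URI for a prefixed tag name.
// Assumption: an explicit xmlns:prefix declaration wins over an implicit mapping.
function resolveNamespace(string $qualifiedName, array $declared, array $implicit): ?string
{
    $colon = strpos($qualifiedName, ':');
    if ($colon === false) {
        return null;                    // no prefix, so no XML-style namespace
    }
    $prefix = substr($qualifiedName, 0, $colon);
    if (isset($declared[$prefix])) {
        return $declared[$prefix];      // explicit xmlns:prefix="..." declaration
    }
    return $implicit[$prefix] ?? null;  // implicit default prefix, if configured
}

// Explicit declaration, as in <t:tag xmlns:t="http://www.example.com"/>
echo resolveNamespace('t:tag', ['t' => 'http://www.example.com'], []), PHP_EOL; // http://www.example.com
// Implicit prefix, configured up front
echo resolveNamespace('t:tag', [], ['t' => 'http://www.example.com']), PHP_EOL; // http://www.example.com
// Unknown prefix
var_dump(resolveNamespace('x:tag', [], ['t' => 'http://www.example.com']));      // NULL
```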
A huge debt of gratitude is owed to the original authors of Masterminds/html5-php and html5lib/html5lib-php. Their names are credited in the code of the classes I borrowed from Masterminds/html5-php.