library html-tokenizer

Will tokenize HTML.

kevintweber/html-tokenizer

  • Sunday, October 1, 2017
  • by kevintweber
  • Repository
  • 2 Watchers
  • 4 Stars
  • 2,695 Installations
  • PHP
  • 2 Dependents
  • 0 Suggesters
  • 1 Forks
  • 0 Open issues
  • 10 Versions
  • 169 % Grown

The README.md

Html Tokenizer

This package tokenizes HTML input.

Some uses of HTML tokens:

- Tidy/Minify HTML output
- Preprocess HTML
- Filter HTML
- Sanitize HTML
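
As a rough illustration of the filtering use case, the sketch below drops comment tokens from a nested token array. It works on plain arrays of the shape documented under Usage (`type`, `name`, `value`, `children`) rather than on the package's own classes. The helper name `removeComments` and the sample data are illustrative assumptions, not part of the package.

``` php
<?php

declare(strict_types=1);

/**
 * Recursively remove comment tokens from a token array.
 *
 * Hypothetical helper (not provided by the package). It assumes only the
 * nested array layout shown under Usage: each token has a 'type' key and
 * element tokens may carry a 'children' list.
 */
function removeComments(array $tokens): array
{
    $kept = [];
    foreach ($tokens as $token) {
        if (($token['type'] ?? null) === 'comment') {
            continue; // Drop the comment token itself.
        }
        if (isset($token['children'])) {
            // Filter the subtree before keeping the element.
            $token['children'] = removeComments($token['children']);
        }
        $kept[] = $token;
    }

    return $kept;
}

// Hand-written sample data in the documented shape (abridged).
$tokenArray = [
    [
        'type' => 'element', 'name' => 'body', 'line' => 5, 'position' => 4,
        'children' => [
            ['type' => 'comment', 'value' => 'Start of content.', 'line' => 6, 'position' => 8],
            [
                'type' => 'element', 'name' => 'h1', 'line' => 7, 'position' => 8,
                'attributes' => ['id' => 'big_title'],
                'children' => [
                    ['type' => 'text', 'value' => 'Whoa!', 'line' => 7, 'position' => 27],
                ],
            ],
        ],
    ],
];

$filtered = removeComments($tokenArray);
echo count($filtered[0]['children']), "\n"; // prints 1 (only the h1 remains)
```

Working on the array form keeps the sketch independent of any particular package version. The same recursive pattern could be adapted for sanitizing, for example by dropping elements whose `name` is on a block list.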

Install

Via Composer:

``` bash
$ composer require kevintweber/html-tokenizer
```


## Usage

``` php
<?php

// Assumed import: the namespace and class name are inferred from the
// package's repository name; check them against the installed version.
use Kevintweber\HtmlTokenizer\HtmlTokenizer;

$tokenizer = new HtmlTokenizer();
$tokens = $tokenizer->parse($htmlDocument);

// That was easy ...

// Once you have tokens, you can manipulate them.
foreach ($tokens as $token) {
    if ($token->isElement()) {
        echo $token->getName() . "\n";
    }
}

// Or just output them to an array.
$tokenArray = $tokens->toArray();
```

The following simple HTML:

``` html
<!DOCTYPE html>
<html>
    <head>
        <title>Test</title>
    </head>
    <body>
        <!-- Start of content. -->
        <h1 id="big_title">Whoa!</h1>
        <div class="centered">It <em>parses</em>!</div>
    </body>
</html>
```

will produce the following array:

``` php
array(
    array('type' => 'doctype', 'value' => 'html', 'line' => 0, 'position' => 0),
    array(
        'type' => 'element',
        'name' => 'html',
        'line' => 1,
        'position' => 0,
        'children' => array(
            array(
                'type' => 'element',
                'name' => 'head',
                'line' => 2,
                'position' => 4,
                'children' => array(
                    array(
                        'type' => 'element',
                        'name' => 'title',
                        'line' => 3,
                        'position' => 8,
                        'children' => array(
                            array('type' => 'text', 'value' => 'Test', 'line' => 3, 'position' => 15)
                        )
                    )
                )
            ),
            array(
                'type' => 'element',
                'name' => 'body',
                'line' => 5,
                'position' => 4,
                'children' => array(
                    array('type' => 'comment', 'value' => 'Start of content.', 'line' => 6, 'position' => 8),
                    array(
                        'type' => 'element',
                        'name' => 'h1',
                        'line' => 7,
                        'position' => 8,
                        'attributes' => array('id' => 'big_title'),
                        'children' => array(
                            array('type' => 'text', 'value' => 'Whoa!', 'line' => 7, 'position' => 27)
                        )
                    ),
                    array(
                        'type' => 'element',
                        'name' => 'div',
                        'line' => 8,
                        'position' => 8,
                        'attributes' => array('class' => 'centered'),
                        'children' => array(
                            array('type' => 'text', 'value' => 'It ', 'line' => 8, 'position' => 30),
                            array(
                                'type' => 'element',
                                'name' => 'em',
                                'line' => 8,
                                'position' => 33,
                                'children' => array(
                                    array('type' => 'text', 'value' => 'parses', 'line' => 8, 'position' => 37)
                                )
                            ),
                            array('type' => 'text', 'value' => '!', 'line' => 8, 'position' => 48)
                        )
                    )
                )
            )
        )
    )
)
```

### Tokens

The tokens are of the following types:

| Name      | Example |
|:--------- |:------- |
| `cdata`   | `<![CDATA[ ... ]]>` |
| `comment` | `<!-- ... -->` |
| `doctype` | `<!DOCTYPE html>` |
| `element` | `<div>Most of your markup will be elements.</div>` |
| `php`     | `<?php ... ?>` |
| `text`    | Most of your content will be text. |

### Special parsing situations

- Contents of an "iframe" element are not parsed.
- Contents of a "script" element are considered TEXT.
- Contents of a "style" element are considered TEXT.

### Limitations

Currently, this package will tokenize HTML5 and XHTML. It tries to handle errors according to the standard.
The tokenizer can handle some (but not all) malformed HTML. You can set the tokenizer to fail silently or throw an exception when it encounters an error. (The default setting is to throw an exception.)

If you come across valid HTML this package cannot parse, please submit an issue.

## Change log

Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

## Testing

``` bash
$ phpunit
```

Contributing

Please see CONTRIBUTING for details.

Security

If you discover any security-related issues, please email kevintweber@gmail.com instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.

The Versions

01/10 2017

dev-master

9999999-dev https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

The Development Requires

by Kevin Weber

html token kevintweber html-tokenizer

13/03 2017

v0.4

0.4.0.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

The Development Requires

by Kevin Weber

html token kevintweber html-tokenizer

12/03 2017

v0.3.1

0.3.1.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

18/05 2016

v0.3

0.3.0.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

14/05 2016

v0.2.4

0.2.4.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

11/05 2016

v0.2.3

0.2.3.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

08/05 2016

v0.2.2

0.2.2.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

07/05 2016

v0.2.1

0.2.1.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

07/05 2016

v0.2.0

0.2.0.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

 

by Kevin Weber

html token kevintweber html-tokenizer

17/04 2016

v0.1

0.1.0.0 https://github.com/kevintweber/HtmlTokenizer

Will tokenize HTML.

MIT

The Requires

  • php >=5.5.9

 

by Kevin Weber

html token kevintweber html-tokenizer