
library htmlcleaner

An HTML cleaner based on SimpleXML, fast and customizable


voilab/htmlcleaner


  • Monday, May 22, 2017
  • by tafel
  • Repository
  • 3 Watchers
  • 1 Stars
  • 3 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 7 Versions
  • 0 % Grown

The README.md

# Voilab HTML cleaner

An HTML cleaner based on SimpleXML, fast and customizable.

## Install

Via Composer.

Create a composer.json file in your project root:

``` json
{
    "require": {
        "voilab/htmlcleaner": "0.*"
    }
}
```

``` bash
$ composer require voilab/htmlcleaner
```

## Sample dataset

Some paragraph with bold or nested tags.

And a second paragraph (so two root elements, here) with a cool link, a bad link and some nice attributes to try to keep.


## Basic usage

### All tags stripped

``` php
use \voilab\cleaner\HtmlCleaner;

$cleaner = new HtmlCleaner();
$raw_html = '...'; // take sample dataset above
echo $cleaner->clean($raw_html);
```

### Allow some tags

``` php
// create cleaner...
$cleaner->addAllowedTags(['p', 'strong']);
// call clean method
```
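The Advanced usage section notes that the standard processor relies on `strip_tags()` to drop disallowed tags. As a rough illustration of that step alone, the sketch below uses only PHP's built-in function; the input string and variable names are made up for the example, and it does not reproduce the attribute handling the cleaner adds on top.

``` php
<?php
// Hypothetical input, not the README's sample dataset.
$raw_html = '<p>Some <strong>bold</strong> and <em>emphasised</em> text.</p>';

// Same tag list as in the example above.
$allowed = ['p', 'strong'];

// strip_tags() accepts the allow-list as a string such as "<p><strong>".
$allowed_string = '<' . implode('><', $allowed) . '>';

// Disallowed tags are removed, their text content is kept.
$stripped = strip_tags($raw_html, $allowed_string);

echo $stripped, PHP_EOL;
// <p>Some <strong>bold</strong> and emphasised text.</p>
```

Note that `strip_tags()` leaves attributes on allowed tags untouched, which is why attribute rules are configured separately.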


### Allow some tags and attributes (regardless of tags)

``` php
// create cleaner...
$cleaner
    ->addAllowedTags(['p', 'span'])
    ->addAllowedAttributes(['class']);
// call clean method
```

### Allow some attributes only on certain tags

``` php
// create cleaner...
$cleaner
    ->addAllowedTags(['p', 'span'])
    ->addAllowedAttributes([
        // keep attribute "class" only for spans
        new \voilab\cleaner\attribute\Keep('class', 'span'),

        // you can use this shorthand too, as a string
        'style:span'
    ]);

// call clean method
```
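To make the `attribute:tag` shorthand concrete, here is a minimal sketch of how such a string could be split into its two parts. The function name and the returned array shape are hypothetical; the README does not show how the library parses the shorthand internally.

``` php
<?php
// Hypothetical helper: NOT part of voilab/htmlcleaner.
// Splits "style:span" into an attribute name and an optional tag name.
function parse_attribute_shorthand($shorthand)
{
    // Limit to 2 parts so only the first colon acts as a separator.
    $parts = explode(':', $shorthand, 2);

    return [
        'attribute' => $parts[0],
        // No colon means the attribute is not bound to a specific tag.
        'tag'       => isset($parts[1]) ? $parts[1] : null,
    ];
}

$bound   = parse_attribute_shorthand('style:span');
$unbound = parse_attribute_shorthand('class');
```

Here `$bound` describes `style` restricted to `span`, while `$unbound` describes `class` with no tag restriction.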


## Advanced usage

### Processors

Processors prepare the HTML string before it is loaded into a new `SimpleXMLElement` (the base of the process). They also format the HTML after it has been cleaned. In other words, they act as a pre-process and a post-process step.

> The pre-process **must** remove disallowed tags.

#### Standard processor

The standard processor uses `strip_tags()` to remove disallowed tags. After processing, it removes all carriage returns from the string.

#### Custom processor

You can create your own processor by implementing `\voilab\cleaner\processor\Processor`. Do not forget that the pre-process is responsible for removing all disallowed tags.

### Attributes

Attribute classes are used to validate attributes and their content. By default, an allowed attribute becomes a `\voilab\cleaner\attribute\Keep`, and every attribute that is not allowed becomes a `\voilab\cleaner\attribute\Remove`. You do not need to instantiate these two attribute types yourself: all attributes provided as strings in `addAllowedAttributes()` are converted to `Keep` instances.

#### Js attribute

You may want to keep some attributes but check their content. This is the case for the `href` attribute, which can contain either a valid URL or a JavaScript injection. An attribute validator already exists for that:

``` php
$cleaner
    ->addAllowedTags(['a'])
    ->addAllowedAttributes([
        new \voilab\cleaner\attribute\Js('href')
    ]);
```

Note that allowed attributes may or may not be bound to a specific tag. In the example above, the `href` attribute will be valid for every HTML tag. If you want to bind the attribute to a tag, pass the tag name as the second parameter.
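To illustrate the kind of content check described for `href`, the sketch below flags values that start with a `javascript:` scheme. It is a simplified, hypothetical helper written with built-in PHP functions only; it is not the implementation of the `Js` attribute class, and it does not cover every obfuscation technique.

``` php
<?php
// Hypothetical helper: NOT the library's Js attribute implementation.
function is_script_url($value)
{
    // Be conservative: drop whitespace and control characters first,
    // since browsers ignore tabs and newlines inside a URL.
    $normalized = preg_replace('/[\x00-\x20]+/', '', $value);

    // Case-insensitive test for a leading "javascript:" scheme.
    return stripos($normalized, 'javascript:') === 0;
}

$safe      = is_script_url('https://example.com/');
$injection = is_script_url(" JaVa\tScript:alert(1)");

var_dump($safe);      // bool(false)
var_dump($injection); // bool(true)
```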

## Known limitations

### Root mixed content

Mixed content outside tags is not allowed in root position.

some root mixed special content

some root mixed special content

some root element

and another root element
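The following sketch shows why text at root level is a problem for an XML parser, using only PHP's built-in SimpleXML functions. The markup in both strings is an assumed example (the tags are not taken from the README), and the helper name is hypothetical.

``` php
<?php
// Hypothetical helper: NOT part of voilab/htmlcleaner.
// Returns true when the fragment parses as a single XML document.
function parses_as_xml($fragment)
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $xml !== false;
}

// Assumed markup: text sits directly at root level, outside any tag.
$mixed_root = 'some root <strong>mixed</strong> special content';

// Same content wrapped in a single root element.
$wrapped = '<p>some root <strong>mixed</strong> special content</p>';

var_dump(parses_as_xml($mixed_root)); // bool(false)
var_dump(parses_as_xml($wrapped));    // bool(true)
```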


### Bad HTML format with Standard processor

If the HTML is not well formed, the cleaner throws an `\Exception`. The string must be strictly well formed because it is processed by `simplexml_load_string($html)`, which is very strict:

- tags must be closed (`<p></p>` or `<br />`)
- attributes must be wrapped in (double) quotes (`<hr class="test" />`)
- a (double) quote is not allowed in attribute content; it must be converted to `&quot;` before `HtmlCleaner::clean()` is called
- the characters `<` and `&` are not allowed in content; they must be converted to `&lt;` and `&amp;` respectively before `HtmlCleaner::clean()` is called

These limitations will eventually be addressed in future releases.

## Testing

``` bash
$ vendor/bin/phpunit --bootstrap vendor/autoload.php tests/
```
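Regarding the escaping rules listed under "Bad HTML format with Standard processor", one possible way to prepare text before it is embedded in the HTML string is PHP's built-in `htmlspecialchars()`. The sketch below is an assumption about how a caller might do this, not something the library does for you; the sample text and variable names are invented.

``` php
<?php
// Hypothetical user-supplied text containing &, < and a double quote.
$text = 'Fish & chips < 5 "quoted"';

// Convert &, <, > and quotes to entities that an XML parser accepts.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Use the escaped value both as attribute content and as element content.
$html = '<p title="' . $escaped . '">' . $escaped . '</p>';

// The resulting string is now strict enough for simplexml_load_string().
$xml = simplexml_load_string($html);

echo $escaped, PHP_EOL;
// Fish &amp; chips &lt; 5 &quot;quoted&quot;
```

Reading the value back through SimpleXML, for example with `(string) $xml`, returns the original unescaped text.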

## Security

If you discover any security related issues, please use the issue tracker.

## Credits

## License

The MIT License (MIT). Please see License File for more information.

The Versions

All versions are released under the MIT license, require php >=5.6.0, and list http://www.voilab.ch as homepage. Keywords: html, cleaner, voilab.

  • dev-develop (22 May 2017)
  • dev-master (22 May 2017)
  • 0.2.0 (22 May 2017)
  • 0.1.3 (5 April 2017)
  • 0.1.2 (5 April 2017)
  • 0.1.1 (3 April 2017)
  • 0.1.0 (31 March 2017)