library pharse

Fastest PHP HTML Parser

ressio/pharse

  • Sunday, February 11, 2018
  • by dryabov
  • Repository
  • 8 Watchers
  • 51 Stars
  • 1,741 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 9 Forks
  • 7 Open issues
  • 3 Versions
  • 127 % Grown

The README.md

Pharse

Fastest PHP HTML Parser

Pharse is a fork of the Ganon library. It gives access to HTML/XML documents in a simple, object-oriented way, makes the DOM easier to modify, and finds elements with CSS3-like queries.

Pharse is:

  • A universal tokenizer
  • An HTML/XML/RSS DOM Parser
    • Ability to manipulate elements and their attributes
    • Supports HTML5
    • Supports invalid HTML
    • Supports UTF-8
    • Can perform advanced CSS3-like queries on elements (like jQuery; namespaces supported)
  • An HTML beautifier (like HTML Tidy)
    • Minify CSS and JavaScript
    • Sort attributes, change character case, correct indentation, etc.
  • Extensible
    • Parsing documents using callbacks based on current character/token
    • Operations separated in smaller functions for easy overriding
  • Fast
  • Easy

Quick start

include('path/pharse.php');

// Parse the Google Code website into a DOM
$html = Pharse::file_get_dom('http://code.google.com/');

After including Pharse and loading the DOM, it is time to get started.
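For experimenting without a network request, the DOM can probably also be built from a string. The sketch below assumes that Pharse exposes `Pharse::str_get_dom()`, mirroring Ganon's `str_get_dom()`; that method name, the install path, and the helper `pharse_dom_from_string()` are assumptions for illustration and are not taken from this README.

```php
<?php
// Hypothetical location of the library file; adjust to your installation.
$pharsePath = dirname(__FILE__) . '/path/pharse.php';
if (is_file($pharsePath)) {
  include_once $pharsePath;
}

// Hypothetical helper (not part of Pharse): returns the DOM root for a
// markup string, or null when the Pharse class has not been loaded.
function pharse_dom_from_string($markup) {
  if (!class_exists('Pharse')) {
    return null;
  }
  // Assumption: Pharse::str_get_dom() exists, as str_get_dom() does in Ganon.
  return Pharse::str_get_dom($markup);
}

$dom = pharse_dom_from_string('<div id="main"><p class="intro">Hello</p></div>');
if ($dom !== null) {
  // Same selector syntax as in the examples that follow
  foreach ($dom('p[class]') as $element) {
    echo $element->class, "\n";
  }
}
```

The null check keeps the script from failing with a fatal error when the include path is wrong, which is a common first stumbling block.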

Access

CSS3-like selectors and the object model make it easy to access elements.

// Find all the paragraph tags with a class attribute and print the
// value of the class attribute
foreach($html('p[class]') as $element) {
  echo $element->class, "<br>\n"; 
}

// Find the first div with ID "gc-header" and print the plain text of
// the parent element (plain text means no HTML tags, just the text)
echo $html('div#gc-header', 0)->parent->getPlainText();

// Find out how many tags there are which are "ns:tag" or "div", but not
// "a" and do not have a class attribute
echo count($html('(ns|tag, div + !a)[!class]'));

Modification

Elements can be easily modified after you've found them.

// Find all paragraph tags which are nested inside a div tag, change
// their ID attribute and print the new HTML code
foreach($html('div p') as $index => $element) {
  $element->id = "id$index";
}
echo $html;

// Center all the links inside a document which start with "http://"
// and print out the new HTML
foreach($html('a[href ^= "http://"]') as $element) {
  $element->wrap('center');
}
echo $html;

// Find all odd indexed "td" elements and change the HTML to make them links
foreach($html('table td:odd') as $element) {
  $element->setInnerText('<a href="#">'.$element->getPlainText().'</a>');
}
echo $html;
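The last example above places plain text back into markup. If `getPlainText()` returns text with entities already decoded (an assumption about the library's behavior, not something this README states), characters such as `&` or `<` could end up unescaped in the new link. A minimal sketch of one way to guard against that, using a hypothetical helper `plain_text_to_link()` that is not part of Pharse:

```php
<?php
// Hypothetical helper (not part of Pharse): wrap plain text in a link,
// escaping characters that would otherwise be read as markup.
function plain_text_to_link($text, $href = '#') {
  return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
    . htmlspecialchars($text, ENT_QUOTES, 'UTF-8')
    . '</a>';
}

// prints: <a href="#">Fish &amp; &lt;Chips&gt;</a>
echo plain_text_to_link('Fish & <Chips>'), "\n";
```

Inside the loop, this would be called as `$element->setInnerText(plain_text_to_link($element->getPlainText()));`.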

Beautify

Pharse can also help you beautify your code and format it properly.

// Beautify the old HTML code and print out the new, formatted code
Pharse::dom_format($html, array('attributes_case' => CASE_LOWER));
echo $html;

License

Pharse is licensed under the Artistic License/GPL.

The Versions

11/02 2018

dev-master

9999999-dev https://github.com/ressio/pharse

GPL Artistic GPL-2.0-or-later Artistic-1.0-Perl

The Requires

  • php >=5.0.0

parser php html dom html5

24/03 2015

0.2.0

0.2.0.0 https://github.com/ressio/pharse

GPL Artistic

The Requires

  • php >=5.0.0

parser php html dom html5

24/03 2015

0.1

0.1.0.0 https://github.com/ressio/pharse

GPL Artistic

The Requires

  • php >=5.0.0

parser php html dom html5