
atrox/matcher

Powerful XML and HTML matching and data extraction library

  • Sunday, February 11, 2018
  • by kaja47
  • Repository
  • 11 Watchers
  • 87 Stars
  • 60,616 Installations
  • PHP
  • 1 Dependents
  • 0 Suggesters
  • 6 Forks
  • 3 Open issues
  • 4 Versions
  • 2 % Grown

The README.md

Atrox\Matcher

Matcher is a powerful tool for extracting data from XML and HTML using XPath and pure magic.

Further reading: Why was Matcher made (in Czech), XPath intro (in Czech)

Installation:

Install Matcher using Composer:

composer require atrox/matcher

Examples:

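use Atrox\Matcher;

// For every node matched by the first XPath, each field path below
// is evaluated relative to that node.
$m = Matcher::multi('//div[@id="siteTable"]/div[contains(@class, "thing")]', [
  'id'    => '@data-fullname',
  'title' => './/p[@class="title"]/a',
  'url'   => './/p[@class="title"]/a/@href',
  'date'  => './/time/@datetime',
  'img'   => 'a[contains(@class, "thumbnail")]/img/@src',
  // Nested field map; the (object) cast shows up in the result as well.
  'votes' => (object) [
    'ups'   => '@data-ups',
    'downs' => '@data-downs',
    'rank'  => 'span[@class="rank"]',
    'score' => './/div[contains(@class, "score")]',
  ],
])->fromHtml(); // the matcher takes a raw HTML string as input

// Download the page and run the matcher on it.
$f = file_get_contents('http://www.reddit.com/');

$extractedData = $m($f);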

Result:

[
  [
    "id"    => "t3_1ep0c5",
    "title" => "Obligatory funny cat pictures.",
    "url"   => "http://imgur.com/sGu0pEk",
    "date"  => "2013-05-20T14:16:24+00:00",
    "img"   => "http://e.thumbs.redditmedia.com/MZjtg3UnZ8MOVjcd.jpg",
    "votes" => (object) [
      "ups"   => "115036",
      "downs" => "10266",
      "rank"  => "1",
      "score" => "105650"
    ]
  ],
  [
    ...
  ]
]
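To make the shape of that result easier to follow, here is a rough sketch of the same idea using only PHP's built-in DOMDocument and DOMXPath. It is not how Matcher is implemented; `extractAll` is a hypothetical helper, the sample HTML is made up, and returning `null` for a missing field is this sketch's choice rather than documented library behaviour.

```php
<?php
// Hypothetical helper built on ext-dom only; NOT part of Atrox\Matcher.
function extractAll(string $html, string $basePath, array $fields): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query($basePath) as $node) {
        $row = [];
        foreach ($fields as $key => $path) {
            // Evaluate each field path relative to the current node
            // and keep the text of the first hit (assumption: null if none).
            $hit = $xpath->query($path, $node)->item(0);
            $row[$key] = $hit === null ? null : trim($hit->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}

// Made-up sample markup, loosely shaped like the page in the example above.
$html = '<html><body><div id="siteTable">'
      . '<div class="thing link" data-fullname="t3_a">'
      . '<p class="title"><a href="http://example.com/a">First</a></p></div>'
      . '<div class="thing link" data-fullname="t3_b">'
      . '<p class="title"><a href="http://example.com/b">Second</a></p></div>'
      . '</div></body></html>';

$rows = extractAll($html, '//div[@id="siteTable"]/div[contains(@class, "thing")]', [
    'id'    => '@data-fullname',
    'title' => './/p[@class="title"]/a',
    'url'   => './/p[@class="title"]/a/@href',
]);
```

Each element of `$rows` is one associative array per matched `div`, keyed by the field names, which mirrors the outer list and inner records in the result shown above.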

Matchers can be arbitrarily chained and nested.

// Reusable matcher for the fields shared by every post.
$postMatcher = Matcher::single('.//div[@class="postInfo desktop"]', [
  'id'   => './input/@name',
  'name' => './span[@class="nameBlock"]/span[@class="name"]',
  'date' => './span/@data-utc',
]);

// A matcher can be used as a field value and as the second argument of another matcher.
$m = Matcher::multi('//div[@class="thread"]', [
  'op'      => Matcher::single('./div[@class="postContainer opContainer"]', $postMatcher),
  'replies' => Matcher::multi('./div[@class="postContainer replyContainer"]', $postMatcher)
])->fromHtml();

$f = file_get_contents('http://boards.4chan.org/po/');

$extractedData = $m($f);

Result:

[
  [
    "op" => [
      "id"   => "481874858",
      "name" => "Anonymous",
      "date" => "1369242761"
    ],
    "replies" => [
      [
        "id"   => "481879347",
        "name" => "moot",
        "date" => "1369244554"
      ],
      ...
    ]
  ],
  [
    ...
  ],
  ...
]
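One way to picture how such nesting can work is as closures that each take a context node and hand the nodes they find to the next step. The sketch below is an illustration under that assumption, written against ext-dom only. The global functions `single`, `multi` and `descend` are hypothetical stand-ins, not the Atrox\Matcher source, and the sample markup and class names are made up.

```php
<?php
// Hypothetical mini-combinators on top of ext-dom; NOT the Atrox\Matcher source.

// Match the first node for $path, then apply $next to it.
function single(string $path, $next = null): Closure
{
    return function (DOMXPath $xp, DOMNode $ctx) use ($path, $next) {
        $node = $xp->query($path, $ctx)->item(0);
        return $node === null ? null : descend($xp, $node, $next);
    };
}

// Match every node for $path and apply $next to each of them.
function multi(string $path, $next = null): Closure
{
    return function (DOMXPath $xp, DOMNode $ctx) use ($path, $next) {
        $out = [];
        foreach ($xp->query($path, $ctx) as $node) {
            $out[] = descend($xp, $node, $next);
        }
        return $out;
    };
}

// The next step is either nothing (take the text), another matcher,
// a relative XPath string, or a map of field name => step.
function descend(DOMXPath $xp, DOMNode $node, $next)
{
    if ($next === null) {
        return trim($node->textContent);
    }
    if ($next instanceof Closure) {
        return $next($xp, $node);
    }
    if (is_string($next)) {
        return single($next)($xp, $node);
    }
    $row = [];
    foreach ($next as $key => $step) {
        $row[$key] = descend($xp, $node, $step);
    }
    return $row;
}

$post = single('.//div[@class="postInfo"]', [
    'id'   => './input/@name',
    'name' => './span[@class="name"]',
]);
$threads = multi('//div[@class="thread"]', [
    'op'      => single('./div[@class="op"]', $post),
    'replies' => multi('./div[@class="reply"]', $post),
]);

// Made-up markup: one thread with an opening post and two replies.
$html = '<html><body><div class="thread">'
      . '<div class="op"><div class="postInfo"><input name="1"><span class="name">Anonymous</span></div></div>'
      . '<div class="reply"><div class="postInfo"><input name="2"><span class="name">moot</span></div></div>'
      . '<div class="reply"><div class="postInfo"><input name="3"><span class="name">Anonymous</span></div></div>'
      . '</div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$result = $threads(new DOMXPath($dom), $dom);
```

Because every step has the same shape (context node in, value out), the same `$post` step can be reused under both `op` and `replies`, which is the reuse pattern `$postMatcher` shows above.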

Use with external parsers:

Because Matcher works internally with DOMDocument or SimpleXML objects, it can be used with external HTML/XML parsers such as html5-php.

$html5 = new Masterminds\HTML5(['disable_html_ns' => true]);
$dom = $html5->loadHTML($html);

$m = Matcher::single('//h1');
$title = $m($dom);
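What matters in the example above is the type of object the parser returns, not which parser produced it. As a minimal sketch of that point, using only PHP's bundled extensions and plain XPath calls standing in for the matcher (the markup is made up), the same `//h1` query can be answered from either kind of object:

```php
<?php
$markup = '<html><body><h1>Hello</h1></body></html>';

// Route 1: a DOMDocument, here produced by PHP's built-in HTML parser.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($markup);
libxml_clear_errors();
$fromDom = (new DOMXPath($dom))->query('//h1')->item(0)->textContent;

// Route 2: a SimpleXMLElement built from the same (well-formed) markup.
$sx = simplexml_load_string($markup);
$fromSimpleXml = (string) $sx->xpath('//h1')[0];
```

Both variables end up holding the heading text, so any parser that yields one of these two object types can sit in front of the XPath step.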

The Versions

  • dev-master (9999999-dev), 11/02 2018: license BSD-3-Clause, New BSD; requires php >=5.3.0
  • v1.1.1 (1.1.1.0), 10/02 2018: license BSD-3-Clause; requires php >=5.3.0
  • v1.1.0 (1.1.0.0), 07/03 2016: license New BSD; requires php >=5.3.0
  • v1.0.0 (1.0.0.0), 19/06 2014: license New BSD; described at the time as "Powerful XML and HTML matching library"

Keywords: xml, html, xpath