AiCrawler
Leverage AI design patterns by using heuristics with the Symfony DomCrawler.
Please crawl on over to the docs, which are also available as a gitbook.
Quickstart
The AiCrawler package is responsible for making boolean assertions on a node in the HTML DOM. It comes with a straightforward data point trait that records the results of your heuristics (rules) for a given "item" or context.
Install with Composer
$ composer require dan/aicrawler dev-master
Trivial example
$crawler = new AiCrawler('<html>...</html>');
$node = $crawler->filter('div[id="content-start"]');
$args = ['words' => 15];
// Does the content have at least 15 words?
$assertion = Heuristics::words($node, $args); // true / false
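To make the idea of a boolean assertion on a node concrete, here is a minimal standalone sketch of a word-count check. It does not use or reproduce the AiCrawler API: the function name `hasAtLeastWords` is hypothetical, and the sketch relies only on PHP's bundled DOM extension.

```php
<?php
// Standalone sketch, NOT the AiCrawler implementation.
// hasAtLeastWords() is a hypothetical helper used only for illustration.

/**
 * Check whether a node's text (including its descendants) has enough words.
 *
 * @param DOMNode $node The node to inspect.
 * @param int     $min  The minimum number of words required.
 *
 * @return bool
 */
function hasAtLeastWords(DOMNode $node, $min)
{
    // textContent concatenates the text of the node and all its descendants.
    $words = preg_split('/\s+/', trim($node->textContent), -1, PREG_SPLIT_NO_EMPTY);

    return count($words) >= $min;
}

$html = '<html><body><div id="content-start"><p>One two three four five.</p></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Select the same node as the example above, using XPath instead of a CSS selector.
$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="content-start"]')->item(0);

$atLeastFive = hasAtLeastWords($node, 5);     // true: the div holds five words
$atLeastFifteen = hasAtLeastWords($node, 15); // false: fewer than fifteen words
```

The point of the sketch is the shape of a heuristic: it takes a node plus arguments and returns true or false, which makes rules easy to combine and to score.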
A more expressive example
$crawler = new AiCrawler("<html>...</html>");
$args = [
    'elements' => [
        "elements" => "/p/ /blockquote/ /(u|o)l/ /h[1-6]/",
        "regex" => true,
        'words' => [
            'words' => 15,
            'descendants' => true,
            'words2' => [
                'words' => "/(cod(ing|ed|e)|program|language|php)/",
                'regex' => true,
                'descendants' => true
            ]
        ]
    ],
    'matches' => 3
];
/**
 * Are at least 3 of this div's children p, blockquote, ul, ol, or any
 * h element, AND do they contain at least 15 words (including text from the
 * child's descendants) AND words such as coding, coded, code, program,
 * language, php (including text from the child's descendants)?
 */
$crawler->filter("div")->each(function(&$node) use ($args) {
    if (Heuristics::children($node, $args)) {
        $node->setDataPoint("example", "words", 1);
    }
});
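The `setDataPoint("example", "words", 1)` call above records a value under a named rule for a given item. As a rough sketch of how such a data point pattern could work, the following trait stores values per item and sums them. The names `RecordsDataPoints`, `ExampleNode`, `getDataPoints`, and `totalFor` are hypothetical and are not taken from the AiCrawler source; only the `setDataPoint` argument order mirrors the example above.

```php
<?php
// Illustrative sketch only. RecordsDataPoints, ExampleNode, getDataPoints()
// and totalFor() are hypothetical names, not part of AiCrawler.

trait RecordsDataPoints
{
    /** @var array Data points keyed by item, then by rule name. */
    private $dataPoints = [];

    /**
     * Record the result of one rule for one item.
     *
     * @param string    $item  The item (context) being scored.
     * @param string    $name  The rule the value belongs to.
     * @param int|float $value The value to record.
     */
    public function setDataPoint($item, $name, $value)
    {
        $this->dataPoints[$item][$name] = $value;
    }

    /**
     * Return every data point recorded for an item.
     *
     * @param string $item The item (context) to look up.
     *
     * @return array
     */
    public function getDataPoints($item)
    {
        return isset($this->dataPoints[$item]) ? $this->dataPoints[$item] : [];
    }

    /**
     * Sum the data points recorded for an item.
     *
     * @param string $item The item (context) to total.
     *
     * @return int|float
     */
    public function totalFor($item)
    {
        return array_sum($this->getDataPoints($item));
    }
}

class ExampleNode
{
    use RecordsDataPoints;
}

$node = new ExampleNode();
$node->setDataPoint('example', 'words', 1);
$node->setDataPoint('example', 'headings', 2);

$total = $node->totalFor('example'); // 1 + 2
```

Keeping the scores keyed by item means several independent contexts can be evaluated against the same node without their results mixing.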
Sound interesting? Read on about the Heuristics class, or go straight to a similar example with complete notes.
Version 0.0.1
- A Heuristics class with some cool rules to get you started.
- A Scorable trait is on our AiCrawler class so there is a pattern for data points.
- An Extra trait is on our AiCrawler class so there is a pattern for storing extra data.
Todo
Contributing
- Fork this project on GitHub.
- Existing unit tests must pass.
- Contributions must be unit tested.
- New heuristics should be portable (have few or no dependencies).
- New heuristics should have helpful doc blocks.
- Submit a pull request.
- See the guide on extending Heuristics for special heuristics.
Documentation
- Follow PSR-2.
- Add PHPDoc blocks for all classes, methods, and functions.
- Omit the @return tag if the method does not return anything.
- Add a blank line before @param, @return or @throws.
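As one reading of the documentation rules above, a doc block might look like the following sketch. The class `ExampleHeuristics` and its methods are hypothetical and exist only to show the layout: a blank line before the tag groups, and no @return tag on a method that returns nothing.

```php
<?php

/**
 * Hypothetical class used only to illustrate the doc block style.
 */
class ExampleHeuristics
{
    /**
     * Check whether the text contains at least the given number of words.
     *
     * @param string $text The text to inspect.
     * @param int    $min  The minimum number of words required.
     *
     * @return bool
     */
    public static function words($text, $min)
    {
        return str_word_count($text) >= $min;
    }

    /**
     * Print a label on its own line.
     *
     * @param string $label The label to print.
     */
    public function report($label)
    {
        echo $label . PHP_EOL;
    }
}
```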
If you find any issues, please report them here.
License
AiCrawler is free software distributed under the terms of the MIT license.