dev-master
9999999-dev
parse web pages and generate card object from microdata
MIT
Requires
- php >=5.4.0
- fabpot/goutte 2.0.*@dev
by Tanguy Godin
parse web pages and generate card object from microdata
v 1.3.3
Parses web pages and gathers microdata.
Filters and hooks can be applied at card instantiation or during post-processing.
Output: a card collection, either as hydrated objects or as a JSON-encoded string.
Requires running composer dump-autoload --optimize.
<?php
require_once "vendor/autoload.php";

use Uthmordar\Cardator\Card\CardGenerator;
use Uthmordar\Cardator\Card\CardProcessor;
use Uthmordar\Cardator\Cardator;
use Uthmordar\Cardator\Parser\Parser;

try {
    $cardator = new Cardator(new CardGenerator, new CardProcessor, new Parser);
    /* output only Article cards (only takes priority over except) */
    $cardator->addOnly('Article');
    /* Thing cards will be excluded from the output */
    $cardator->addExcept('Thing');
    /* choose the URL to crawl and extract data from; throws RuntimeException if the HTTP status is 400 or above */
    $crawl = $cardator->crawl('http://google.fr');
    /* the given closure will be applied to the given property on every card during post-processing */
    $cardator->addPostProcessTreatment('my_property_to_filter', function ($name, $value) {
        // what I want to do
    });
    $cardator->doPostProcess();
    /* get cards as a JSON string */
    $cards = $cardator->getCards(true);
    /* get cards as an SplObjectStorage collection */
    $cards = $cardator->getCards();
    foreach ($cards as $c) {
        // do something with cards
    }
} catch (\RuntimeException $e) {
    // do something with error
}
This tool crawls web pages looking for microdata markup.
It also tracks some special attributes and links them to the given itemprop:
* dk-raw is an attribute you should use for information usable only by developers or robots, such as a machine-readable datetime instead of a human-readable date.
* the content attribute can be used on a meta tag to mark content hidden from the user.
* the value attribute can be used to pass a numeric value related to the tag.
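The attribute handling described above can be illustrated with PHP's built-in DOM extension. This is a minimal sketch, not Cardator's actual implementation: the function name readItemprops, the sample markup, and the precedence order (dk-raw, then content, then value, then the visible text) are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, not part of Cardator: collects itemprop values from HTML.
// Assumed precedence: dk-raw, then content, then value, then the visible text.
function readItemprops($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence parser warnings about the markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $props = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@itemprop]') as $node) {
        // Default to the human-readable text of the element.
        $value = trim($node->textContent);
        // The first special attribute found overrides the text.
        foreach (['dk-raw', 'content', 'value'] as $attr) {
            if ($node->hasAttribute($attr)) {
                $value = $node->getAttribute($attr);
                break;
            }
        }
        $props[$node->getAttribute('itemprop')] = $value;
    }
    return $props;
}

// Sample markup (invented for this example).
$html = '<div itemscope itemtype="http://schema.org/Article">'
    . '<span itemprop="name">My Article</span>'
    . '<span itemprop="datePublished" dk-raw="2015-03-01">1 March 2015</span>'
    . '<meta itemprop="inLanguage" content="en">'
    . '<span itemprop="wordCount" value="1200">about 1,200 words</span>'
    . '</div>';

print_r(readItemprops($html));
```

In this sketch the date is read from dk-raw rather than from the text shown to visitors, which is the kind of split between human-readable and machine-readable values the list above describes.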
You can access some processing information as follows:
$cardator->getTotalCard(); // returns the number of cards found
$cardator->getExecutionTime(); // returns the crawl duration in seconds
$cardator->getStatus(); // returns the crawler's HTTP status
$cardator->getExecutionData(); // returns the information above as an array
You can easily create a Card object with:
$cardator->createCard('Article');
You can change the card library by extending CardGenerator and giving it a new library namespace path, as long as your cards respect the card interface.
$article = $cardator->createCard('Article');
// GET
$name = $article->name;
$name = $article->name();
// SET
$article->name = 'My Article';
$article->name('My Article');
// Existing properties are hydrated; a non-existent property creates an entry in the $params array
$article->params['non-existant'];
// The names of all hydrated properties are available as an array
$properties = $article->properties;
// Card type and card hierarchy
$cardName = $article->getQualifiedName();
$cardType = $article->type;
// Parents: returns an array such as ['Thing', 'CreativeWork']
// If an item has more than one parent chain: [['Thing', 'CreativeWork', 'SoftwareApplication'], ['Thing', 'CreativeWork', 'Game']]
$cardParents = $article->getParents();
// or
$cardParents = $article->parents;
// will return 'Thing\CreativeWork\SoftwareApplication::Thing\CreativeWork\Game'
// Returns the direct parent's name
$cardDirectParent = $article->getDirectparent();
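The two parent representations mentioned in the comments above (nested arrays and a '::'-separated string) can be related with a small helper. This is only a sketch: parentsToString is a hypothetical function, not a Cardator method, and the input is the example hierarchy taken from the comments.

```php
<?php
// Hypothetical helper, not part of Cardator: turns parent chains into one string.
function parentsToString(array $parents)
{
    // Normalise a single chain ['Thing', 'CreativeWork'] to a list of chains.
    if ($parents !== [] && !is_array(reset($parents))) {
        $parents = [$parents];
    }
    // Join each chain with a backslash, then join the chains with '::'.
    $chains = array_map(function (array $chain) {
        return implode('\\', $chain);
    }, $parents);
    return implode('::', $chains);
}

echo parentsToString([
    ['Thing', 'CreativeWork', 'SoftwareApplication'],
    ['Thing', 'CreativeWork', 'Game'],
]), "\n";
echo parentsToString(['Thing', 'CreativeWork']), "\n";
```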
As seen above, you can add post-processing globally on Cardator:
$cardator->addPostProcessTreatment('my_property_to_filter', function ($name, $value) {
    // what I want to do
});
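As an illustration of what such a closure might contain, here is a sketch that normalises whitespace in a string value. It is called directly so that it runs without Cardator. The source does not say whether the closure's return value replaces the property, so treating it that way is an assumption, and $normalise and the 'headline' property are hypothetical names.

```php
<?php
// Hypothetical treatment using the ($name, $value) signature shown above.
// Assumption: the returned value would become the property's new value.
$normalise = function ($name, $value) {
    if (!is_string($value)) {
        return $value; // leave arrays, numbers and objects untouched
    }
    // Collapse runs of whitespace into single spaces and strip both ends.
    return trim(preg_replace('/\s+/', ' ', $value));
};

// Called by hand for demonstration.
var_dump($normalise('headline', "  My \n  Article  "));
```

If the assumption holds, the same closure could be registered with $cardator->addPostProcessTreatment('headline', $normalise).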
If you want a more specific treatment, you can also edit the Card class in the card library as follows:
public function __construct()
{
    $this->addFilter('my_property_to_filter', function ($name, $value) {
        // what I want to do
    });
}
It is also possible to register your own processing action in Card\lib\FilterCard:
$filter = [
    'my_property_to_filter' => 'function to call'
];
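How FilterCard consumes this map is not described here, so the following is only a guess at the general idea: a property name is mapped to the name of a callable, which is then run on the matching value. The applyFilters function, the sample property names, and the choice to pass only the value to the callable are all assumptions.

```php
<?php
// Hypothetical map in the same shape: property name => name of a callable.
$filter = [
    'name'        => 'trim',
    'description' => 'strip_tags',
];

// Assumed dispatch, not Cardator's code: run the mapped function on each
// property that has an entry in the map, and leave the others unchanged.
function applyFilters(array $filter, array $properties)
{
    foreach ($properties as $name => $value) {
        if (isset($filter[$name]) && is_callable($filter[$name])) {
            $properties[$name] = call_user_func($filter[$name], $value);
        }
    }
    return $properties;
}

print_r(applyFilters($filter, [
    'name'        => '  My Article ',
    'description' => '<p>Short text</p>',
    'url'         => 'http://example.com',
]));
```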