
library spider4schema

A Web Bot that crawls the http://Schema.org web site to retrieve all available Types and Properties in order to create a JSON file and also some PHP libraries.

palex/spider4schema

  • Sunday, March 12, 2017
  • by P.Alex
  • Repository
  • 3 Watchers
  • 9 Stars
  • 20 Installations
  • PHP
  • 1 Dependents
  • 0 Suggesters
  • 2 Forks
  • 1 Open issues
  • 3 Versions
  • 0 % Grown

The README.md

Spider4Schema

A web bot that crawls the http://Schema.org website to retrieve all available Types and Properties, then generates a JSON file and several PHP libraries from them.
To generate Microdata or RDFa Lite 1.1 semantics, you can use the PHP library https://github.com/alexprut/PHPStructuredData. Created during the Google Summer of Code 2013 and 2014.

(Deprecated)

Documentation

Files structure:

  • configuration.php → the configuration file, where you set the type of library to be created.
  • http.php → a class that handles all HTTP requests.
  • parser.php → methods to parse the HTML and retrieve all needed information.
  • fileCreator.php → methods to create the library files.

Usage

  • Make sure the cURL library and the PHP CLI package are installed
  • Clone the repo: git clone https://github.com/alexprut/Spider4Schema.git
  • Enter the Spider4Schema/ directory
  • Open your terminal/shell and call php bin/spider.php [minified|json|normal] [true|false|verbose]

The libraries will be created in the dist/ folder.
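
The two positional arguments can be checked before a crawl starts. The sketch below is not taken from bin/spider.php: the function name parseSpiderArguments and the default values are assumptions, and since the meaning of the second argument is not documented here, it is only validated against its three accepted values.

```php
<?php
// Hypothetical helper, not part of Spider4Schema: validates the two
// positional arguments shown in the usage line above.
function parseSpiderArguments(array $argv)
{
    $libraryTypes = array('minified', 'json', 'normal');
    $secondValues = array('true', 'false', 'verbose');

    // Assumed defaults, for illustration only.
    $type   = isset($argv[1]) ? strtolower($argv[1]) : 'json';
    $second = isset($argv[2]) ? strtolower($argv[2]) : 'false';

    if (!in_array($type, $libraryTypes, true)) {
        throw new InvalidArgumentException("Unknown library type: $type");
    }
    if (!in_array($second, $secondValues, true)) {
        throw new InvalidArgumentException("Unknown second argument: $second");
    }

    return array('type' => $type, 'second' => $second);
}

// Mirrors: php bin/spider.php minified verbose
$options = parseSpiderArguments(array('bin/spider.php', 'minified', 'verbose'));
echo $options['type'], ' ', $options['second'], PHP_EOL;
```

Rejecting unknown values up front avoids starting several hundred HTTP requests with a mistyped library type.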

Library types

There are 3 types of libraries you can create:

  • JSON → a .json file containing all available Types and Properties, used by the https://github.com/alexprut/PHPStructuredData library to generate valid Microdata and RDFa Lite 1.1 semantics
  • Minified → a .php file with an array containing all available Types and Properties
  • Normal → each Type is a PHP class file (an abstract class with static Properties)
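
To make the difference between the Minified and Normal outputs concrete, here is a hand-written sketch of what each shape could look like. The array keys, the class name ExampleThing, and the property layout are assumptions for illustration; the files actually generated in dist/ may be organised differently.

```php
<?php
// Minified style (assumed layout): one array keyed by Type name.
$types = array(
    'Thing' => array(
        'extends'    => '',
        'properties' => array('name', 'description', 'url'),
    ),
    'Person' => array(
        'extends'    => 'Thing',
        'properties' => array('givenName', 'familyName'),
    ),
);

// Normal style (assumed layout): one abstract class per Type,
// with each Property exposed as a static member.
abstract class ExampleThing
{
    public static $name        = 'name';
    public static $description = 'description';
    public static $url         = 'url';
}

// Looking a Property up in each style.
$hasGivenName = in_array('givenName', $types['Person']['properties'], true);
echo $hasGivenName ? 'found' : 'missing', PHP_EOL;
echo ExampleThing::$url, PHP_EOL;
```

The array form keeps everything in one file and one lookup structure, while the class form gives each Type its own file, which lets editors autocomplete Type and Property names.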

Performance

The JSON library:
1 .json file, 91 KB, contains all available Types (620+) and their Properties

The minified library:
1 .php file, 107 KB, contains all available Types (620+) and their Properties, stored in a hash table (array)

The normal abstract static library:
622 .php files, 710 KB, 1 file for each available Type

Todos

  • Add all the required properties specified by Google, Yandex, and Baidu.
  • Instead of making 620+ HTTP requests, parse one file: https://schema.org/docs/schema_org_rdfa.html
  • Write tests.
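
The single-file idea in the second item could be approached with PHP's DOM extension. The following is only a sketch: the function name extractLabels is hypothetical, and the inline sample assumes that each Type is marked with typeof="rdfs:Class" and each Property with typeof="rdf:Property", both carrying an rdfs:label. The real markup of the schema.org RDFa file should be checked before relying on these selectors.

```php
<?php
// Inline sample standing in for the single RDFa file; the markup
// shape used here is an assumption, not copied from schema.org.
$html = <<<'HTML'
<div>
  <div typeof="rdfs:Class" resource="http://schema.org/Thing">
    <span property="rdfs:label">Thing</span>
  </div>
  <div typeof="rdfs:Class" resource="http://schema.org/Person">
    <span property="rdfs:label">Person</span>
  </div>
  <div typeof="rdf:Property" resource="http://schema.org/name">
    <span property="rdfs:label">name</span>
  </div>
</div>
HTML;

// Hypothetical helper: returns the labels of all nodes of one RDFa type.
function extractLabels($html, $typeof)
{
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($document);
    $query  = '//*[@typeof="' . $typeof . '"]//*[@property="rdfs:label"]';
    $labels = array();
    foreach ($xpath->query($query) as $node) {
        $labels[] = trim($node->textContent);
    }
    return $labels;
}

$typeLabels     = extractLabels($html, 'rdfs:Class');
$propertyLabels = extractLabels($html, 'rdf:Property');
echo implode(', ', $typeLabels), PHP_EOL;
echo implode(', ', $propertyLabels), PHP_EOL;
```

With the real file, the $html string would come from a single HTTP request rather than from an inline sample.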

License

Spider4Schema is licensed under the MIT License; see the LICENSE file for details.

The Versions

12/03 2017

dev-master

9999999-dev https://github.com/PAlexcom/Spider4Schema

MIT

The Requires

  • php >=5.3.0

The Development Requires

schema semantic schema.org

21/09 2014

dev-dev

dev-dev https://github.com/PAlexcom/Spider4Schema

MIT

The Requires

  • php >=5.3.0

The Development Requires

schema semantic schema.org

21/09 2014

v1.2.0

1.2.0.0 https://github.com/PAlexcom/Spider4Schema

MIT

The Requires

  • php >=5.3.0

The Development Requires

schema semantic schema.org