
library php-webminer

A PHP client that uses WebDriver and QueryPath

kjenney/php-webminer

  • Wednesday, March 18, 2015
  • by kjenney
  • Repository
  • 3 Watchers
  • 3 Stars
  • 19 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 1 Forks
  • 1 Open issues
  • 8 Versions
  • 0 % Grown

The README.md

php-webminer -- Extract data using Selenium, QueryPath and PHP

DESCRIPTION

The goal of this project is to create an extensible system for extracting data from web pages. It currently uses Selenium WebDriver (via php-webdriver), QueryPath, and a configuration file that specifies which components to extract and how to output the results.

Job File

The "job" configuration file defines every aspect of a run: the system (database, infrastructure), the web site, and the data you wish to extract.

It is written in XML and has the following options:

  1. The child element "site" must be defined
  2. The child element "steps" is recommended, because it drives the actions
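The two rules above can be sketched as a small pre-flight check. This is only an illustration using PHP's built-in SimpleXML: the root element name "job", the attributes, and the function name checkJob are assumptions, not the project's documented schema or API.

```php
<?php
// Hypothetical job file. Only the element names "site" and "steps" come from
// the README; the root element and all attributes are assumptions.
$jobXml = <<<XML
<job>
  <site url="http://example.com/"/>
  <steps>
    <step action="click" selector="#login"/>
  </steps>
</job>
XML;

/**
 * Returns a list of problems found in a job definition.
 * An empty list means the job passed both checks.
 */
function checkJob(string $xml): array
{
    // Collect parse errors quietly instead of emitting PHP warnings.
    libxml_use_internal_errors(true);
    $job = simplexml_load_string($xml);
    if ($job === false) {
        return ['error: job file is not well-formed XML'];
    }

    $problems = [];
    if (!isset($job->site)) {
        $problems[] = 'error: required child element "site" is missing';
    }
    if (!isset($job->steps)) {
        $problems[] = 'warning: no "steps" element, so no actions will run';
    }
    return $problems;
}

print_r(checkJob($jobXml));
```

A missing "site" is reported as an error and a missing "steps" only as a warning, mirroring the "must" and "recommended" wording of the list.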

Database

Currently a single MySQL database is accepted. If elements are defined, the output XML is imported into the database and table specified in the configuration file.

Actions

  1. Click
  2. Type
  3. Captcha
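One way to picture how steps could drive these three actions is a lookup table from action name to handler. The sketch below is hypothetical: the "step" element, its attributes (action, selector, value), and the function planSteps are invented for illustration, and the handlers only describe what they would do instead of calling WebDriver.

```php
<?php
// Hypothetical step list; the attribute names are assumptions,
// not the project's documented schema.
$stepsXml = <<<XML
<steps>
  <step action="click" selector="#search-toggle"/>
  <step action="type" selector="input[name=q]" value="selenium"/>
  <step action="captcha" selector="img.captcha"/>
</steps>
XML;

/**
 * Turns each <step> into a plain-text description of the browser call it
 * would trigger. A real runner would call WebDriver in the handlers instead.
 */
function planSteps(string $xml): array
{
    $handlers = [
        'click'   => function ($s) { return "click {$s['selector']}"; },
        'type'    => function ($s) { return "type \"{$s['value']}\" into {$s['selector']}"; },
        'captcha' => function ($s) { return "solve captcha image {$s['selector']}"; },
    ];

    $plan = [];
    foreach (simplexml_load_string($xml)->step as $step) {
        $action = strtolower((string) $step['action']);
        if (!isset($handlers[$action])) {
            throw new InvalidArgumentException("Unknown action: $action");
        }
        $plan[] = $handlers[$action]($step);
    }
    return $plan;
}

print_r(planSteps($stepsXml));
```

Keeping the handlers in an array means a new action can be added without touching the loop, which fits the project's stated goal of being extensible.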

Elements

  1. Input - CSS Selectors used by QueryPath to pull data from a web page
  2. Output - Element name of Output XML

Samples are included in the /examples folder.

Outputs XML

The element definitions in the configuration file determine how the output is formatted, in particular the element names used in the output XML.
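As a rough illustration of that mapping, the sketch below takes values that have already been extracted and writes them under the configured output names using PHP's built-in DOMDocument. The function buildOutput, the root element name "result", and the sample selectors and values are all assumptions; the project's actual output layout may differ.

```php
<?php
/**
 * Builds an output fragment from extracted values.
 * $elements maps a CSS selector (the "Input") to an output element name
 * (the "Output"); $extracted maps the same selector to the text pulled
 * from the page.
 */
function buildOutput(array $elements, array $extracted): string
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('result')); // root name is an assumption

    foreach ($elements as $selector => $outputName) {
        $node = $doc->createElement($outputName);
        // Text nodes are escaped on serialization, so "&" and "<" are safe.
        $node->appendChild($doc->createTextNode($extracted[$selector] ?? ''));
        $root->appendChild($node);
    }
    return $doc->saveXML($root);
}

// Made-up sample data for illustration only.
$elements  = ['h1.title' => 'title', 'span.price' => 'price'];
$extracted = ['h1.title' => 'Blue Widget', 'span.price' => '9.99'];

echo buildOutput($elements, $extracted), "\n";
```

Selectors with no extracted value still produce an empty element, so every output document has the same shape.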

INSTALLING

GET THE CODE

Github

git clone git@github.com:kjenney/php-webminer.git

Packagist

Add the dependency to your composer.json (https://packagist.org/packages/kjenney/php-webminer):

{
  "require": {
    "kjenney/php-webminer": "dev-master"
  }
}

BUILD WITH DEPENDENCIES

Download composer.phar.

curl -sS https://getcomposer.org/installer | php

Install the library.

php composer.phar install

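Install PHP5 Extensions

# Tidy extension on Debian/Ubuntu (apt-get) or on RHEL/CentOS (yum)
apt-get install php5-tidy
yum install php-tidy

# MySQL native driver on Debian/Ubuntu
apt-get install php5-mysqlnd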

Install Tesseract (optional)

apt-get install tesseract-ocr

GETTING STARTED

  • The only server this client needs is the selenium-server-standalone-#.jar file, available here: http://www.seleniumhq.org/download/

  • Download and run that file, replacing # with the current server version.

    java -jar selenium-server-standalone-#.jar

Support

  • Wiki - https://github.com/kjenney/php-webminer/wiki

Contributing

  • There is still a lot of work to be done, and I welcome any help or suggestions.

  • Feel free to create issues and recommend features.

The Versions

  • dev-master (9999999-dev), 18/03 2015
  • 0.6.x-dev (0.6.9999999.9999999-dev), 01/03 2015
  • 0.5.x-dev (0.5.9999999.9999999-dev), 25/02 2015
  • 0.5 (0.5.0.0), 25/02 2015
  • 0.4 (0.4.0.0), 14/02 2015
  • 0.3 (0.3.0.0), 14/02 2015
  • 0.2 (0.2.0.0), 14/02 2015
  • 0.1 (0.1.0.0), 14/02 2015

All versions:

  • Source: https://github.com/kjenney/php-webminer
  • License: Apache-2.0
  • Keywords: selenium, php, webdriver, kjenney