Write Less, Do More.
Install via Composer:
composer require w3zone/Crawler
If you plan to use the nodejsRequest service, also install the npm request package:
npm install request
require_once 'vendor/autoload.php';
use w3zone\Crawler\{Crawler, Services\phpCurl};
$crawler = new Crawler(new phpCurl);
$link = 'http://www.example.com';
// run() returns an array: [statusCode, body, headers, cookies]
// get() accepts a URL string or an array [url, query string]
$homePage = $crawler->get($link)->dumpHeaders()->run();
$response = $crawler->get($link)->dumpHeaders()->cookies($homePage['cookies'], 'w+r')->run();
The package ships with three interchangeable HTTP services; pass an instance of one to the Crawler constructor:
use w3zone\Crawler\Services\phpCurl;
use w3zone\Crawler\Services\nodejsRequest;
use w3zone\Crawler\Services\cliCurl;
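For example, swapping the underlying HTTP engine only changes the service instance passed to the constructor (nodejsRequest relies on the npm request package installed above):

$crawler = new Crawler(new nodejsRequest);
$response = $crawler->get('http://www.example.com')->run();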
Get
Crawler::get(mixed $arguments);
Sets the request method to GET.
Accepts a URL string or an array [url, query string].
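The key names for the array form aren't spelled out here, so the sketch below assumes it mirrors the post() options shape ('url' plus a 'data' array holding the query string); treat the keys as an assumption:

// assumed shape: 'url' plus 'data' for the query-string parameters
$response = $crawler->get([
    'url'  => 'http://www.example.com/search',
    'data' => ['q' => 'crawler']
])->run();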
Post
Crawler::post(mixed $arguments);
Sets the request method to POST.
Accepts an array of options:
$arguments = [
'url' => 'www.example.com/login',
'data' => [
'username' => '',
'password' => ''
]
];
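Firing the request then looks like this:

$response = $crawler->post($arguments)->run();
echo $response['statusCode'];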
Json
Crawler::json(void)
An easy way to create a JSON request.
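A minimal sketch, chaining json() onto a POST; presumably it encodes the payload and sets the matching Content-Type:

$response = $crawler
    ->post(['url' => 'http://www.example.com/api', 'data' => ['key' => 'value']])
    ->json()
    ->run();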
XML
Crawler::xml(void)
An easy way to create an XML request.
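Usage mirrors json(); chaining xml() onto a request presumably sets an XML Content-Type:

$response = $crawler->post($arguments)->xml()->run();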
Referer
Crawler::referer(string $referer)
Sets the Referer header for the current request.
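For example:

$response = $crawler
    ->get('http://www.example.com')
    ->referer('https://www.google.com')
    ->run();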
Headers
Crawler::headers(array $headers)
Sets additional request headers.
Note that this method overrides the json() and xml() methods.
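The expected array shape isn't documented here; the sketch below assumes plain 'Name: value' strings, the format cURL expects:

$response = $crawler
    ->get('http://www.example.com')
    ->headers([
        'Accept-Language: en-US',
        'X-Requested-With: XMLHttpRequest'
    ])
    ->run();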
DumpHeaders
Crawler::dumpHeaders(void)
Includes the response headers in the returned response array.
Proxy
Crawler::proxy(mixed $proxy)
Sets the proxy IP and proxy type for the request.
Accepts an array holding the proxy IP and type, or a plain IP string:
$proxy = [
'ip' => 'xx.xx.xx.xx:xx',
'type' => 'socks5'
];
If you pass the IP as a string, the type defaults to HTTP.
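For example:

// array form with an explicit proxy type
$response = $crawler->get($link)->proxy($proxy)->run();

// string form; the type defaults to HTTP
$response = $crawler->get($link)->proxy('xx.xx.xx.xx:xx')->run();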
Cookies
Crawler::cookies(string $file, string $mode)
Sets the request cookies. The first argument is a cookie string;
the second argument is the cookie mode.
Available modes:
-- w : write-only mode
-- r : read-only mode
-- w+r : read and write
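For example, replaying cookies captured from an earlier response, as in the quick start above:

$homePage = $crawler->get($link)->dumpHeaders()->run();
$response = $crawler
    ->get($link)
    ->cookies($homePage['cookies'], 'w+r')
    ->run();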
Initialize
Crawler::initialize(array $arguments)
Initializes or re-initializes your request with raw options.
Note that this method overrides the other options.
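With the phpCurl service the array holds raw cURL options, as the GitHub example below shows; whether options beyond CURLOPT_FOLLOWLOCATION are supported is an assumption:

$response = $crawler
    ->get($link)
    ->initialize([
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30   // assumption: any standard cURL option
    ])
    ->run();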
Run
Crawler::run(void)
Fires the request and returns the response.
Quick example: logging into GitHub
require_once 'vendor/autoload.php';
use w3zone\Crawler\{Crawler, Services\phpCurl};
$crawler = new Crawler(new phpCurl);
$url = 'https://github.com/login';
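// fetch the login page first to capture the session cookies and CSRF token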
$response = $crawler->get($url)->dumpHeaders()->run();
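// extract the authenticity_token hidden input from the login form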
preg_match('#<input name="authenticity_token".*?value="(.*?)"#', $response['body'], $authenticity_token);
$url = 'https://github.com/session';
$post['commit'] = 'Sign in';
$post['utf8'] = '✓';
$post['authenticity_token'] = $authenticity_token[1];
$post['login'] = 'valid email';
$post['password'] = '';
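// POST the credentials, replaying the first response's cookies and following redirects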
$response = $crawler
->post(['url' => $url, 'data' => $post])
->cookies($response['cookies'], 'w+r')
->initialize([
CURLOPT_FOLLOWLOCATION => true
])
->dumpHeaders()
->run();