zephir/luya-module-crawler

A full search page crawler to enable complex and customized searching abilities.

  • Thursday, June 14, 2018
  • by nadar
  • Repository
  • 2 Watchers
  • 4 Stars
  • 175 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 2 Forks
  • 2 Open issues
  • 12 Versions
  • 0 % Grown

The README.md

Crawler

An easy-to-use full-website page crawler that provides search results on your page. The crawler module gathers all information about the pages on the configured domain and stores the index in the database. From there you can run search queries to provide search results. There are also helper methods which provide intelligent search results by splitting the input into multiple search queries (used by default).
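To illustrate what "splitting the input into multiple search queries" can mean, here is a minimal sketch. It assumes the input is split on whitespace and that a page is scored by how many terms it contains. Both function names are hypothetical and this is not the module's actual implementation.

```php
<?php
// Hypothetical helper: split the raw search input into individual terms.
function splitSearchInput(string $input): array
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by repeated whitespace.
    return preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: count how many of the terms occur in a page's content
// (case-insensitive), which could serve as a simple relevance score.
function countMatchingTerms(string $content, array $terms): int
{
    $hits = 0;
    foreach ($terms as $term) {
        if (stripos($content, $term) !== false) {
            $hits++;
        }
    }
    return $hits;
}

$terms = splitSearchInput('luya  crawler module');
print_r($terms); // three terms: luya, crawler, module

echo countMatchingTerms('The LUYA crawler indexes pages', $terms), PHP_EOL; // 2
```

Scoring each term separately means a page can still be found when only part of the input matches.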

Installation

Install the module via Composer:

composer require luyadev/luya-module-crawler:^3.0

After installing via Composer, add the module to the modules section of your configuration file:

'modules' => [
    //...
    'crawler' => [
        'class' => 'luya\crawler\frontend\Module',
        'baseUrl' => 'https://luya.io',
        /*
        'filterRegex' => [
            '#\.html#i', // filter all links containing `.html`
            '#/agenda#i', // filter all links which contain the word agenda with a leading slash
            '#date\=#i', // filter all links containing `date=`, for example an agenda which would otherwise generate infinite links
        ],
        'on beforeProcess' => function() {
            // optional add or filter data from the BuilderIndex, which will be processed to the Index afterwards
        },
        'on afterIndex' => function() {
            // optional add or filter data from the freshly built Index
        }
        */
    ],
    'crawleradmin' => 'luya\crawler\admin\Module',
]

Where baseUrl is the domain whose pages you want the crawler to index.
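The filterRegex patterns are ordinary PCRE expressions, so you can try them against sample URLs before running a crawl. The sketch below assumes that a link is skipped as soon as any one pattern matches. The isFiltered() helper is hypothetical and not part of the module.

```php
<?php
// Hypothetical helper for illustration only: true when any pattern matches the URL.
function isFiltered(string $url, array $filterRegex): bool
{
    foreach ($filterRegex as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return true;
        }
    }
    return false;
}

$filterRegex = [
    '#\.html#i',  // escaped dot: matches a literal ".html"
    '#/agenda#i', // matches "/agenda" anywhere in the URL
    '#date\=#i',  // matches "date=" anywhere in the URL
];

var_dump(isFiltered('https://luya.io/guide/intro.html', $filterRegex));       // bool(true)
var_dump(isFiltered('https://luya.io/agenda?date=2018-06-14', $filterRegex)); // bool(true)
var_dump(isFiltered('https://luya.io/about', $filterRegex));                  // bool(false)

// An unescaped dot matches any character, so '#.html#i' also catches "xhtml".
var_dump(isFiltered('https://luya.io/xhtml-guide', ['#.html#i']));  // bool(true)
var_dump(isFiltered('https://luya.io/xhtml-guide', ['#\.html#i'])); // bool(false)
```

Escape the dot when you only want to match a literal file extension.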

After setting up the module in your config, run the migrations and the import command (to set up permissions):

./vendor/bin/luya migrate
./vendor/bin/luya import

Running the Crawler

To run the crawler process, use the crawler command crawl. You should run this command from a cronjob to make sure your index is up to date:

./vendor/bin/luya crawler/crawl

Make sure your page is in UTF-8 mode (<meta charset="utf-8"/>) and make sure to set the language: <html lang="<?= Yii::$app->composition->langShortCode; ?>">.

To keep the crawl results current, create a cronjob which crawls the page each night: cd httpdocs/current && ./vendor/bin/luya crawler/crawl

Crawler Arguments

The arguments for crawler/crawl are listed below. An example call would be crawler/crawl --pdfs=0 --concurrent=5 --linkcheck=0:

| name | description | default |
|------|-------------|---------|
| linkcheck | Whether all links should be checked after the crawler has indexed your site | true |
| pdfs | Whether PDFs should be indexed by the crawler | true |
| concurrent | The number of concurrent page crawls | 15 |

Stats

You can also generate statistics by setting up a cronjob that runs each week:

./vendor/bin/luya crawler/statistic

Create search form

Make a post request with query to the crawler/default/index route and render the view as follows:





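<?php
use yii\widgets\LinkPager;
use luya\crawler\widgets\DidYouMeanWidget; // namespace assumed, verify against the installed module
?>
<?= $provider->totalCount; ?> Results

<?php if ($provider->totalCount == 0): ?>
    <div>No results found for &laquo;<?= $query; ?>&raquo;.</div>
<?php endif; ?>
<?= DidYouMeanWidget::widget(['searchModel' => $searchModel]); ?>
<?php foreach ($provider->models as $item): /* @var $item \luya\crawler\models\Index */ ?>
    <h3><?= $item->title; ?></h3>
    <p><?= $item->preview($query); ?></p>
    <a href="<?= $item->url; ?>"><?= $item->url; ?></a>
<?php endforeach; ?>
<?= LinkPager::widget(['pagination' => $provider->pagination]); ?>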

Crawler Settings

You can use crawler tags to trigger certain events or store information:

| tag | example | description |
|-----|---------|-------------|
| CRAWL_IGNORE | `<!-- [CRAWL_IGNORE] -->Ignore this<!-- [/CRAWL_IGNORE] -->` | Excludes the enclosed content from indexing. |
| CRAWL_FULL_IGNORE | `<!-- [CRAWL_FULL_IGNORE] -->` | Excludes the full page from the index. Keep in mind that links inside the ignored page will still be added to the index. |
| CRAWL_GROUP | `<!-- [CRAWL_GROUP]api[/CRAWL_GROUP] -->` | Tells the crawler which group/section the current page belongs to, so that you can group your results by the group field. |
| CRAWL_TITLE | `<!-- [CRAWL_TITLE]My Title[/CRAWL_TITLE] -->` | Ensures that the crawler always uses your customized title for the page. |
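To see how the tags behave on a concrete page, the following sketch parses them with regular expressions. It is only an approximation for testing your markup: stripCrawlIgnore() and readCrawlTag() are hypothetical helpers, and the module's real parsing may differ.

```php
<?php
// Hypothetical helper: remove everything between the CRAWL_IGNORE comment pair.
function stripCrawlIgnore(string $html): string
{
    $pattern = '#<!--\s*\[CRAWL_IGNORE\]\s*-->.*?<!--\s*\[/CRAWL_IGNORE\]\s*-->#s';
    return preg_replace($pattern, '', $html);
}

// Hypothetical helper: read the value of a single-comment tag such as
// CRAWL_GROUP or CRAWL_TITLE; returns null when the tag is absent.
function readCrawlTag(string $html, string $tag): ?string
{
    $name = preg_quote($tag, '#');
    $pattern = '#<!--\s*\[' . $name . '\](.*?)\[/' . $name . '\]\s*-->#s';
    return preg_match($pattern, $html, $matches) === 1 ? trim($matches[1]) : null;
}

$html = '<!-- [CRAWL_GROUP]api[/CRAWL_GROUP] -->'
      . '<!-- [CRAWL_TITLE]My Title[/CRAWL_TITLE] -->'
      . '<p>Keep this.</p>'
      . '<!-- [CRAWL_IGNORE] --><nav>Ignore this</nav><!-- [/CRAWL_IGNORE] -->';

echo readCrawlTag($html, 'CRAWL_GROUP'), PHP_EOL; // api
echo readCrawlTag($html, 'CRAWL_TITLE'), PHP_EOL; // My Title
echo stripCrawlIgnore($html), PHP_EOL;            // the <nav> block is gone
```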

The Versions

All versions are released under the MIT license. Homepage: http://luya.io

| date | version | normalized version | description |
|------|---------|--------------------|-------------|
| 2018-06-14 | dev-master | 9999999-dev | A full search page crawler to enable complex and customized searching abilities. |
| 2018-04-27 | 1.0.2 | 1.0.2.0 | A full search page crawler to enable complex and customized searching abilities. |
| 2018-03-28 | 1.0.1 | 1.0.1.0 | A full search page crawler to enable complex and customized searching abilities. |
| 2017-12-12 | 1.0.0 | 1.0.0.0 | A full search page crawler to enable complex and customized searching abilities. |
| 2017-09-05 | 1.0.0-RC4 | 1.0.0.0-RC4 | A full search page crawler to enable complex and customized searching abilities. |
| 2017-04-11 | 1.0.0-RC3 | 1.0.0.0-RC3 | A full search page crawler to enable complex and customized searching abilities. |
| 2016-11-29 | 1.0.0-RC2 | 1.0.0.0-RC2 | An easy-to-use page crawler to make an internal search field on your page. The crawler module gathers all information about the pages on the configured domain. |
| 2016-09-29 | 1.0.0-RC1 | 1.0.0.0-RC1 | An easy-to-use page crawler to make an internal search field on your page. The crawler module gathers all information about the pages on the configured domain. |
| 2016-07-28 | 1.0.0-beta8 | 1.0.0.0-beta8 | An easy-to-use page crawler to make an internal search field on your page. The crawler module gathers all information about the pages on the configured domain. |
| 2016-06-14 | 1.0.0-beta7 | 1.0.0.0-beta7 | An easy-to-use page crawler to make an internal search field on your page. The crawler module gathers all information about the pages on the configured domain. |
| 2016-04-21 | 1.0.0-beta6 | 1.0.0.0-beta6 | Yii2 LUYA Crawler Module |
| 2016-02-09 | 1.0.0-beta5 | 1.0.0.0-beta5 | Zephir Luya Crawler Module |

Keywords: php, yii2, module, yii, crawler, luya, luya-module (from 1.0.0-beta7 onwards), yii2-pagecrawler, pagecrawler