mediawiki-extension import

Imports Word, PDF, Excel, PowerPoint documents

diqa/import

  • Friday, February 2, 2018
  • by kkthek

The README.md

diqa-import

Imports Office documents and makes their full text and metadata available for faceted search.

DIQAimport

#
Installation    
#

Run once: extensions/DIQAimport/maintenance/Setup.php

Configure cron jobs:

crontab -l | { cat; echo "* * * * *  php /var/www/html/mediawiki/extensions/Import/maintenance/CrawlDirectory.php"; } | crontab -
crontab -l | { cat; echo "* * * * *  php /var/www/html/mediawiki/maintenance/runJobs.php"; } | crontab -
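Running the two commands above more than once appends the same cron line again each time. A minimal sketch of one way to avoid that, assuming a POSIX shell; `append_once` is a hypothetical helper and not part of the extension. It reads the current crontab text on stdin and adds the entry only if an identical line is not already there:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DIQAimport).
# Prints the crontab text read from stdin and appends ENTRY
# only if no identical line is already present.
append_once() {
    entry=$1
    current=$(cat)
    if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        # Entry already present: pass the crontab through unchanged.
        printf '%s\n' "$current"
    elif [ -n "$current" ]; then
        # Crontab has other lines: keep them and add the entry at the end.
        printf '%s\n%s\n' "$current" "$entry"
    else
        # Empty crontab: the entry becomes the only line.
        printf '%s\n' "$entry"
    fi
}

# Possible usage (commented out so that sourcing this file changes nothing):
# ENTRY='* * * * *  php /var/www/html/mediawiki/maintenance/runJobs.php'
# crontab -l 2>/dev/null | append_once "$ENTRY" | crontab -
```

The entry is matched as a fixed, whole-line string (`grep -Fx`), so the asterisks in the schedule are not treated as a pattern.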

Create the directory that will contain the documents (a mount point):

sudo mkdir -p /opt/freigabe
#
Settings
#
  1. $wgDIQAImportUseAllMetadata

    Stores all extracted metadata in Solr (NOT in the wiki!) to allow exploring the data via Faceted Search.

    Default value: false

#
Usage   
#
1. Go to Special:DIQAimport (as WikiSysop)

2. Mount a Windows folder with Office documents into the Linux file system

        Usage: bin/mountWinShare.sh \\UNC\Path\to\folder User
        The folder is mounted to: /opt/freigabe

        For example: ./mountWinShare.sh //192.168.1.7/testfreigabe Kai

3. Create at least one crawler config. 

        Import-Path: /opt/freigabe
        UNC-Path:    \\KAIS-PC\testfreigabe
        Interval: any

4. Optional: Create tagging rules on Special:DIQAtagging

Note: If you change the tagging rules later, you have to refresh your semantic data. The crawler does this only for modified documents, so unchanged documents are not re-tagged automatically.
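In step 2, the usage line writes the share with backslashes (\\UNC\Path\to\folder), while the example passes it with forward slashes (//192.168.1.7/testfreigabe). The following is a small sketch for converting between the two notations, assuming a POSIX shell; `unc_to_slashes` is a hypothetical helper, not part of the extension, and whether mountWinShare.sh needs the conversion is not stated in this README:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DIQAimport).
# Rewrites a Windows UNC path such as \\HOST\share
# into the forward-slash form //HOST/share.
unc_to_slashes() {
    # printf with %s prints the argument literally; tr then swaps
    # every backslash for a forward slash.
    printf '%s\n' "$1" | tr '\\' '/'
}

# Single quotes keep the backslashes literal in the shell.
unc_to_slashes '\\KAIS-PC\testfreigabe'
```

The result can then be passed as the first argument in the same way as in the forward-slash example of step 2.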

The Versions

02/02 2018

dev-master

9999999-dev


GPL v2.0 GPL-2.0-or-later


excel pdf word smw powerpoint diqa document import