diqa-import
Imports Office documents and makes their full text and metadata available for faceted search.
DIQAimport
#
Installation
#
Run once:
extensions/DIQAimport/maintenance/Setup.php
Configure cron jobs:
crontab -l | { cat; echo "* * * * * php /var/www/html/mediawiki/extensions/Import/maintenance/CrawlDirectory.php"; } | crontab -
crontab -l | { cat; echo "* * * * * php /var/www/html/mediawiki/maintenance/runJobs.php"; } | crontab -
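The two commands above append their entry every time they are run, so running them twice leaves duplicate lines in the crontab. A minimal sketch of a guard against that is shown below. The function name `add_cron_entry` is hypothetical and not part of this extension; it only filters crontab text from stdin to stdout.

```shell
# Hypothetical helper (not shipped with DIQAimport).
# Reads crontab text on stdin and prints it to stdout,
# appending ENTRY only if no identical line is already present.
add_cron_entry() {
    entry="$1"
    current="$(cat)"
    if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        # Entry already exists: pass the crontab through unchanged.
        printf '%s\n' "$current"
    else
        # Keep existing lines (if any), then add the new entry.
        if [ -n "$current" ]; then printf '%s\n' "$current"; fi
        printf '%s\n' "$entry"
    fi
}

# Example (modifies the current user's crontab, so it is left commented out):
# crontab -l 2>/dev/null | add_cron_entry "* * * * * php /var/www/html/mediawiki/maintenance/runJobs.php" | crontab -
```

The same pipeline can be repeated for the CrawlDirectory.php entry, using the path given above.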
Create the directory that will contain the documents (a mount point):
sudo mkdir -p /opt/freigabe
#
Settings
#
- $wgDIQAImportUseAllMetadata
  Stores all extracted metadata in SOLR (NOT in the wiki!) to allow
  exploring the data via Faceted Search.
  Default value: false
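As a sketch of how this setting might be switched on, the snippet below assumes it is set in LocalSettings.php like other MediaWiki `$wg...` globals. That location, the helper name `enable_all_metadata`, and the install path in the commented example (taken from the cron entries in this README) are assumptions. Where the line has to sit relative to the line that loads the extension is not stated here.

```shell
# Hypothetical helper (not shipped with DIQAimport).
# Appends the setting to the given settings file unless the line is already there.
enable_all_metadata() {
    settings_file="$1"
    line='$wgDIQAImportUseAllMetadata = true;'
    grep -Fq -- "$line" "$settings_file" || printf '%s\n' "$line" >> "$settings_file"
}

# Example (edits a real file, so it is left commented out):
# enable_all_metadata /var/www/html/mediawiki/LocalSettings.php
```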
#
Usage
#
1. Go to Special:DIQAimport (as WikiSysop).
2. Mount a Windows folder containing Office documents into the Linux file system.
   Usage: bin/mountWinShare.sh \\UNC\Path\to\folder User
   The folder is mounted to: /opt/freigabe
   For example: ./mountWinShare.sh //192.168.1.7/testfreigabe Kai
3. Create at least one crawler config.
   Import-Path: /opt/freigabe
   UNC-Path: \\KAIS-PC\testfreigabe
   Interval: any
4. Optional: Create tagging rules on Special:DIQAtagging.
   Note:
   If you change the tagging rules later, you have to refresh your semantic data yourself.
   The crawler does this only for modified documents.
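In step 2, the usage line shows the share in backslash UNC notation while the example uses forward slashes. Which forms mountWinShare.sh accepts is not stated here. The sketch below converts one notation to the other and checks that the import directory actually contains files before a crawler config is created. Both helper names are hypothetical and not part of this extension.

```shell
# Hypothetical helpers (not shipped with DIQAimport).

# Convert a backslash UNC path (\\HOST\share) to forward-slash form (//HOST/share).
unc_to_slashes() {
    printf '%s\n' "$1" | tr '\\' '/'
}

# Succeed only if DIR exists and contains at least one entry,
# which is a rough sign that the share has been mounted.
has_documents() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Example:
# unc_to_slashes '\\KAIS-PC\testfreigabe'
# has_documents /opt/freigabe || echo "nothing found in /opt/freigabe" >&2
```

Use single quotes around the UNC path so the shell leaves the backslashes untouched.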