linguistic/ngramextractor
Extracts n-grams from a given text and performs linguistic pre-processing such as stopword removal.
- Wednesday, December 6, 2017
- by linguistic
- PHP
Installation
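Install via Composer. Because only development versions (such as dev-master) are currently published, you may need to state the dev constraint explicitly unless your project's minimum-stability allows dev packages:
composer require linguistic/ngramextractor:dev-master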
Usage
Coming soon.
Example
<?php
require __DIR__ . '/vendor/autoload.php'; # Composer autoloader; Tokenizer and NGramExtractor also need importing from the package's namespace
$tokenizer = new Tokenizer();
$tokenizer->addRemovalRule('/<\/?\w+[\s\w\=\"\/\#\-\:\.\_]*>/') # Removes HTML tags
    ->addRemovalRule('/[^a-z0-9]+/', ' ') # Replaces every run of characters other than lowercase letters and digits with a space (this also removes uppercase letters unless the text is lowercased first)
    ->setSeperator('/\s+/'); # Splits the text into tokens on whitespace
$content = ""; # The text to tokenize
$stopwords = []; # (optional) array of stopwords
$extractor = new NGramExtractor($content, $tokenizer, $stopwords);
$unigrams = $extractor->getNGrams(1); # All n-grams in the text with n = 1 (unigrams)
$unigramsFiltered = NGramExtractor::limitByOccurance($extractor->getNGramCount(1, true), 3); # Unigrams with their occurrence counts, keeping only those that occur at least 3 times
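Until the Usage section is written, the following standalone sketch in plain PHP may help illustrate the kind of pipeline the example sets up: lowercase the text, strip HTML tags, replace non-alphanumeric runs with spaces, split on whitespace, drop stopwords, build n-grams, and keep only n-grams that reach a minimum count. It does not use or reproduce the library's internals; `tokenize`, `ngrams` and `limitByMinCount` are hypothetical helper names chosen for illustration, and it assumes PHP 7 or later with the bundled PCRE functions.

```php
<?php
// Standalone sketch; NOT the linguistic/ngramextractor implementation.
// tokenize(), ngrams() and limitByMinCount() are hypothetical helper names.

// Lowercases the text, strips HTML tags, turns every run of characters other
// than a-z and 0-9 into a space, then splits on whitespace.
function tokenize(string $text): array
{
    $text = strtolower($text);
    $text = preg_replace('/<\/?\w+[^>]*>/', ' ', $text); // drop HTML tags
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);    // non-alphanumerics -> space
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Removes stopwords, then builds every n-gram of length $n as a space-joined string.
function ngrams(array $tokens, int $n, array $stopwords = []): array
{
    if ($n < 1) {
        return [];
    }
    $tokens = array_values(array_diff($tokens, $stopwords));
    $result = [];
    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $result[] = implode(' ', array_slice($tokens, $i, $n));
    }
    return $result;
}

// Counts each n-gram and keeps only those occurring at least $min times.
function limitByMinCount(array $ngrams, int $min): array
{
    $counts = array_count_values($ngrams);
    return array_filter($counts, function (int $c) use ($min): bool {
        return $c >= $min;
    });
}

// Usage
$tokens = tokenize('<p>The cat sat on the mat. The cat ran.</p>');
$stopwords = ['the', 'on'];
$unigrams = ngrams($tokens, 1, $stopwords); // ['cat', 'sat', 'mat', 'cat', 'ran']
$bigrams = ngrams($tokens, 2, $stopwords);  // ['cat sat', 'sat mat', 'mat cat', 'cat ran']
$frequent = limitByMinCount($unigrams, 2);  // ['cat' => 2]
```

This sketch removes stopwords before forming n-grams, so a bigram such as `sat mat` can bridge the removed words "on the". Whether NGramExtractor filters stopwords before or after building n-grams is not documented here.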
Resources
dev-master (9999999-dev)
License: GPL3
by Mark Schatz
dev-add-license-1
License: GPL3
by Mark Schatz