linguistic/ngramextractor

Extracts n-grams from a given text and performs linguistic preprocessing such as stopword removal.

Wednesday, December 6, 2017
by linguistic

PHP

NGramExtractor for PHP

Installation

Install via Composer:

composer require linguistic/ngramextractor

Usage

Coming soon.

Example

$tokenizer = new Tokenizer();
$tokenizer->addRemovalRule('/<\/?\w+[\s\w\=\"\/\#\-\:\.\_]*>/') # Removes HTML tags
    ->addRemovalRule('/[^a-z0-9]+/', ' ') # Replaces every run of characters other than a-z and 0-9 with a space
    ->setSeperator('/\s+/'); # Splits the text into tokens on whitespace

$content = ""; # The text to tokenize
$stopwords = array(); # (optional) array of stopwords

$extractor = new NGramExtractor($content, $tokenizer, $stopwords);
$unigrams = $extractor->getNGrams(1); # Gets all n-grams in the text with n = 1

# Gets the unigrams with their occurrence counts, keeping those that occur at least 3 times
$unigramsFiltered = NGramExtractor::limitByOccurance($extractor->getNGramCount(1, true), 3);
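
To make the steps in the example concrete, here is a minimal plain-PHP sketch of the same idea: tokenize, drop stopwords, count n-grams, then filter by a minimum count. It does not use the library and is not its implementation. The function names (`sketchTokenize`, `sketchNGramCounts`, `sketchLimitByOccurrence`) are hypothetical, and lowercasing the input and removing stopwords before building n-grams are assumptions made for illustration.

```php
<?php
// Illustrative sketch only: plain PHP, independent of the library.
// All function names below are hypothetical.

/** Lowercase the text, replace non-alphanumerics with spaces, split on whitespace, drop stopwords. */
function sketchTokenize(string $text, array $stopwords = []): array
{
    $clean  = preg_replace('/[^a-z0-9]+/', ' ', strtolower($text));
    $tokens = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    // array_values() reindexes the list after stopwords are removed.
    return array_values(array_diff($tokens, $stopwords));
}

/** Count every run of $n consecutive tokens, keyed by the space-joined n-gram. */
function sketchNGramCounts(array $tokens, int $n): array
{
    $counts = [];
    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $gram = implode(' ', array_slice($tokens, $i, $n));
        $counts[$gram] = ($counts[$gram] ?? 0) + 1;
    }
    return $counts;
}

/** Keep only the n-grams that occur at least $min times. */
function sketchLimitByOccurrence(array $counts, int $min): array
{
    return array_filter($counts, function ($count) use ($min) {
        return $count >= $min;
    });
}

$tokens   = sketchTokenize('The cat saw the other cat. The cat ran.', ['the']);
$bigrams  = sketchNGramCounts($tokens, 2);
$unigrams = sketchLimitByOccurrence(sketchNGramCounts($tokens, 1), 3);
```

Because this sketch removes stopwords before forming n-grams, a bigram can span a removed word (for example "cat cat" above). Whether the library behaves the same way is not stated in the README.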

Resources

Download of stopword lists for different languages
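
A downloaded stopword list still has to be turned into the array that the example passes as `$stopwords`. The sketch below assumes a plain-text file with one stopword per line; that file format and the helper name `loadStopwordsSketch` are assumptions, not part of the library.

```php
<?php
// Hypothetical helper: assumes a plain-text list with one stopword per line.
function loadStopwordsSketch(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read stopword list: $path");
    }
    // Trim stray whitespace and lowercase each entry so it can match lowercased tokens.
    $words = array_map(function ($word) {
        return strtolower(trim($word));
    }, $lines);
    // Drop entries that were whitespace only, then remove duplicates.
    return array_values(array_unique(array_filter($words, 'strlen')));
}
```

The returned array can then be passed wherever the example uses `$stopwords`.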

06/12 2017

dev-master

9999999-dev

GPL3

Development requirements

phpunit/phpunit ^6.4

by Mark Schatz

06/12 2017

dev-add-license-1

GPL3

Development requirements

phpunit/phpunit ^6.4
