library ngramextractor

Extracts ngrams from a given text and does linguistic pre-processing like stopword removal

linguistic/ngramextractor

  • Wednesday, December 6, 2017
  • by linguistic
  • Repository
  • 0 Watchers
  • 0 Stars
  • 7 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 0 Forks
  • 0 Open issues
  • 2 Versions
  • 17 % Grown

The README.md

NGramExtractor for PHP

Installation

Install via Composer:

composer require linguistic/ngramextractor

Usage

Coming soon.

Example

$tokenizer = new Tokenizer();
$tokenizer
    ->addRemovalRule('/<\/?\w+[\s\w\=\"\/\#\-\:\.\_]*>/') # Removes HTML tags
    ->addRemovalRule('/[^a-z0-9]+/', ' ') # Replaces every run of characters other than lowercase letters and digits with a space
    ->setSeperator('/\s+/'); # Splits the text into tokens on whitespace

$content = ""; # The text to tokenize
$stopwords = array(); # (optional) Array of stopwords

$extractor = new NGramExtractor($content, $tokenizer, $stopwords);
$unigrams = $extractor->getNGrams(1); # Gets all n-grams in the text with n = 1

# Gets the unigrams and their occurrence counts, keeping those that occur at least 3 times
$unigramsFiltered = NGramExtractor::limitByOccurance($extractor->getNGramCount(1, true), 3);
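
To make the steps in the example more concrete, here is a minimal standalone sketch of the same pipeline in plain PHP (7.0 or later): tokenize, remove stopwords, build n-grams, count them, and filter by occurrence. It does not use the library and is not its implementation. The function names `sketchTokenize`, `sketchNGrams` and `sketchLimitByOccurrence` are hypothetical, and details such as lowercasing the input and removing stopwords before building n-grams are assumptions made for illustration.

```php
<?php
// Standalone illustration only; these helpers are hypothetical and are
// not part of linguistic/ngramextractor.

/** Splits text into lowercase tokens, dropping HTML tags and stopwords. */
function sketchTokenize(string $text, array $stopwords = []): array
{
    $text = strtolower(strip_tags($text));             // drop HTML tags, normalise case
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);  // other characters become spaces
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Remove stopwords, then reindex so the tokens are numbered 0..n-1 again.
    return array_values(array_diff($tokens, $stopwords));
}

/** Builds all contiguous n-grams, each joined with a single space. */
function sketchNGrams(array $tokens, int $n): array
{
    $ngrams = [];
    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $ngrams[] = implode(' ', array_slice($tokens, $i, $n));
    }
    return $ngrams;
}

/** Keeps only entries whose count is at least $min. */
function sketchLimitByOccurrence(array $counts, int $min): array
{
    return array_filter($counts, function ($count) use ($min) {
        return $count >= $min;
    });
}

$tokens   = sketchTokenize('<p>The cat saw the cat and the dog.</p>', ['the', 'and']);
$bigrams  = sketchNGrams($tokens, 2);                     // ['cat saw', 'saw cat', 'cat dog']
$counts   = array_count_values(sketchNGrams($tokens, 1)); // ['cat' => 2, 'saw' => 1, 'dog' => 1]
$frequent = sketchLimitByOccurrence($counts, 2);          // ['cat' => 2]
```

Because this sketch removes stopwords before building n-grams, a bigram such as "cat dog" can span words that were not adjacent in the original text. Whether the library behaves the same way is not stated in this README.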

Resources

The Versions

06/12 2017

dev-master

9999999-dev

GPL3

The Development Requires

by Mark Schatz

06/12 2017

dev-add-license-1

dev-add-license-1

GPL3

The Development Requires

by Mark Schatz