2017 © Pedro Peláez
 

library phpngrams

Get N-Grams from strings and/or arrays.

drupol/phpngrams

  • Tuesday, February 6, 2018
  • by drupol
  • PHP

The README.md

PHPNgrams

PHP N-Grams library.

Introduction

In the fields of computational linguistics, machine learning and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs, depending on the application. The n-grams are typically collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.

An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram". Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on. (More on Wikipedia.)
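
To make the definition concrete, here is a minimal sketch of a sliding window over a list of words, written in plain PHP. The function name slidingNgrams() is hypothetical and is not part of this library; it only illustrates what "contiguous sequence of n items" means.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of drupol/phpngrams: returns every
 * contiguous window of $n items taken from $items.
 */
function slidingNgrams(array $items, int $n): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1.');
    }

    $items = array_values($items);
    $windows = [];

    // A window starting at $i needs $n items, so the last valid
    // start index is count($items) - $n.
    for ($i = 0; $i + $n <= count($items); $i++) {
        $windows[] = array_slice($items, $i, $n);
    }

    return $windows;
}

$words = ['to', 'be', 'or', 'not', 'to', 'be'];

$unigrams = slidingNgrams($words, 1); // six windows of one word each
$bigrams = slidingNgrams($words, 2);  // five windows of two words each
$trigrams = slidingNgrams($words, 3); // four windows of three words each
```

A list of k items yields k - n + 1 n-grams, which is why the six-word example produces five bigrams and four trigrams.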

Requirements

  • PHP >= 7.0

Installation

Include this library in your project by running:

composer require drupol/phpngrams

The library provides two classes:

  • NGrams
  • NGramsCyclic

and one trait:

  • NGramsTrait

Usage

<?php

declare(strict_types = 1);

namespace drupol\phpngrams\tests;

use drupol\phpngrams\NGrams;
use drupol\phpngrams\NGramsCyclic;

include 'vendor/autoload.php';

$string = 'hello world';

// Prefer preg_split() over str_split() for UTF-8 strings: str_split() splits on bytes, not characters.
$chars = preg_split('/(?!^)(?=.)/u', $string);

$ngrams = (new NGrams())->ngrams($chars, 3);

print_r(iterator_to_array($ngrams));
/*
[
    0 =>
        [
            0 => 'h',
            1 => 'e',
            2 => 'l',
        ],
    1 =>
        [
            0 => 'e',
            1 => 'l',
            2 => 'l',
        ],
    2 =>
        [
            0 => 'l',
            1 => 'l',
            2 => 'o',
        ],
    3 =>
        [
            0 => 'l',
            1 => 'o',
            2 => ' ',
        ],
    4 =>
        [
            0 => 'o',
            1 => ' ',
            2 => 'w',
        ],
    5 =>
        [
            0 => ' ',
            1 => 'w',
            2 => 'o',
        ],
    6 =>
        [
            0 => 'w',
            1 => 'o',
            2 => 'r',
        ],
    7 =>
        [
            0 => 'o',
            1 => 'r',
            2 => 'l',
        ],
    8 =>
        [
            0 => 'r',
            1 => 'l',
            2 => 'd',
        ],
];
*/

$string = 'hello world';

// Prefer preg_split() over str_split() for UTF-8 strings: str_split() splits on bytes, not characters.
$chars = preg_split('/(?!^)(?=.)/u', $string);

$ngrams = (new NGramsCyclic())->ngrams($chars, 3);

print_r(iterator_to_array($ngrams));
/*
[
    0 => [
            0 => 'h',
            1 => 'e',
            2 => 'l',
        ],
    1 => [
            0 => 'e',
            1 => 'l',
            2 => 'l',
        ],
    2 => [
            0 => 'l',
            1 => 'l',
            2 => 'o',
        ],
    3 => [
            0 => 'l',
            1 => 'o',
            2 => ' ',
        ],
    4 => [
            0 => 'o',
            1 => ' ',
            2 => 'w',
        ],
    5 => [
            0 => ' ',
            1 => 'w',
            2 => 'o',
        ],
    6 => [
            0 => 'w',
            1 => 'o',
            2 => 'r',
        ],
    7 => [
            0 => 'o',
            1 => 'r',
            2 => 'l',
        ],
    8 => [
            0 => 'r',
            1 => 'l',
            2 => 'd',
        ],
    9 => [
            0 => 'l',
            1 => 'd',
            2 => 'h',
        ],
    10 => [
            0 => 'd',
            1 => 'h',
            2 => 'e',
        ],
];
*/
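
The two outputs above differ only in the last two entries: the cyclic variant keeps going past the end of the input and wraps around to its start. The sketch below reproduces that behaviour with a modulo index. It is inferred from the sample output only; cyclicNgrams() is a hypothetical helper and not the library's NGramsCyclic implementation, whose handling of edge cases (for example n larger than the input) may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper illustrating wrap-around windows. This is an
 * assumption based on the README output, not the library's own code.
 */
function cyclicNgrams(array $items, int $n): array
{
    $items = array_values($items);
    $length = count($items);
    $windows = [];

    if ($length === 0 || $n < 1) {
        return $windows;
    }

    // One window per start position, so the result has $length entries.
    for ($start = 0; $start < $length; $start++) {
        $window = [];

        for ($offset = 0; $offset < $n; $offset++) {
            // The modulo maps indexes past the end back to the start.
            $window[] = $items[($start + $offset) % $length];
        }

        $windows[] = $window;
    }

    return $windows;
}

$chars = preg_split('/(?!^)(?=.)/u', 'hello world');
$cyclic = cyclicNgrams($chars, 3);
```

For the 11 characters of 'hello world' this gives 11 windows, one per start position, where the non-cyclic variant gives 11 - 3 + 1 = 9.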

To keep the memory footprint as small as possible, the library returns Generators. If you want the complete resulting array, use iterator_to_array().
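
The practical difference between streaming a Generator and materializing it can be sketched as follows. To keep the example self-contained, lazyNgrams() is a hypothetical stand-in for a generator-returning method; it is not the library's code, though the same foreach and iterator_to_array() patterns should apply to any Generator.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a generator-returning n-gram method;
 * it is not the library's implementation.
 */
function lazyNgrams(array $items, int $n): Generator
{
    $items = array_values($items);

    for ($i = 0; $i + $n <= count($items); $i++) {
        // Each window is built only when the consumer asks for it.
        yield array_slice($items, $i, $n);
    }
}

$chars = preg_split('/(?!^)(?=.)/u', 'hello world');

// Streaming: stop after the first two trigrams; the remaining
// windows are never built.
$firstTwo = [];
foreach (lazyNgrams($chars, 3) as $index => $ngram) {
    $firstTwo[] = implode('', $ngram);

    if ($index === 1) {
        break;
    }
}

// Materializing: iterator_to_array() drains the generator into an array,
// which holds every window in memory at once.
$all = iterator_to_array(lazyNgrams($chars, 3));
```

Streaming with foreach is the better fit for large inputs or early exits; iterator_to_array() is convenient when the full list is needed, for example for counting or random access.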

API

Find the complete API documentation at https://not-a-number.io/phpngrams.

Code quality and tests

Every time changes are introduced into the library, Travis CI runs the tests.

The library has tests written with PHPSpec.

Feel free to check them out in the spec directory. Run composer phpspec to trigger the tests.

PHPInfection is used to check that the code is properly tested. Run composer infection to run the mutation tests.

Contributing

Feel free to contribute to this library by sending GitHub pull requests. I'm quite responsive :-)

The Versions