 

library monachus

Library to handle texts, includes: Spell checker, Stemmer, Language detection


ssola/monachus


  • Thursday, January 16, 2014
  • by ssola
  • PHP

The README.md

Monachus

Monachus is a library that helps you work with text in any language. Monachus means "monk" in Latin, which seems a fitting name for this library: monks used to work a lot with books (strings) in a wide range of languages.

This library was written with these PHP versions in mind: 5.5, 5.4 and 5.3.

Install

The simplest way is with Composer. Just add these lines to your composer.json:

"repositories": [
    {
        "type": "git",
        "url": "https://github.com/ssola/monachus.git"
    }
],
"require": {
    "ssola/monachus": "dev-master"
}

How it works

String


The first thing to know is how to use the String class. It wraps a given text in an object and keeps that text in the UTF-8 charset along the way.

include_once("./vendor/autoload.php");

use Monachus\String as String;

$text = new String("Hello World!");
echo $text;

This code creates a new String object holding a value and then prints it.

Then you can do things like:

include_once("./vendor/autoload.php");

use Monachus\String as String;

$text = new String("Hello World!");
echo $text->length();
echo $text->find("World");
echo $text->toUppercase();

if ($text->equals("Hello World!")) {
  echo $text->toLowercase();
}

Objects of this kind are used extensively throughout the library so that every operation is performed with the proper charset.
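To see why a charset-aware wrapper matters, here is a minimal sketch of the idea using PHP's mbstring extension. `Utf8Text` is a hypothetical stand-in written for this example, not the library's actual `String` implementation, and it assumes the input is already valid UTF-8. It avoids type declarations so that it also runs on the PHP 5.x versions listed above.

```php
<?php
// Hypothetical illustration only: NOT the Monachus\String source code.
// Requires the mbstring extension.
class Utf8Text
{
    private $value;

    public function __construct($value)
    {
        // Assumption: $value is already valid UTF-8.
        $this->value = (string) $value;
    }

    public function length()
    {
        // Counts characters, not bytes.
        return mb_strlen($this->value, 'UTF-8');
    }

    public function find($needle)
    {
        // Character offset of the first match, or false when absent.
        return mb_strpos($this->value, $needle, 0, 'UTF-8');
    }

    public function toUppercase()
    {
        // Returns a new object so the original text stays untouched.
        return new self(mb_strtoupper($this->value, 'UTF-8'));
    }

    public function __toString()
    {
        return $this->value;
    }
}

$text = new Utf8Text("Поиск");
echo strlen((string) $text), "\n"; // byte count: each Cyrillic letter takes 2 bytes
echo $text->length(), "\n";        // character count
```

The byte-based `strlen()` and the character-based `mb_strlen()` disagree as soon as the text leaves ASCII, which is the kind of mismatch a dedicated string object is meant to hide.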

Tokenizer


Do you need to tokenize a string? Monachus can do it for you! We support a lot of languages, Japanese included! But if your language is not supported... relax! You can create your own adapters to tokenize other languages.

Let's look at a simple example:

include_once("./vendor/autoload.php");

use Monachus\String as String;
use Monachus\Tokenizer as Tokenizer;

$text = new String("This is a text");
$tokenizer = new Tokenizer();

var_dump($tokenizer->tokenize($text));

// Now imagine you need to tokenize a Japanese text
$textJp = new String("は太平洋側を中心に晴れた所が多いが");
$tokenizerJp = new Tokenizer(new Monachus\Tokenizers\Japanase());

// Dump the tokens, not the tokenizer object itself
var_dump($tokenizerJp->tokenize($textJp));

As you have seen, we can use our own adapters to tokenize complex languages like Japanese or Chinese. Now it's time to explain how to create these adapters.

class MyTokenizer implements Monachus\Interfaces\TokenizerInterface
{
  public function tokenize(Monachus\String $string)
  {
    // your awesome code!
  }
}

$tokenizer = new Monachus\Tokenizer(new MyTokenizer());
var_dump($tokenizer->tokenize(new Monachus\String("Поиск информации в интернете")));
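The body of `tokenize()` is left open above. As a rough sketch of what such an adapter might do for space-delimited languages, the hypothetical helper below splits on runs of Unicode whitespace and punctuation. It is not part of Monachus, and it takes a plain PHP string rather than a `Monachus\String` so that it runs standalone.

```php
<?php
// Hypothetical helper, not part of Monachus.
function tokenizeWords($text)
{
    // \s = whitespace, \p{Z} = Unicode separators, \p{P} = Unicode punctuation.
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_split('/[\s\p{Z}\p{P}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(tokenizeWords("Поиск информации в интернете"));
print_r(tokenizeWords("This is a text."));
```

This approach only works when words are separated by spaces or punctuation. Languages written without word separators, such as Japanese, need a different strategy, which is why a dedicated adapter is useful there.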

N-Gram


Yeah! Monachus is able to generate N-gram sequences of different sizes, for example bigrams or trigrams. Let's see how it works.

include_once("./vendor/autoload.php");

use Monachus\String as String;
use Monachus\Ngram as Ngram;
use Monachus\Config as Config;

$text = new String("This is an awesome text");

$config = new Config();
$config->max = 3; // we're creating trigrams.

$ngram = new Ngram($config);
var_dump($ngram->parse($text));
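For readers new to N-grams, the sliding-window idea can be sketched in a few lines. The `ngrams()` helper below is hypothetical and is not the library's parser. It assumes word-level grams of exactly size `$n`, whereas this README does not state whether Monachus works on words or characters, or whether `max` also yields the smaller sizes.

```php
<?php
// Hypothetical helper, not the Monachus parser.
function ngrams($items, $n)
{
    $result = array();
    if ($n < 1) {
        return $result; // no meaningful n-grams for sizes below 1
    }
    $count = count($items);
    // Slide a window of size $n over the list; a list shorter
    // than $n produces no n-grams at all.
    for ($i = 0; $i + $n <= $count; $i++) {
        $result[] = array_slice($items, $i, $n);
    }
    return $result;
}

$words = explode(' ', "This is an awesome text");
foreach (ngrams($words, 3) as $gram) {
    echo implode(' ', $gram), "\n";
}
```

Because the helper works on any list, passing an array of characters instead of words would give character-level N-grams with the same code.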

Do you need your own N-gram parser? No problem! You can create your own parsers as well.

class MyParser implements Monachus\Interfaces\NgramParserInterface
{
  public function parse(Monachus\String $string, $level)
  {
    // your awesome code!
  }
}

And then...

include_once("./vendor/autoload.php");

use Monachus\String as String;
use Monachus\Ngram as Ngram;
use Monachus\Config as Config;

$text = new String("This is an awesome text");

$config = new Config();
$config->max = 3; // we're creating trigrams.

$ngram = new Ngram($config);
$ngram->setParser(new MyParser());
var_dump($ngram->parse($text));

The Versions

  • dev-master (9999999-dev), January 16, 2014, requires php >=5.3.0, by Sergio Sola
  • 1.0.0 (1.0.0.0), January 16, 2014, requires php >=5.3.0, by Sergio Sola