library phpantispam

Bayesian spam recognition library

bgruszka/phpantispam

  • Sunday, September 4, 2016
  • by bgruszka
  • Repository
  • 2 Watchers
  • 18 Stars
  • 37 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 4 Forks
  • 7 Open issues
  • 9 Versions
  • 3 % Grown

The README.md

PHPAntiSpam

PHPAntiSpam is a library that recognizes whether documents, messages, or other texts are spam. It classifies them using statistical analysis.

Usage in four steps:

  • Create a tokenizer
  • Create a corpus (with lexemes) from historical messages
  • Choose the method to use for classification
  • Classify the message
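
The first two steps can be pictured with a small sketch in plain PHP. It assumes a tokenizer that lowercases the text and splits it on whitespace, and a corpus that records, for each token, how many messages of each category contain it. The names `tokenizeOnWhitespace` and `countTokensByCategory` are hypothetical and exist only for illustration; they are not PHPAntiSpam's API, and the library's own classes may behave differently.

```php
<?php
// Illustrative helpers only; these are NOT part of PHPAntiSpam.

/** Split text into lowercase tokens on runs of whitespace (assumed behaviour). */
function tokenizeOnWhitespace(string $text): array
{
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);

    return $tokens === false ? [] : $tokens;
}

/** For every token, count how many messages of each category contain it. */
function countTokensByCategory(array $messages): array
{
    $counts = [];
    foreach ($messages as $message) {
        $category = $message['category'];
        // array_unique: a token repeated inside one message is counted once
        foreach (array_unique(tokenizeOnWhitespace($message['content'])) as $token) {
            $counts[$token][$category] = ($counts[$token][$category] ?? 0) + 1;
        }
    }

    return $counts;
}

$messages = [
    ['category' => 'spam', 'content' => 'this is spam'],
    ['category' => 'nospam', 'content' => 'this is'],
];

$lexemes = countTokensByCategory($messages);
print_r($lexemes);
```

In this toy corpus the token "spam" is seen only in the spam category, while "this" and "is" appear once in each category, so only "spam" carries any evidence.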

Implemented methods:

  • Paul Graham method
  • Brian Burton method
  • Robinson Geometric Mean Test method
  • Fisher-Robinson's Inverse Chi-Square Test method
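
To give a feel for what such a method computes, here is a minimal sketch of the combination step from Paul Graham's "A Plan for Spam": per-token spam probabilities are multiplied together and normalised against the product of their complements. The function name `combineGraham` and the neutral fallback of 0.5 are assumptions made for this example; the library's `GrahamMethod` may select tokens and handle edge cases differently.

```php
<?php
// Sketch of Graham's combining formula; not taken from PHPAntiSpam's source.

/**
 * Combine per-token spam probabilities p_1..p_n into one score:
 *   P = (p_1 * ... * p_n) / (p_1 * ... * p_n + (1 - p_1) * ... * (1 - p_n))
 */
function combineGraham(array $tokenProbabilities): float
{
    $spamProduct = 1.0;
    $hamProduct = 1.0;
    foreach ($tokenProbabilities as $p) {
        $spamProduct *= $p;
        $hamProduct *= (1.0 - $p);
    }

    $denominator = $spamProduct + $hamProduct;

    // Assumption: return a neutral 0.5 when both products are zero (0/0)
    return $denominator > 0.0 ? $spamProduct / $denominator : 0.5;
}

// Two spam-leaning tokens and one ham-leaning token: roughly 0.99
echo combineGraham([0.99, 0.8, 0.2]), PHP_EOL;
```

A score close to 1 means the evidence points to spam, which is why a high cut-off such as 0.9 is a natural threshold for this kind of score.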

Installation

composer require bgruszka/phpantispam "^0.2"

Examples


    <?php

    // First add the autoloader and all necessary classes
    require_once 'vendor/autoload.php';

    use PHPAntiSpam\Corpus\ArrayCorpus;
    use PHPAntiSpam\Classifier;
    use PHPAntiSpam\Tokenizer\WhitespaceTokenizer;

    // Let's declare our example training set
    $messages = [
        ['category' => 'spam', 'content' => 'this is spam'],
        ['category' => 'nospam', 'content' => 'this is'],
    ];

    // As the tokenizer we can use the simplest one, WhitespaceTokenizer
    // (you can also use RegexpTokenizer or create a new one)
    $tokenizer = new WhitespaceTokenizer();

    // Let's define our corpus: a collection of text documents
    $corpus = new ArrayCorpus($messages, $tokenizer);

    // For classifying text we can use different methods

    // ------------------------------------------------------------------------------------
    // Graham method
    $classifier = new Classifier($corpus);
    $classifier->setMethod(new \PHPAntiSpam\Method\GrahamMethod($corpus));
    $spamProbability = $classifier->isSpam('This is spam');

    echo 'With Graham method:' . PHP_EOL;
    echo sprintf('Spam probability: %s', $spamProbability) . PHP_EOL;
    echo sprintf('Is spam: %s', $spamProbability < 0.9 ? 'NO' : 'YES') . PHP_EOL . PHP_EOL;

    // ------------------------------------------------------------------------------------
    // Burton method
    $classifier = new Classifier($corpus);
    $classifier->setMethod(new \PHPAntiSpam\Method\BurtonMethod($corpus));
    $spamProbability = $classifier->isSpam('This is spam');

    echo 'With Burton method:' . PHP_EOL;
    echo sprintf('Spam probability: %s', $spamProbability) . PHP_EOL;
    echo sprintf('Is spam: %s', $spamProbability < 0.9 ? 'NO' : 'YES') . PHP_EOL . PHP_EOL;

    // ------------------------------------------------------------------------------------
    // Robinson Geometric Mean Test method
    $classifier = new Classifier($corpus);
    $classifier->setMethod(new \PHPAntiSpam\Method\RobinsonGeometricMeanTestMethod($corpus));
    $spamProbability = $classifier->isSpam('This is spam');

    echo 'With Robinson Geometric Mean Test method:' . PHP_EOL;
    echo sprintf(
        'Spam probability: [spamminess: %s; hamminess: %s; combined: %s]',
        $spamProbability['spamminess'],
        $spamProbability['hamminess'],
        $spamProbability['combined']
    ) . PHP_EOL;
    echo sprintf('Is spam: %s', $spamProbability['combined'] <= 0.55 ? 'NO' : 'YES') . PHP_EOL . PHP_EOL;

    // ------------------------------------------------------------------------------------
    // Fisher-Robinson Inverse Chi-Square method
    $classifier = new Classifier($corpus);
    $classifier->setMethod(new \PHPAntiSpam\Method\FisherRobinsonInverseChiSquareMethod($corpus));
    $spamProbability = $classifier->isSpam('This is spam');

    echo 'With Fisher-Robinson Inverse Chi-Square method:' . PHP_EOL;
    echo sprintf(
        'Spam probability: [spamminess: %s; hamminess: %s; combined: %s]',
        $spamProbability['spamminess'],
        $spamProbability['hamminess'],
        $spamProbability['combined']
    ) . PHP_EOL;
    echo sprintf('Is spam: %s', $spamProbability['combined'] <= 0.55 ? 'NO' : 'YES') . PHP_EOL;
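
The two Robinson-based methods return three values rather than a single probability. As a rough guide to how such a triple can arise, the sketch below follows Gary Robinson's published geometric mean formulas: spamminess P = 1 - (∏(1 - p))^(1/n), hamminess Q = 1 - (∏ p)^(1/n), and a combined indicator (1 + (P - Q) / (P + Q)) / 2. The function name `robinsonGeometricMean`, the clamping constant, and the mapping onto the array keys used above are assumptions for illustration; the library's own implementation may differ.

```php
<?php
// Sketch of Robinson's geometric mean test; not taken from PHPAntiSpam's source.

function robinsonGeometricMean(array $tokenProbabilities): array
{
    $n = count($tokenProbabilities);
    if ($n === 0) {
        // Assumption: no evidence yields a neutral result
        return ['spamminess' => 0.5, 'hamminess' => 0.5, 'combined' => 0.5];
    }

    $logSpam = 0.0; // sum of log(p)
    $logHam = 0.0;  // sum of log(1 - p)
    foreach ($tokenProbabilities as $p) {
        // Clamp to avoid log(0); the bound 1e-9 is an arbitrary choice
        $p = min(max($p, 1e-9), 1.0 - 1e-9);
        $logSpam += log($p);
        $logHam += log(1.0 - $p);
    }

    // Summing logs and dividing by n gives the n-th root of each product
    $spamminess = 1.0 - exp($logHam / $n);
    $hamminess = 1.0 - exp($logSpam / $n);

    // Rescale (P - Q) / (P + Q) from [-1, 1] to [0, 1]
    $combined = (1.0 + ($spamminess - $hamminess) / ($spamminess + $hamminess)) / 2.0;

    return [
        'spamminess' => $spamminess,
        'hamminess' => $hamminess,
        'combined' => $combined,
    ];
}

$result = robinsonGeometricMean([0.9, 0.9]);
echo sprintf('Is spam: %s', $result['combined'] <= 0.55 ? 'NO' : 'YES') . PHP_EOL;
```

A combined value of 0.5 means the evidence is balanced, so a cut-off slightly above it, such as the 0.55 used in the example, leaves a small margin before a message is flagged.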

The Versions

All versions share the description "Bayesian spam recognition library", the MIT license, the author Blazej Gruszka, and the keywords: spam antispam bayes statistical analysis bayesian.

Each entry lists the release date, the version name, and the normalized version in parentheses.

  • 04/09 2016: dev-feat-dump-array-corpus-to-file (dev-feat-dump-array-corpus-to-file)
  • 30/08 2016: v0.3.x-dev (0.3.9999999.9999999-dev)
  • 30/08 2016: dev-master (9999999-dev)
  • 30/08 2016: v0.2.1 (0.2.1.0)
  • 30/08 2016: dev-fix-division-by-zero (dev-fix-division-by-zero)
  • 30/07 2016: v0.2.0 (0.2.0.0)
  • 30/07 2016: v0.2.0.x-dev (0.2.0.9999999-dev)
  • 03/04 2016: v0.1.0 (0.1.0.0)
  • 03/04 2016: dev-cleaning (dev-cleaning)