camspiers/statistical-classifier

A PHP implementation of Complement Naive Bayes and SVM statistical classifiers, including a structure for building other classifiers, multiple data sources, and multiple caching backends.

Maintainers

camspiers

Package info

github.com/camspiers/statistical-classifier

Homepage

pkg:composer/camspiers/statistical-classifier

0.8.0 2014-01-05 22:29 UTC

MIT e5e622ade4db6f3be4b4ec72c507926800e91687

  • Cam Spiers <camspiers.woop@gmail.com>

Keywords: bayes, classifier, naive, svm



README

PHP Classifier uses semantic versioning. It is currently at major version 0, so the public API should not be considered stable.

What is it?

PHP Classifier is a text classification library with a focus on reuse, customizability and performance. Classifiers can be used for many purposes, but are particularly useful in detecting spam.
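As a rough sketch of the spam use case, the helper below keeps only the messages that a trained classifier does not flag as spam. It relies solely on the `is($category, $document)` method that appears in the Usage examples; the function name `filterSpam` and the default category label `'spam'` are assumptions made for illustration and are not part of the library.

```php
<?php
/**
 * Hypothetical helper (not part of the library): return the messages that
 * the classifier does NOT place in the spam category.
 *
 * $classifier may be any object exposing is($category, $document),
 * such as the classifiers built in the Usage examples.
 */
function filterSpam(object $classifier, array $messages, string $spamCategory = 'spam'): array
{
    $kept = [];
    foreach ($messages as $message) {
        // Keep the message only when it is not classified as spam.
        if (!$classifier->is($spamCategory, $message)) {
            $kept[] = $message;
        }
    }
    return $kept;
}
```

Because the helper only depends on `is()`, it should work the same way with either the Complement Naive Bayes or the SVM classifier.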

Features

  • Complement Naive Bayes Classifier
  • SVM (libsvm) Classifier
  • Highly customizable (easily modify or build your own classifier)
  • Command-line interface via separate library (phar archive)
  • Multiple data import types to get your data into the classifier (directory of files, database queries, JSON, serialized arrays)
  • Multiple types of model caching
  • Compatible with HipHop VM
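To illustrate how imported data ends up in the classifier, here is a minimal sketch that decodes a JSON string and feeds it to a data source through `addDocument`, the method shown in the Usage examples. The JSON layout (an object mapping each category to a list of documents) and the function name `addDocumentsFromJson` are assumptions; the library's own JSON data source is not shown in this README and may work differently.

```php
<?php
/**
 * Hypothetical helper (not part of the library): load documents from JSON
 * shaped like {"spam": ["doc", ...], "ham": ["doc", ...]} into any object
 * exposing addDocument($category, $document), e.g. a DataArray.
 *
 * Returns the number of documents added.
 */
function addDocumentsFromJson(string $json, object $source): int
{
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Expected a JSON object of category => documents');
    }

    $added = 0;
    foreach ($decoded as $category => $documents) {
        // Accept a single string as well as a list of strings per category.
        foreach ((array) $documents as $document) {
            $source->addDocument((string) $category, (string) $document);
            $added++;
        }
    }
    return $added;
}
```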

Installation

$ composer require camspiers/statistical-classifier

SVM Support

For SVM support, both libsvm and php-svm are required. For installation instructions, refer to php-svm.

Usage

Non-cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new ComplementNaiveBayes($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
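For readers curious about what a Complement Naive Bayes classifier does in principle, the following is a small standalone sketch in plain PHP. It is not the library's implementation: the function names, the whitespace-and-punctuation tokenizer, and the add-one smoothing parameter `$alpha` are all assumptions chosen to keep the example short. Each category's weights are estimated from the documents outside that category, and the category with the lowest score wins.

```php
<?php
// Illustrative only: a tiny Complement Naive Bayes, not the library's code.

/** Split text into lowercase word tokens. */
function tokenize(string $text): array
{
    return preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Train on [category => [documents]]. Returns, per category, the log
 * probability of each token estimated from all OTHER categories.
 */
function trainComplementNaiveBayes(array $documentsByCategory, float $alpha = 1.0): array
{
    // Count tokens per category and collect the vocabulary.
    $counts = [];
    $vocabulary = [];
    foreach ($documentsByCategory as $category => $documents) {
        $counts[$category] = [];
        foreach ($documents as $document) {
            foreach (tokenize($document) as $token) {
                $counts[$category][$token] = ($counts[$category][$token] ?? 0) + 1;
                $vocabulary[$token] = true;
            }
        }
    }

    $weights = [];
    foreach (array_keys($counts) as $category) {
        // Sum token counts over every other category (the complement).
        $complement = [];
        foreach ($counts as $other => $tokenCounts) {
            if ($other === $category) {
                continue;
            }
            foreach ($tokenCounts as $token => $count) {
                $complement[$token] = ($complement[$token] ?? 0) + $count;
            }
        }
        // Smoothed log probabilities over the whole vocabulary.
        $total = array_sum($complement) + $alpha * count($vocabulary);
        foreach (array_keys($vocabulary) as $token) {
            $weights[$category][$token] = log((($complement[$token] ?? 0) + $alpha) / $total);
        }
    }
    return $weights;
}

/** Pick the category whose complement explains the document worst. */
function classifyComplementNaiveBayes(array $weights, string $document): string
{
    $best = null;
    $bestScore = INF;
    foreach ($weights as $category => $tokenWeights) {
        $score = 0.0;
        foreach (tokenize($document) as $token) {
            $score += $tokenWeights[$token] ?? 0.0; // tokens never seen in training are skipped
        }
        if ($score < $bestScore) {
            $bestScore = $score;
            $best = $category;
        }
    }
    return (string) $best;
}
```

Trained on the four documents from the example above, this sketch also labels 'Some ham document' as "ham", because the token "ham" is rare in the complement of the ham category (the spam documents).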

Non-cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$classifier = new SVM($source);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Caching models

Caching models requires maximebf/CacheCache, which can be installed from Packagist via Composer. Other caching systems can also be integrated.

Cached Naive Bayes

use Camspiers\StatisticalClassifier\Classifier\ComplementNaiveBayes;
use Camspiers\StatisticalClassifier\Model\CachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new CachedModel(
	'mycachename',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new ComplementNaiveBayes($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"
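The general idea behind caching a model is to train once and reuse the stored result on later runs. The sketch below shows that idea in plain PHP with a serialized file. It does not reflect how `CachedModel` or CacheCache work internally (those details are not covered in this README), and the function name `rememberModel` and the array-shaped model are assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the library): return the model stored at
 * $path if one exists, otherwise call $train, store its result, and return it.
 *
 * $train must return an array representing the trained model.
 */
function rememberModel(string $path, callable $train): array
{
    if (is_file($path)) {
        $cached = unserialize(file_get_contents($path));
        if (is_array($cached)) {
            return $cached; // cache hit: skip training entirely
        }
    }

    // Cache miss (or unreadable cache): train and persist the result.
    $model = $train();
    file_put_contents($path, serialize($model), LOCK_EX);
    return $model;
}
```

With a helper like this, the expensive training callback runs only on the first call for a given path; deleting the file forces retraining.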

Cached SVM

use Camspiers\StatisticalClassifier\Classifier\SVM;
use Camspiers\StatisticalClassifier\Model\SVMCachedModel;
use Camspiers\StatisticalClassifier\DataSource\DataArray;

$source = new DataArray();
$source->addDocument('spam', 'Some spam document');
$source->addDocument('spam', 'Another spam document');
$source->addDocument('ham', 'Some ham document');
$source->addDocument('ham', 'Another ham document');

$model = new SVMCachedModel(
	__DIR__ . '/model.svm',
	new CacheCache\Cache(
		new CacheCache\Backends\File(
			array(
				'dir' => __DIR__
			)
		)
	)
);

$classifier = new SVM($source, $model);
$classifier->is('ham', 'Some ham document'); // bool(true)
$classifier->classify('Some ham document'); // string "ham"

Unit testing

statistical-classifier/ $ composer install --dev
statistical-classifier/ $ phpunit