com.aliasi.classify
Class BernoulliClassifier<E>

java.lang.Object
  extended by com.aliasi.classify.BernoulliClassifier<E>
Type Parameters:
E - the type of object classified
All Implemented Interfaces:
Classifier<E,JointClassification>, ClassificationHandler<E,Classification>, Handler, Serializable

public class BernoulliClassifier<E>
extends Object
implements Classifier<E,JointClassification>, ClassificationHandler<E,Classification>, Serializable

A BernoulliClassifier provides a feature-based classifier where feature values are reduced to booleans based on a specified threshold. Training events are supplied in the usual way through the handle(Object,Classification) method.

Given a feature threshold of t, any feature with value strictly greater than the threshold t for a given input is activated, and all other features are not activated for that input.
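
The activation rule can be sketched in a few lines. This is a minimal illustration, not the class's internal code: the class name ActivationSketch and the method activeFeatures are hypothetical, and a plain map from feature names to numeric values stands in for the output of a feature extractor.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating threshold-based feature activation.
final class ActivationSketch {

    // Returns the names of features whose value is strictly greater than the threshold.
    static Set<String> activeFeatures(Map<String, ? extends Number> featureVector,
                                      double threshold) {
        Set<String> active = new TreeSet<>();
        for (Map.Entry<String, ? extends Number> entry : featureVector.entrySet()) {
            if (entry.getValue().doubleValue() > threshold) {
                active.add(entry.getKey());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("contains_url", 1.0);   // above the threshold: activated
        features.put("exclamations", 0.0);   // equal to the threshold: not activated
        features.put("sentiment", -0.5);     // below the threshold: not activated
        System.out.println(activeFeatures(features, 0.0));
    }
}
```

Note that a value equal to the threshold does not activate the feature, because the comparison is strict.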

The likelihood of a feature in a category is estimated with the training sample counts using add-one smoothing (also known as Laplace smoothing, or a uniform Dirichlet prior). There is also a term for the category distribution. Suppose F is the complete set of features seen during training. Further suppose that count(cat) is the number of training samples for category cat, and count(cat,feat) is the number of training instances of the specified category that had the specified feature activated. The contribution of each feature is then computed by:

 p(+feat|cat) = (count(cat,feat) + 1) / (count(cat)+2)
 p(-feat|cat) = 1.0 - p(+feat|cat)
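
As a worked illustration of the smoothed estimates, the sketch below computes both probabilities from raw counts. The class and method names (SmoothingSketch, pActive, pInactive) are hypothetical and chosen only for this example.

```java
// Hypothetical helper illustrating the add-one smoothed feature estimates.
final class SmoothingSketch {

    // p(+feat|cat) = (count(cat,feat) + 1) / (count(cat) + 2)
    static double pActive(long countCatFeat, long countCat) {
        return (countCatFeat + 1.0) / (countCat + 2.0);
    }

    // p(-feat|cat) is the complement of p(+feat|cat).
    static double pInactive(long countCatFeat, long countCat) {
        return 1.0 - pActive(countCatFeat, countCat);
    }

    public static void main(String[] args) {
        // Feature activated in 3 of 8 training instances of the category.
        System.out.println(pActive(3, 8));
        System.out.println(pInactive(3, 8));
        // A feature never activated in the category still gets nonzero probability.
        System.out.println(pActive(0, 8));
    }
}
```

With no training data for a category, both estimates fall back to 0.5, and a feature never seen with a category keeps a small nonzero probability, so one unseen feature cannot drive the whole product to zero.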

Assuming the total number of training instances is totalCount, we use a simple maximum-likelihood estimate for the category probability:

 p(cat) = count(cat) / totalCount

With these two definitions, the joint probability estimate for a category cat given activated features {f[0],...,f[n-1]} and unactivated features {g[0],...,g[m-1]} is:

 p(cat,{f[0],...,f[n-1]})
   = p(cat)
   * Π_{i < n} p(+f[i]|cat)
   * Π_{j < m} p(-g[j]|cat)
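
The product above is usually evaluated as a sum of logarithms to avoid underflow. The following is a minimal sketch under that assumption, working directly from counts; the names JointEstimateSketch and log2Joint are hypothetical, and the parallel arrays are only a convenient stand-in for however the counts are actually stored.

```java
// Hypothetical helper computing the log2 joint estimate for one category.
final class JointEstimateSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // countCatFeat[i] is count(cat, feature i); activated[i] says whether
    // feature i is activated for the input being scored.
    static double log2Joint(long countCat, long totalCount,
                            long[] countCatFeat, boolean[] activated) {
        // Category term: log2 p(cat) = log2(count(cat) / totalCount).
        double sum = log2((double) countCat / totalCount);
        // One term per known feature, activated or not.
        for (int i = 0; i < countCatFeat.length; ++i) {
            double pActive = (countCatFeat[i] + 1.0) / (countCat + 2.0);
            sum += log2(activated[i] ? pActive : 1.0 - pActive);
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] counts = {1, 0};              // count(cat,f0) = 1, count(cat,f1) = 0
        boolean[] activated = {true, false}; // f0 activated, f1 not activated
        double score = log2Joint(2, 4, counts, activated);
        System.out.println("log2 joint = " + score);
        System.out.println("joint      = " + Math.pow(2.0, score));
    }
}
```

In this example p(cat) = 2/4, p(+f0|cat) = 2/4 and p(-f1|cat) = 3/4, so the joint estimate is 0.5 * 0.5 * 0.75 = 0.1875.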

The JointClassification class requires log (base 2) estimates, and is responsible for converting these to conditional estimates. The scores in this case are just the log2 joint estimates.
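
For reference, the standard way to turn log2 joint estimates into conditional estimates is to normalize by the sum of the joint probabilities over all categories. The sketch below shows that normalization; it is not taken from the JointClassification source, the names ConditionalSketch and conditionalProbs are hypothetical, and it assumes a non-empty input with at least one finite score.

```java
// Hypothetical helper normalizing log2 joint estimates into conditional probabilities.
final class ConditionalSketch {

    // p(cat|input) = p(cat,input) / sum over cat' of p(cat',input)
    static double[] conditionalProbs(double[] log2Joints) {
        // Subtract the maximum before exponentiating to avoid underflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log2Joints) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        for (double v : log2Joints) {
            sum += Math.pow(2.0, v - max);
        }
        double[] result = new double[log2Joints.length];
        for (int i = 0; i < log2Joints.length; ++i) {
            result[i] = Math.pow(2.0, log2Joints[i] - max) / sum;
        }
        return result;
    }

    public static void main(String[] args) {
        // Joint estimates of 2^-2 = 0.25 and 2^-3 = 0.125.
        double[] conditional = conditionalProbs(new double[] {-2.0, -3.0});
        System.out.println(conditional[0] + " " + conditional[1]);
    }
}
```

The conditional estimates sum to one, and the ranking of categories is the same as under the joint scores.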

The dynamic form of the estimator may be used for classification, but it is not very efficient. It loops over every feature for every category.

Serialization and Compilation

The serialized version of a Bernoulli classifier will deserialize as an equivalent instance of BernoulliClassifier. In order to serialize a Bernoulli classifier, the feature extractor must be serializable. Otherwise an exception will be raised during serialization.
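
The requirement on the feature extractor mirrors ordinary Java serialization, where writing an object fails if a referenced field is not serializable. The sketch below illustrates this with plain java.io classes only; Extractor, PlainExtractor, SerializableExtractor, and Model are hypothetical stand-ins, and the classifier's own serialization mechanism may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-ins showing why a contained extractor must be serializable.
final class SerializationSketch {

    interface Extractor { }

    // Does not implement Serializable.
    static final class PlainExtractor implements Extractor { }

    static final class SerializableExtractor implements Extractor, Serializable {
        private static final long serialVersionUID = 1L;
    }

    // A serializable model that holds a reference to its extractor.
    static final class Model implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Extractor extractor;

        Model(Extractor extractor) {
            this.extractor = extractor;
        }
    }

    // Returns false if writing the object fails with NotSerializableException.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Model(new SerializableExtractor())));
        System.out.println(canSerialize(new Model(new PlainExtractor())));
    }
}
```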

Compilation is not yet implemented.

Since:
LingPipe3.1
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
BernoulliClassifier(FeatureExtractor<E> featureExtractor)
          Construct a Bernoulli classifier with the specified feature extractor and the default feature activation threshold of 0.0.
BernoulliClassifier(FeatureExtractor<E> featureExtractor, double featureActivationThreshold)
          Construct a Bernoulli classifier with the specified feature extractor and specified feature activation threshold.
 
Method Summary
 String[] categories()
          Returns the categories for this classifier.
 JointClassification classify(E input)
          Classify the specified input using this Bernoulli classifier.
 void handle(E input, Classification classification)
          Handle the specified training event, consisting of an input and its first-best classification.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BernoulliClassifier

public BernoulliClassifier(FeatureExtractor<E> featureExtractor)
Construct a Bernoulli classifier with the specified feature extractor and the default feature activation threshold of 0.0.

Parameters:
featureExtractor - Feature extractor for classification.

BernoulliClassifier

public BernoulliClassifier(FeatureExtractor<E> featureExtractor,
                           double featureActivationThreshold)
Construct a Bernoulli classifier with the specified feature extractor and specified feature activation threshold.

Parameters:
featureExtractor - Feature extractor for classification.
featureActivationThreshold - The threshold for feature activation (see the class documentation).

Method Detail

categories

public String[] categories()
Returns the categories for this classifier.

Returns:
The categories for this classifier.

handle

public void handle(E input,
                   Classification classification)
Handle the specified training event, consisting of an input and its first-best classification.

Specified by:
handle in interface ClassificationHandler<E,Classification>
Parameters:
input - Object whose classification result is being trained.
classification - Classification result for object.

classify

public JointClassification classify(E input)
Classify the specified input using this Bernoulli classifier. See the class documentation above for mathematical details.

Specified by:
classify in interface Classifier<E,JointClassification>
Parameters:
input - Input to classify.
Returns:
Classification of the specified input.