com.aliasi.classify
Class BernoulliClassifier<E>

java.lang.Object
  extended by com.aliasi.classify.BernoulliClassifier<E>
Type Parameters:
E - the type of object classified
All Implemented Interfaces:
BaseClassifier<E>, Classifier<E,JointClassification>, ConditionalClassifier<E>, JointClassifier<E>, RankedClassifier<E>, ScoredClassifier<E>, ClassificationHandler<E,Classification>, Handler, ObjectHandler<Classified<E>>, Serializable

public class BernoulliClassifier<E>
extends Object
implements ClassificationHandler<E,Classification>, Classifier<E,JointClassification>, JointClassifier<E>, ObjectHandler<Classified<E>>, Serializable

A BernoulliClassifier provides a feature-based classifier where feature values are reduced to booleans based on a specified threshold. Training events are supplied in the usual way through the handle(Classified) method.

Given a feature threshold of t, any feature with value strictly greater than the threshold t for a given input is activated, and all other features are not activated for that input.
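
The activation rule can be sketched in a few lines. This is only an illustration, assuming features arrive as a map from feature name to numeric value; the class ActivationSketch and its method are hypothetical and not part of LingPipe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of LingPipe: illustrates the activation rule only.
final class ActivationSketch {

    // Returns the names of the features whose value is strictly greater than the threshold.
    static Set<String> activated(Map<String, ? extends Number> features, double threshold) {
        Set<String> active = new TreeSet<>();
        for (Map.Entry<String, ? extends Number> entry : features.entrySet()) {
            if (entry.getValue().doubleValue() > threshold) {
                active.add(entry.getKey());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("a", 2.0);
        features.put("b", 0.0);   // equal to the threshold, so not activated
        features.put("c", -1.5);
        features.put("d", 0.1);
        System.out.println(activated(features, 0.0)); // prints [a, d]
    }
}
```

Note that a feature whose value equals the threshold is not activated, because the comparison is strict.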

The likelihood of a feature in a category is estimated with the training sample counts using add-one smoothing (also known as Laplace smoothing, or a uniform Dirichlet prior). There is also a term for the category distribution. Suppose F is the complete set of features seen during training. Further suppose that count(cat) is the number of training samples for category cat, and count(cat,feat) is the number of training instances of the specified category that had the specified feature activated. The contribution of each feature is then computed by:

 p(+feat|cat) = (count(cat,feat) + 1) / (count(cat)+2)
 p(-feat|cat) = 1.0 - p(+feat|cat)
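
As a worked illustration of the smoothed estimates, the following sketch computes both quantities from raw counts. The class SmoothingSketch and its method names are hypothetical; they are not part of the LingPipe API.

```java
// Hypothetical helper, not part of LingPipe: add-one smoothed feature estimates.
final class SmoothingSketch {

    // p(+feat|cat) = (count(cat,feat) + 1) / (count(cat) + 2)
    static double probActive(int catFeatCount, int catCount) {
        return (catFeatCount + 1.0) / (catCount + 2.0);
    }

    // p(-feat|cat) = 1.0 - p(+feat|cat)
    static double probInactive(int catFeatCount, int catCount) {
        return 1.0 - probActive(catFeatCount, catCount);
    }

    public static void main(String[] args) {
        // A feature activated in 3 of 8 training instances of a category.
        System.out.println(probActive(3, 8));
        System.out.println(probInactive(3, 8));
        // A feature never activated in the category still gets non-zero probability.
        System.out.println(probActive(0, 8));
    }
}
```

Because of the smoothing, neither estimate is ever exactly 0 or 1, so the logarithm of either one is always finite.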

Assuming the total number of training instances is totalCount, we use a simple maximum-likelihood estimate for the category probability:

 p(cat) = count(cat) / totalCount

With these two definitions, the joint probability estimate for a category cat given activated features {f[0],...,f[n-1]} and unactivated features {g[0],...,g[m-1]} is:

 p(cat,{f[0],...,f[n-1]})
   = p(cat)
   * Π_{i < n} p(+f[i]|cat)
   * Π_{j < m} p(-g[j]|cat)
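
The product above can be sketched as a sum of base-2 logarithms. This is a minimal illustration under stated assumptions, not the library's implementation: the class JointEstimateSketch, its parameters, and the representation of counts as a map are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of LingPipe: log2 joint estimate for one category.
final class JointEstimateSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Returns log2 of p(cat) * prod p(+f|cat) * prod p(-g|cat), where the products
    // run over the activated and unactivated members of allFeatures, respectively.
    static double log2Joint(int catCount,
                            int totalCount,
                            Map<String, Integer> catFeatCounts,
                            Set<String> allFeatures,
                            Set<String> activeFeatures) {
        double sum = log2((double) catCount / totalCount);
        for (String feat : allFeatures) {
            int count = catFeatCounts.getOrDefault(feat, 0);
            double pActive = (count + 1.0) / (catCount + 2.0);
            sum += log2(activeFeatures.contains(feat) ? pActive : 1.0 - pActive);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Category seen 2 times out of 4; feature "a" activated twice, "b" never.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 2);
        Set<String> all = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> active = new HashSet<>(Arrays.asList("a"));
        double log2Joint = log2Joint(2, 4, counts, all, active);
        // 0.5 * (3/4) * (3/4) = 0.28125, up to floating-point rounding.
        System.out.println(Math.pow(2.0, log2Joint));
    }
}
```

Summing logarithms rather than multiplying probabilities avoids underflow when the feature set is large.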

The JointClassification class requires log (base 2) estimates, and is responsible for converting these to conditional estimates. The scores in this case are just the log2 joint estimates.
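
To illustrate what converting log2 joint estimates to conditional estimates involves, the sketch below normalizes a set of log2 joint values so that they sum to one. It is an assumption about the general technique, not a description of how JointClassification is implemented; the class ConditionalSketch is hypothetical.

```java
// Hypothetical helper, not part of LingPipe: normalizes log2 joint estimates
// p(cat,x) into conditional estimates p(cat|x).
final class ConditionalSketch {

    static double[] conditionals(double[] log2Joints) {
        // Subtract the maximum before exponentiating to reduce the risk of underflow.
        double max = Double.NEGATIVE_INFINITY;
        for (double value : log2Joints) {
            max = Math.max(max, value);
        }
        double[] result = new double[log2Joints.length];
        double sum = 0.0;
        for (int i = 0; i < log2Joints.length; ++i) {
            result[i] = Math.pow(2.0, log2Joints[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; ++i) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        // Joint estimates 1/4 and 1/8 normalize to roughly 2/3 and 1/3.
        double[] conditional = conditionals(new double[] { -2.0, -3.0 });
        System.out.println(conditional[0] + " " + conditional[1]);
    }
}
```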

The dynamic form of the estimator may be used for classification, but it is not very efficient. It loops over every feature for every category.

Serialization and Compilation

The serialized version of a Bernoulli classifier will deserialize as an equivalent instance of BernoulliClassifier. In order to serialize a Bernoulli classifier, the feature extractor must be serializable. Otherwise an exception will be raised during serialization.
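
The serializability requirement follows the general rule of Java serialization: writing an object fails if a non-transient member it holds is not serializable. The sketch below demonstrates that rule with hypothetical stand-in types (Extractor, Holder); it does not use or reproduce the internals of BernoulliClassifier.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-ins, not LingPipe classes: shows why a held extractor
// must itself be serializable for serialization to succeed.
final class SerializationSketch {

    interface Extractor { }

    static final class PlainExtractor implements Extractor { }

    static final class SerializableExtractor implements Extractor, Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Serializable holder with a non-transient extractor field.
    static final class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Extractor extractor;
        Holder(Extractor extractor) {
            this.extractor = extractor;
        }
    }

    // Returns true if the object can be written to an object output stream.
    static boolean canSerialize(Object object) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Holder(new SerializableExtractor()))); // prints true
        System.out.println(canSerialize(new Holder(new PlainExtractor())));        // prints false
    }
}
```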

Compilation is not yet implemented.

Since:
LingPipe3.1
Version:
3.9.1
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
BernoulliClassifier(FeatureExtractor<E> featureExtractor)
          Construct a Bernoulli classifier with the specified feature extractor and the default feature activation threshold of 0.0.
BernoulliClassifier(FeatureExtractor<E> featureExtractor, double featureActivationThreshold)
          Construct a Bernoulli classifier with the specified feature extractor and specified feature activation threshold.
 
Method Summary
 String[] categories()
          Returns a copy of the list of categories for this classifier.
 JointClassification classify(E input)
          Classify the specified input using this Bernoulli classifier.
 double featureActivationThreshold()
          Returns the feature activation threshold.
 FeatureExtractor<E> featureExtractor()
          Return the feature extractor for this classifier.
 void handle(Classified<E> classified)
          Handle the specified training classified object.
 void handle(E input, Classification classification)
          Deprecated. Use handle(Classified) instead.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BernoulliClassifier

public BernoulliClassifier(FeatureExtractor<E> featureExtractor)
Construct a Bernoulli classifier with the specified feature extractor and the default feature activation threshold of 0.0.

Parameters:
featureExtractor - Feature extractor for classification.

BernoulliClassifier

public BernoulliClassifier(FeatureExtractor<E> featureExtractor,
                           double featureActivationThreshold)
Construct a Bernoulli classifier with the specified feature extractor and specified feature activation threshold.

Parameters:
featureExtractor - Feature extractor for classification.
featureActivationThreshold - The threshold for feature activation (see the class documentation).

Method Detail

featureActivationThreshold

public double featureActivationThreshold()
Returns the feature activation threshold.

Returns:
The feature activation threshold for this classifier.

featureExtractor

public FeatureExtractor<E> featureExtractor()
Return the feature extractor for this classifier.

Returns:
The feature extractor for this classifier.

categories

public String[] categories()
Returns a copy of the list of categories for this classifier.

Returns:
The categories for this classifier.

handle

public void handle(Classified<E> classified)
Handle the specified training classified object.

Specified by:
handle in interface ObjectHandler<Classified<E>>
Parameters:
classified - Classified object to handle as training data.

handle

@Deprecated
public void handle(E input,
                   Classification classification)
Deprecated. Use handle(Classified) instead.

Handle the specified training event, consisting of an input and its first-best classification.

Specified by:
handle in interface ClassificationHandler<E,Classification>
Parameters:
input - Object whose classification result is being trained.
classification - Classification result for object.

classify

public JointClassification classify(E input)
Classify the specified input using this Bernoulli classifier. See the class documentation above for mathematical details.

Specified by:
classify in interface BaseClassifier<E>
Specified by:
classify in interface Classifier<E,JointClassification>
Specified by:
classify in interface ConditionalClassifier<E>
Specified by:
classify in interface JointClassifier<E>
Specified by:
classify in interface RankedClassifier<E>
Specified by:
classify in interface ScoredClassifier<E>
Parameters:
input - Input to classify.
Returns:
Classification of the specified input.