com.aliasi.classify
Class LogisticRegressionClassifier<E>

java.lang.Object
  extended by com.aliasi.classify.LogisticRegressionClassifier<E>
All Implemented Interfaces:
Classifier<E,ConditionalClassification>, Compilable, Serializable

public class LogisticRegressionClassifier<E>
extends Object
implements Classifier<E,ConditionalClassification>, Compilable, Serializable

A LogisticRegressionClassifier provides conditional probability classifications of input objects using an underlying logistic regression model and feature extractor. Logistic regression is a discriminative classifier which operates over arbitrary floating-point-valued features of objects.

Training

Logistic regression classifiers may be trained from a data corpus using the method train(FeatureExtractor,Corpus,int,boolean,RegressionPrior,AnnealingSchedule,double,int,int,PrintWriter), the last six arguments of which are shared with the logistic regression training method LogisticRegression.estimate(Vector[],int[],RegressionPrior,AnnealingSchedule,double,int,int,PrintWriter). The first four arguments are required to adapt logistic regression to general classification, and consist of a feature extractor, a corpus to train over, a minimum count below which features are dropped from the model, and a boolean flag indicating whether or not to add an intercept feature to every input vector.

This class merely acts as an adapter to implement the Classifier interface based on the LogisticRegression class in the statistics package. The basis of the adaptation is a general feature extractor, which is an instance of FeatureExtractor. A feature extractor converts an arbitrary input object (whose type is specified generically in this class) to a mapping from features (represented as strings) to values (represented as instances of Number). The class then uses a symbol table for features to convert the maps from feature names to numbers into sparse vectors, where the dimensions are the identifiers for the features in the symbol table. By convention, if the intercept feature flag is set, it will set dimension 0 of all inputs to 1.0.
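The conversion described above can be illustrated with a minimal, self-contained sketch. It is not LingPipe code: the class FeatureVectorSketch, the reserved name "<intercept>", and the use of a TreeMap as the sparse vector are all assumptions made for illustration. The sketch only shows the idea of assigning each feature name a dimension through a symbol table and fixing dimension 0 to 1.0 when the intercept flag is set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the adapter step; not part of LingPipe.
class FeatureVectorSketch {
    // Hypothetical name reserved for the intercept feature.
    static final String INTERCEPT = "<intercept>";

    // Symbol table: feature name -> dimension, assigned in order of first use.
    private final Map<String, Integer> symbolToId = new LinkedHashMap<>();
    private final boolean addIntercept;

    FeatureVectorSketch(boolean addInterceptFeature) {
        this.addIntercept = addInterceptFeature;
        if (addInterceptFeature) {
            symbolToId.put(INTERCEPT, 0); // the intercept claims dimension 0
        }
    }

    // Returns the dimension for a feature, adding it to the table if new.
    int getOrAddSymbol(String feature) {
        Integer id = symbolToId.get(feature);
        if (id == null) {
            id = symbolToId.size();
            symbolToId.put(feature, id);
        }
        return id;
    }

    // Converts a feature map to a sparse vector (dimension -> value).
    // This sketch grows the table on every call, as one might during training.
    TreeMap<Integer, Double> toVector(Map<String, ? extends Number> features) {
        TreeMap<Integer, Double> vector = new TreeMap<>();
        if (addIntercept) {
            vector.put(0, 1.0); // dimension 0 is always 1.0
        }
        for (Map.Entry<String, ? extends Number> e : features.entrySet()) {
            vector.put(getOrAddSymbol(e.getKey()), e.getValue().doubleValue());
        }
        return vector;
    }
}
```

With the intercept flag set, a feature map such as {hello=2, world=1} would come out as a vector with 1.0 at dimension 0 and the two feature values at the next two dimensions.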

For more information on the logistic regression model itself and the training procedure used, see the class documentation for LogisticRegression.

Serialization and Compilation

This class implements both Serializable and Compilable, but both do the same thing and simply write the content of the model to the object output. The model read back in will be an instance of LogisticRegressionClassifier with the same components as the model that was serialized or compiled.

Since:
LingPipe3.5
Version:
3.5
Author:
Bob Carpenter
See Also:
Serialized Form

Method Summary
 List<String> categorySymbols()
          Returns the category symbols used by this classifier.
 ConditionalClassification classify(E in)
          Returns the conditional classification of the specified object using logistic regression classification.
 void compileTo(ObjectOutput objOut)
          Compile this classifier to the specified object output.
 SymbolTable featureSymbolTable()
          Returns an unmodifiable view of the symbol table used for features in this classifier.
 ObjectToDoubleMap<String> featureValues(String category)
          Returns a mapping from features to their parameter values for the specified category.
 String toString()
          Returns a string-based representation of this classifier, listing the parameter vectors for each category.
static
<F> LogisticRegressionClassifier<F>
train(FeatureExtractor<? super F> featureExtractor, Corpus<ClassificationHandler<F,Classification>> corpus, int minFeatureCount, boolean addInterceptFeature, RegressionPrior prior, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, PrintWriter progressWriter)
          Returns a trained logistic regression classifier given the specified feature extractor, corpus, model priors and search parameters.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Method Detail

featureSymbolTable

public SymbolTable featureSymbolTable()
Returns an unmodifiable view of the symbol table used for features in this classifier.

Returns:
The feature symbol table for this classifier.

categorySymbols

public List<String> categorySymbols()
Returns the category symbols used by this classifier. Classifications that this class returns will use only these symbols.

Returns:
The category symbols for this classifier.

classify

public ConditionalClassification classify(E in)
Returns the conditional classification of the specified object using logistic regression classification. All categories will have conditional probabilities in results.

Specified by:
classify in interface Classifier<E,ConditionalClassification>
Parameters:
in - Input object to classify.
Returns:
The conditional classification of the object.

compileTo

public void compileTo(ObjectOutput objOut)
               throws IOException
Compile this classifier to the specified object output. This method is only for storage convenience; the classifier read back in from the serialized object will be equivalent to this one (but not in the Object.equals() sense).

Serializing this class produces exactly the same output.

Specified by:
compileTo in interface Compilable
Parameters:
objOut - Object output to which this classifier is written.
Throws:
IOException - If there is an underlying I/O error writing the model to the stream.
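The round trip described for this method can be sketched with plain Java serialization. The example below is self-contained and does not use LingPipe: TinyModel is a hypothetical stand-in for a trained model, and roundTrip is an illustrative helper. It shows why the object read back has the same components as the original without being equal to it in the Object.equals() sense.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class RoundTripSketch {
    // Hypothetical stand-in for a trained model; not a LingPipe class.
    static class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String[] categories;
        final double[] weights;

        TinyModel(String[] categories, double[] weights) {
            this.categories = categories;
            this.weights = weights;
        }
    }

    // Writes the object to an in-memory stream and reads it back.
    static Object roundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

TinyModel does not override equals(), so the copy returned by roundTrip is a distinct object that compares unequal to the original even though its arrays hold the same contents. With an actual classifier, compileTo(ObjectOutput) or ordinary serialization would take the place of the write step.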

featureValues

public ObjectToDoubleMap<String> featureValues(String category)
Returns a mapping from features to their parameter values for the specified category. If the category is the last category, which implicitly has zero values for all parameters, the map returned by this method will also have zero values for all features.

Parameters:
category - Classification category.
Returns:
The map from features to their parameter values for the specified category.
Throws:
IllegalArgumentException - If the category is unknown.
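To see why the last category can carry all-zero parameters, consider the standard multinomial logistic model, in which each category's score is the dot product of its parameter vector with the input and the probabilities are the normalized exponentials of the scores. The sketch below is a self-contained illustration under that assumption, not the LingPipe implementation; the class and method names are hypothetical. It stores one parameter row per category except the last, whose score is fixed at zero.

```java
class SoftmaxSketch {
    // betas holds one row per category except the last; the last category's
    // parameters are implicitly all zero, so its score is always 0.0.
    static double[] conditionalProbs(double[][] betas, double[] x) {
        int numCategories = betas.length + 1;
        double[] scores = new double[numCategories]; // last entry stays 0.0
        double max = 0.0; // already covers the last category's score of 0.0
        for (int c = 0; c < numCategories - 1; ++c) {
            for (int i = 0; i < x.length; ++i) {
                scores[c] += betas[c][i] * x[i];
            }
            max = Math.max(max, scores[c]);
        }
        // Subtract the maximum score before exponentiating to avoid overflow.
        double[] probs = new double[numCategories];
        double normalizer = 0.0;
        for (int c = 0; c < numCategories; ++c) {
            probs[c] = Math.exp(scores[c] - max);
            normalizer += probs[c];
        }
        for (int c = 0; c < numCategories; ++c) {
            probs[c] /= normalizer;
        }
        return probs;
    }
}
```

Because only differences between scores affect the result, pinning one category's parameters to zero loses no expressive power, which is consistent with the zero-valued map this method returns for the last category.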

toString

public String toString()
Returns a string-based representation of this classifier, listing the parameter vectors for each category.

Overrides:
toString in class Object
Returns:
A string-based representation of this classifier.

train

public static <F> LogisticRegressionClassifier<F> train(FeatureExtractor<? super F> featureExtractor,
                                                        Corpus<ClassificationHandler<F,Classification>> corpus,
                                                        int minFeatureCount,
                                                        boolean addInterceptFeature,
                                                        RegressionPrior prior,
                                                        AnnealingSchedule annealingSchedule,
                                                        double minImprovement,
                                                        int minEpochs,
                                                        int maxEpochs,
                                                        PrintWriter progressWriter)
                                             throws IOException
Returns a trained logistic regression classifier given the specified feature extractor, corpus, model priors and search parameters.

Only the training section of the specified corpus is used for training.

See the class documentation above and the class documentation for LogisticRegression for more information on the parameters.

Parameters:
featureExtractor - Converter from objects to feature maps.
corpus - Corpus of training data.
minFeatureCount - Minimum count for features in corpus to keep feature as part of model.
addInterceptFeature - A flag set to true if an intercept feature should be added to each input vector.
prior - The prior for regularization of the regression.
annealingSchedule - Class to compute learning rate for each epoch.
minImprovement - Minimum relative improvement in error during an epoch; the search stops when improvement falls below this value.
minEpochs - Minimum number of search epochs.
maxEpochs - Maximum number of epochs.
progressWriter - Writer to which progress reports are written.
Throws:
IOException - If there is an underlying I/O exception reading the data from the corpus.
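How minImprovement, minEpochs and maxEpochs might interact can be sketched as a simple stopping rule. This is an assumption-laden illustration, not the LingPipe training loop: the class StoppingRuleSketch, the method epochsRun, and the precomputed array of per-epoch errors are all hypothetical, and the actual convergence test in LogisticRegression may differ in detail.

```java
class StoppingRuleSketch {
    // Returns the number of epochs run, given hypothetical per-epoch errors.
    // The search stops once at least minEpochs have run and the relative
    // improvement over the previous epoch drops below minImprovement, or
    // when maxEpochs is reached.
    static int epochsRun(double[] errorByEpoch, double minImprovement,
                         int minEpochs, int maxEpochs) {
        int limit = Math.min(maxEpochs, errorByEpoch.length);
        int epoch = 0;
        double previous = Double.NaN;
        while (epoch < limit) {
            double current = errorByEpoch[epoch];
            ++epoch;
            if (epoch > 1 && epoch >= minEpochs) {
                double relativeImprovement =
                    (previous - current) / Math.abs(previous);
                if (relativeImprovement < minImprovement) {
                    return epoch; // converged
                }
            }
            previous = current;
        }
        return epoch; // hit the epoch limit
    }
}
```

For example, with errors 10, 5, 4.9, 4.89 and a minimum improvement of 0.05, the rule stops after the third epoch unless minEpochs forces more epochs or maxEpochs cuts the search short.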