com.aliasi.classify
Class BigVectorClassifier

java.lang.Object
  extended by com.aliasi.classify.BigVectorClassifier
All Implemented Interfaces:
BaseClassifier<Vector>, Classifier<Vector,ScoredClassification>, RankedClassifier<Vector>, ScoredClassifier<Vector>, Serializable

public class BigVectorClassifier
extends Object
implements Classifier<Vector,ScoredClassification>, ScoredClassifier<Vector>, Serializable

A BigVectorClassifier provides an efficient linear classifier implementation for large numbers of categories. Inputs are vector implementations and outputs are scored classifications pruned to the top N.

Computation

This class replaces the typical category-dominant (row) representation with a feature-dominant (column) representation, which allows scaling to a large number of categories when the columns are sparse.

The standard approach in linear classifiers is to multiply a (possibly sparse) input vector by each category's vector representation. The vector representing a category maps features to values, and may be sparse.

This class reverses the representation. Rather than a map from categories to features to values, it uses a map from features to categories to values. For a sparse input, it then iterates over the categories for each feature and adds the results. If the maps from categories to values for features are very sparse, this saves significant time over multiplying the input by each category's vector representation.
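The feature-to-category iteration described above can be sketched as follows. This is only an illustration under stated assumptions: plain java.util maps stand in for LingPipe's Vector type, and the names FeatureMajorScorer and score are hypothetical, not part of this class.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only: plain maps replace LingPipe's Vector type.
class FeatureMajorScorer {

    // weights.get(feature).get(category) is the weight of that feature for that category.
    // Only non-zero weights are stored, so absent entries are treated as zero.
    static Map<Integer, Double> score(Map<Integer, Map<Integer, Double>> weights,
                                      Map<Integer, Double> input) {
        Map<Integer, Double> scores = new HashMap<>();
        // Visit only the features that are non-zero in the input.
        for (Map.Entry<Integer, Double> in : input.entrySet()) {
            Map<Integer, Double> byCategory = weights.get(in.getKey());
            if (byCategory == null) {
                continue; // feature unknown to the classifier
            }
            // Visit only the categories with a non-zero weight for this feature.
            for (Map.Entry<Integer, Double> w : byCategory.entrySet()) {
                scores.merge(w.getKey(), in.getValue() * w.getValue(), Double::sum);
            }
        }
        return scores;
    }
}
```

The work done is proportional to the number of non-zero (feature, category) weights touched by the input, rather than to the total number of categories.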

This class uses a custom heap to efficiently merge the features for each category, and a bounded priority queue for collecting n-best results.
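As a rough illustration of n-best collection with a bounded priority queue, the sketch below keeps a min-heap capped at maxResults entries. It uses java.util.PriorityQueue as a stand-in and is not the custom heap or queue used internally by this class; the names NBest and top are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative stand-in only: not the data structures used inside BigVectorClassifier.
class NBest {

    // Returns at most maxResults category ids, ordered from highest to lowest score.
    static List<Integer> top(Map<Integer, Double> scores, int maxResults) {
        List<Integer> result = new ArrayList<>();
        if (maxResults <= 0) {
            return result;
        }
        // Min-heap on score: the head is always the weakest candidate kept so far.
        PriorityQueue<Map.Entry<Integer, Double>> heap =
            new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            heap.add(e);
            if (heap.size() > maxResults) {
                heap.poll(); // evict the current weakest candidate
            }
        }
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // drained in ascending score order
        }
        Collections.reverse(result); // best first
        return result;
    }
}
```

Because the heap never holds more than maxResults + 1 entries, memory stays bounded no matter how many categories receive a score.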

Input Representation

The constructor takes an array of vectors, one for each dimension (feature) of the linear classifier. Each of these vectors is sparse, and its dimensions correspond to the categories that have non-zero values for the feature. The array thus corresponds to a term/document matrix in search, with terms being features and documents being categories.
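If an existing model is stored category by category, it has to be transposed into this feature-by-feature layout before it can be imported. The sketch below shows one way to do that with plain maps standing in for sparse vectors; the names TermVectorBuilder and invert are hypothetical, and converting each resulting map into a sparse Vector implementation is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in only: maps play the role of sparse vectors.
class TermVectorBuilder {

    // Input:  categoryWeights.get(c) maps feature id -> weight for category c.
    // Output: element f maps category id -> weight, i.e. the sparse term vector for feature f.
    // Assumes every feature id lies in the range 0..numFeatures-1.
    static List<Map<Integer, Double>> invert(List<Map<Integer, Double>> categoryWeights,
                                             int numFeatures) {
        List<Map<Integer, Double>> termVectors = new ArrayList<>();
        for (int f = 0; f < numFeatures; f++) {
            termVectors.add(new TreeMap<>());
        }
        for (int c = 0; c < categoryWeights.size(); c++) {
            for (Map.Entry<Integer, Double> e : categoryWeights.get(c).entrySet()) {
                if (e.getValue() != 0.0) { // keep the term vectors sparse
                    termVectors.get(e.getKey()).put(c, e.getValue());
                }
            }
        }
        return termVectors;
    }
}
```

A feature that no category uses ends up with an empty map, which corresponds to an all-zero term vector.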

Training

This class provides no training methods. It is meant as a general utility for importing linear classifiers with large numbers of categories.

Serialization

Instances may be serialized. When read back in, they will be instances of this class.
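A serialization round trip can be checked with standard Java object streams. The helper below is a generic sketch that works for any Serializable object; the names SerializationRoundTrip and roundTrip are hypothetical and not part of LingPipe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: writes an object to a byte buffer and reads it back.
class SerializationRoundTrip {

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject(); // a fresh copy, not the original reference
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Applied to a classifier instance, the returned copy would be expected to have the same class as the original, per the statement above.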

Thread Safety

This class is thread safe under read-write synchronization. The only write operation is setMaxResults(int), which sets the maximum number of results; classify(Vector) is a read operation. Thus any number of concurrent classifications may be carried out with a single instance of this class.
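If callers are expected to supply the read-write synchronization themselves, one option is a small wrapper around java.util.concurrent's ReentrantReadWriteLock. The sketch below is an assumption about how that might be done, not something this class provides; the name ReadWriteGuard is hypothetical.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical helper: runs reads under the shared lock and writes under the exclusive lock.
class ReadWriteGuard {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // For read operations such as classification; many readers may run at once.
    <T> T read(Supplier<T> action) {
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // For write operations such as changing the result limit; excludes all readers.
    void write(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With a shared guard, classification calls would be wrapped as guard.read(() -> classifier.classify(x)) and limit changes as guard.write(() -> classifier.setMaxResults(n)).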

Since:
LingPipe3.9
Version:
3.9.1
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
BigVectorClassifier(Vector[] termVectors, int maxResults)
          Construct a big vector classifier with the specified term vectors, maximum number of results, and categories equal to the string representations of the category identifiers.
BigVectorClassifier(Vector[] termVectors, String[] categories, int maxResults)
          Construct a big vector classifier with the specified term vectors, categories, and maximum number of results.
 
Method Summary
 ScoredClassification classify(Vector x)
          Return a scored classification consisting of the top results for the specified vector input.
 int maxResults()
          Return the maximum number of top results returned by this classifier.
 void setMaxResults(int maxResults)
          Sets the maximum number of results returned by this classifier.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BigVectorClassifier

public BigVectorClassifier(Vector[] termVectors,
                           int maxResults)
Construct a big vector classifier with the specified term vectors, maximum number of results, and categories equal to the string representations of the category identifiers.

See BigVectorClassifier(Vector[],String[],int) for more information.

Parameters:
termVectors - Term vectors for classifier.
maxResults - Maximum number of top results returned.

BigVectorClassifier

public BigVectorClassifier(Vector[] termVectors,
                           String[] categories,
                           int maxResults)
Construct a big vector classifier with the specified term vectors, categories, and maximum number of results. The term vectors have category identifiers as dimensions, and the categories array supplies the name for each identifier.

Parameters:
termVectors - Term vectors for classifier.
categories - Category names indexed by number.
maxResults - Maximum number of top results returned.

Method Detail

maxResults

public int maxResults()
Return the maximum number of top results returned by this classifier.

Returns:
Maximum number of results from classification.

setMaxResults

public void setMaxResults(int maxResults)
Sets the maximum number of results returned by this classifier.

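This method is a write operation, so calls to it should be read-write synchronized with calls to classify(Vector): a write requires exclusive access, whereas classifications may run concurrently with one another.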

Parameters:
maxResults - Maximum number of top results returned by this classifier.

classify

public ScoredClassification classify(Vector x)
Return a scored classification consisting of the top results for the specified vector input.

The maximum size of the returned scored classification is given by maxResults() and set with setMaxResults(int).

Specified by:
classify in interface BaseClassifier<Vector>
Specified by:
classify in interface Classifier<Vector,ScoredClassification>
Specified by:
classify in interface RankedClassifier<Vector>
Specified by:
classify in interface ScoredClassifier<Vector>
Parameters:
x - Vector to classify.
Returns:
Classification of the vector.