com.aliasi.features
Class ZScoreFeatureExtractor<E>

java.lang.Object
  extended by com.aliasi.features.FeatureExtractorFilter<E>
      extended by com.aliasi.features.ZScoreFeatureExtractor<E>
Type Parameters:
E - The type of object whose features are extracted.
All Implemented Interfaces:
FeatureExtractor<E>, Serializable

public class ZScoreFeatureExtractor<E>
extends FeatureExtractorFilter<E>
implements Serializable

A ZScoreFeatureExtractor converts features to their z-scores, where means and deviations are determined by a corpus supplied at construction time.

Means and standard deviations are computed for each feature in the training section of the corpus supplied to the constructor.

At run time, feature values are converted to z-scores by:

 z(feat,val) = (val - mean(feat))/stdDev(feat)
where feat is the feature, val is the value to be converted to a z-score, mean(feat) is the mean (average) of the feature in the training corpus, and stdDev(feat) is the standard deviation of the feature in the training corpus.
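As a worked instance of the formula above, the following minimal sketch applies it to a hypothetical feature whose training mean is 3.0 and whose standard deviation is 2.0. The class name ZScoreFormulaSketch and the three-argument helper are illustrative only and are not part of this class's API.

```java
final class ZScoreFormulaSketch {

    // z(feat,val) = (val - mean(feat)) / stdDev(feat)
    static double zScore(double value, double mean, double stdDev) {
        return (value - mean) / stdDev;
    }

    public static void main(String[] args) {
        // Hypothetical feature: training mean 3.0, standard deviation 2.0.
        System.out.println(zScore(5.0, 3.0, 2.0)); // prints 1.0 (one deviation above the mean)
        System.out.println(zScore(3.0, 3.0, 2.0)); // prints 0.0 (exactly at the mean)
    }
}
```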

Z-score normalization ensures that the collection of each feature's values has zero mean and unit standard deviation over the training section of the corpus. This does not guarantee zero mean and unit standard deviation over the test section of the corpus.
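The difference between the training and test sections can be checked numerically. The sketch below is a standalone illustration that does not use this class; it assumes the population standard deviation (dividing by the number of values), and the data values are invented for the example.

```java
final class ZScoreTrainTestSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Population standard deviation (divides by n); an assumption of this sketch.
    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) sumSq += (x - m) * (x - m);
        return Math.sqrt(sumSq / xs.length);
    }

    // Converts each value to a z-score using the supplied mean and deviation.
    static double[] normalize(double[] xs, double mean, double dev) {
        double[] zs = new double[xs.length];
        for (int i = 0; i < xs.length; ++i) zs[i] = (xs[i] - mean) / dev;
        return zs;
    }

    public static void main(String[] args) {
        double[] train = {1.0, 2.0, 3.0, 4.0, 5.0}; // made-up training values
        double[] test = {10.0, 20.0};               // made-up test values
        double m = mean(train);
        double d = stdDev(train);
        double[] zTrain = normalize(train, m, d);
        double[] zTest = normalize(test, m, d);
        // Training z-scores: mean approximately 0, deviation approximately 1.
        System.out.println(mean(zTrain) + " " + stdDev(zTrain));
        // Test z-scores use the training statistics, so neither property holds.
        System.out.println(mean(zTest) + " " + stdDev(zTest));
    }
}
```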

Constant (Zero Deviation) Features

If a feature is unseen or has zero standard deviation in the training corpus, it is removed from all output. A feature only has zero standard deviation if it has the same value every time it occurs. For instance, all features seen only once will have zero variance. Effectively, features which always have the same value in the training set will be eliminated from future consideration.
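The removal rule can be sketched in plain Java as follows. This is not the implementation of this class; the name ConstantFeatureSketch, the use of Map<String,Double> for feature maps, and the choice to compute statistics only over occurrences of a feature are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ConstantFeatureSketch {

    // feature -> {mean, stdDev}; only features with non-zero deviation are stored.
    private final Map<String, double[]> stats = new HashMap<>();

    ConstantFeatureSketch(List<Map<String, Double>> training) {
        // Collect every observed value of every feature.
        Map<String, List<Double>> valuesByFeature = new HashMap<>();
        for (Map<String, Double> featureMap : training)
            for (Map.Entry<String, Double> e : featureMap.entrySet())
                valuesByFeature.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                               .add(e.getValue());
        for (Map.Entry<String, List<Double>> e : valuesByFeature.entrySet()) {
            List<Double> vals = e.getValue();
            double sum = 0.0;
            for (double v : vals) sum += v;
            double mean = sum / vals.size();
            double sumSq = 0.0;
            for (double v : vals) sumSq += (v - mean) * (v - mean);
            double dev = Math.sqrt(sumSq / vals.size());
            // Constant features (including those seen once) are dropped here.
            if (dev > 0.0) stats.put(e.getKey(), new double[] {mean, dev});
        }
    }

    // Unseen and constant features have no statistics, so they are omitted.
    Map<String, Double> features(Map<String, Double> raw) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : raw.entrySet()) {
            double[] s = stats.get(e.getKey());
            if (s != null) out.put(e.getKey(), (e.getValue() - s[0]) / s[1]);
        }
        return out;
    }
}
```

With training maps {len=1, bias=1}, {len=3, bias=1} and {rare=7}, only len survives: bias is constant, rare is seen once, and any feature absent from training is unseen.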

Sparseness

Applying a z-score transform to features destroys sparseness. Undefined features implicitly have value zero, but the z-score of 0 is non-zero if the mean of the feature values is non-zero.
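To make the loss of sparseness concrete, the sketch below builds the dense z-score vector that a sparse input implies when absent features are read as raw value zero. The class SparsenessSketch and the statistics in the usage note are hypothetical; whether this class itself emits values for absent features is not stated here.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparsenessSketch {

    // stats maps feature -> {mean, stdDev}. Absent features are read as raw value 0.
    static Map<String, Double> denseZScores(Map<String, Double> sparse,
                                            Map<String, double[]> stats) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, double[]> e : stats.entrySet()) {
            double raw = sparse.getOrDefault(e.getKey(), 0.0);
            double mean = e.getValue()[0];
            double dev = e.getValue()[1];
            out.put(e.getKey(), (raw - mean) / dev);
        }
        return out;
    }
}
```

For example, with statistics a = (mean 2, deviation 1), b = (mean 4, deviation 2) and c = (mean 0, deviation 1), the sparse input {a=3} has one non-zero entry, but its z-score vector is {a=1, b=-2, c=0}: the absent feature b becomes non-zero because its mean is non-zero.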

Serialization

A z-score feature extractor is serializable if its base feature extractor is serializable.
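The dependency on the base extractor follows the usual behavior of Java serialization for wrapper objects. The sketch below illustrates that behavior with a hypothetical Wrapper class standing in for a filter around a base object; it does not show how this class serializes itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {

    // Hypothetical stand-in for a filter that holds a reference to a base object.
    static final class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object base;
        Wrapper(Object base) { this.base = base; }
    }

    // Returns true if the object graph can be written with Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph is not Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Wrapper("base")));       // prints true
        System.out.println(canSerialize(new Wrapper(new Object()))); // prints false
    }
}
```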

Since:
LingPipe3.8
Version:
3.9.1
Author:
Mike Ross, Bob Carpenter
See Also:
Serialized Form

Constructor Summary
ZScoreFeatureExtractor(Corpus<ObjectHandler<Classified<E>>> corpus, FeatureExtractor<? super E> extractor)
          Construct a z-score feature extractor from the specified base feature extractor and the training section of the supplied corpus.
ZScoreFeatureExtractor(FeatureExtractor<? super E> extractor, Corpus<ClassificationHandler<E,Classification>> corpus)
          Deprecated. Use constructor ZScoreFeatureExtractor(Corpus,FeatureExtractor) instead.
 
Method Summary
 Map<String,? extends Number> features(E in)
          Return the feature map resulting from converting the feature map produced by the underlying feature extractor to z-scores.
 Number filter(String feature, Number value)
          Deprecated. Use zScore(String,double) instead; this method no longer overrides the method of the same name in ModifiedFeatureExtractor because this class no longer extends ModifiedFeatureExtractor.
 Set<String> knownFeatures()
          Returns an unmodifiable view of the known features for this z-score feature extractor.
 double mean(String feature)
          Returns the mean for the specified feature, or Double.NaN if the feature is not known.
 double standardDeviation(String feature)
          Returns the standard deviation for the specified feature, or Double.NaN if the feature is not known.
 String toString()
          Returns a string representation of this z-score feature extractor, listing the mean and deviation for each feature.
 double zScore(String feature, double value)
          Return the z-score for the specified feature and value.
 
Methods inherited from class com.aliasi.features.FeatureExtractorFilter
baseExtractor
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ZScoreFeatureExtractor

@Deprecated
public ZScoreFeatureExtractor(FeatureExtractor<? super E> extractor,
                                         Corpus<ClassificationHandler<E,Classification>> corpus)
                       throws IOException
Deprecated. Use constructor ZScoreFeatureExtractor(Corpus,FeatureExtractor) instead.

Construct a z-score feature extractor from the specified base feature extractor and the training section of the supplied corpus.

Parameters:
extractor - Base feature extractor.
corpus - The corpus whose training section will be visited.
Throws:
IOException - If there is an I/O error visiting the corpus.

ZScoreFeatureExtractor

public ZScoreFeatureExtractor(Corpus<ObjectHandler<Classified<E>>> corpus,
                              FeatureExtractor<? super E> extractor)
                       throws IOException
Construct a z-score feature extractor from the specified base feature extractor and the training section of the supplied corpus.

Parameters:
corpus - The corpus whose training section will be visited.
extractor - Base feature extractor.
Throws:
IOException - If there is an I/O error visiting the corpus.
Method Detail

features

public Map<String,? extends Number> features(E in)
Return the feature map resulting from converting the feature map produced by the underlying feature extractor to z-scores. See the class documentation above for definitions.

Specified by:
features in interface FeatureExtractor<E>
Overrides:
features in class FeatureExtractorFilter<E>
Parameters:
in - Input object.
Returns:
Feature map for the input object.

filter

@Deprecated
public Number filter(String feature,
                                Number value)
Deprecated. Use zScore(String,double) instead; this method no longer overrides the method of the same name in ModifiedFeatureExtractor because this class no longer extends ModifiedFeatureExtractor.

Return the z-score of the value as determined by the mean and deviation of the specified feature in the training corpus.

Parameters:
feature - Feature name.
value - Value of feature.
Returns:
The z-score of the value for the specified feature.

zScore

public double zScore(String feature,
                     double value)
Return the z-score for the specified feature and value. See the class documentation above for definitions.

Parameters:
feature - Feature name.
value - Value of feature.
Returns:
The z-score of the value for the specified feature.

mean

public double mean(String feature)
Returns the mean for the specified feature, or Double.NaN if the feature is not known.

Parameters:
feature - Feature whose mean is returned.
Returns:
Mean for the specified feature.

standardDeviation

public double standardDeviation(String feature)
Returns the standard deviation for the specified feature, or Double.NaN if the feature is not known.

Parameters:
feature - Feature whose standard deviation is returned.
Returns:
Standard deviation for the specified feature.

knownFeatures

public Set<String> knownFeatures()
Returns an unmodifiable view of the known features for this z-score feature extractor.

Returns:
The set of known features for this extractor.
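The contracts of mean, standardDeviation and knownFeatures described above can be illustrated with a small standalone sketch. The class FeatureStatsSketch and its put method are hypothetical and exist only to populate the example; the sketch assumes the statistics are held in ordinary hash maps.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class FeatureStatsSketch {

    private final Map<String, Double> means = new HashMap<>();
    private final Map<String, Double> devs = new HashMap<>();

    // Hypothetical helper used only to fill in statistics for the example.
    void put(String feature, double mean, double dev) {
        means.put(feature, mean);
        devs.put(feature, dev);
    }

    // Unknown features yield Double.NaN rather than an exception.
    double mean(String feature) {
        Double m = means.get(feature);
        return m == null ? Double.NaN : m;
    }

    double standardDeviation(String feature) {
        Double d = devs.get(feature);
        return d == null ? Double.NaN : d;
    }

    // Read-only view: callers can inspect the features but cannot modify them.
    Set<String> knownFeatures() {
        return Collections.unmodifiableSet(means.keySet());
    }
}
```

Because Double.NaN never compares equal to itself, callers of such accessors should test the result with Double.isNaN rather than with ==. Attempting to add to or remove from the returned set throws UnsupportedOperationException.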

toString

public String toString()
Returns a string representation of this z-score feature extractor, listing the mean and deviation for each feature.

Overrides:
toString in class Object
Returns:
String representation of this extractor.