Class StringLengthFeatureExtractor

java.lang.Object
  extended by com.aliasi.features.StringLengthFeatureExtractor
All Implemented Interfaces:
FeatureExtractor<CharSequence>, Serializable

public class StringLengthFeatureExtractor
extends Object
implements FeatureExtractor<CharSequence>, Serializable

A StringLengthFeatureExtractor implements a feature extractor that provides string length features based on a specified set of string lengths.

Each specified length becomes a feature with value 1.0 if the length of the input string is greater than or equal to the specified length. For example, if the specified lengths were {1,2,3}, the string "ab" would produce the feature map {LEN>=1:1.0, LEN>=2:1.0}; no LEN>=3 feature is produced because the string has only two characters.
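The thresholding rule can be sketched in a few lines of plain Java. This is a minimal illustration of the documented behavior, not the LingPipe source: the class LengthFeaturesSketch and the helper lengthFeatures are hypothetical names, and the use of Double values and a LinkedHashMap are assumptions made for readability.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LengthFeaturesSketch {

    // Hypothetical helper (not part of LingPipe): emits LEN>=n with value 1.0
    // for every specified length n that the input's length reaches.
    static Map<String, Double> lengthFeatures(CharSequence in, int... lengths) {
        Map<String, Double> features = new LinkedHashMap<>();
        for (int length : lengths) {
            if (in.length() >= length) {
                features.put("LEN>=" + length, 1.0);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        // "ab" has length 2, so only the thresholds 1 and 2 are met.
        System.out.println(lengthFeatures("ab", 1, 2, 3));
    }
}
```

Because the features are cumulative thresholds rather than exact-length indicators, a longer string activates every feature that a shorter string does.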

A length of 0 will always produce the feature mapping LEN>=0:1.0, which is redundant if there is already an intercept (constant feature) in the relevant problem. If there is not, it is tantamount to adding one. Note that an intercept feature added this way is subject to priors like any other feature; it is not treated separately, as an intercept that is always added as the first feature would be.
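To see why LEN>=0 behaves like an intercept, consider a linear score computed as the dot product of weights and feature values. The sketch below is illustrative only: the class InterceptViaZeroLengthSketch, the score helper, and the weight values are all hypothetical, and the feature maps are written out by hand to match what lengths {0, 3} would produce as described above.

```java
import java.util.Map;

class InterceptViaZeroLengthSketch {

    // Hypothetical linear scorer: sum of weight * value over the active features.
    static double score(Map<String, Double> weights, Map<String, Double> features) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : features.entrySet()) {
            sum += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up weights for illustration; a real model would estimate these.
        Map<String, Double> weights = Map.of("LEN>=0", 0.5, "LEN>=3", 2.0);

        // Feature maps for lengths {0, 3}, written out by hand.
        Map<String, Double> shortInput = Map.of("LEN>=0", 1.0);                // e.g. "ab"
        Map<String, Double> longInput = Map.of("LEN>=0", 1.0, "LEN>=3", 1.0);  // e.g. "abcd"

        // The LEN>=0 weight is added to every score, exactly as a constant term would be.
        System.out.println(score(weights, shortInput)); // 0.5
        System.out.println(score(weights, longInput));  // 2.5
    }
}
```

Since the LEN>=0 weight is an ordinary coefficient here, any prior or regularization applied to the weights would shrink it along with the others.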

Thread Safety

A string-length feature extractor is thread safe.


Serialization

A string-length feature extractor may be serialized. The deserialized extractor will be an instance of this class.
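A round trip through standard Java object serialization might look like the following sketch. To keep it runnable without the LingPipe jar, it uses a hypothetical stand-in class, LengthsHolder, in place of the extractor; the roundTrip helper is likewise an assumption. Only the java.io calls are standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTripSketch {

    // Hypothetical stand-in for a serializable extractor; not the LingPipe class.
    static class LengthsHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        final int[] lengths;

        LengthsHolder(int... lengths) {
            this.lengths = lengths.clone();
        }
    }

    // Writes the object to an in-memory buffer and reads it back.
    static Object roundTrip(Serializable obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object copy = roundTrip(new LengthsHolder(1, 2, 3));
        // The deserialized object has the same class as the original.
        System.out.println(copy.getClass() == LengthsHolder.class);
    }
}
```

With the real extractor, the same pattern would apply with a StringLengthFeatureExtractor instance passed in place of the stand-in.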

Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
StringLengthFeatureExtractor(int... lengths)
          Construct a string-length feature extractor based on the specified lengths.
Method Summary
 Map<String,? extends Number> features(CharSequence in)
          Return the feature vector for the specified input.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public StringLengthFeatureExtractor(int... lengths)
Construct a string-length feature extractor based on the specified lengths.

Parameters:
lengths - Array (or varargs) of lengths.
Throws:
IllegalArgumentException - If there is not at least one length or if any of the lengths is less than zero.
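The argument checks described above could be sketched as follows. This is an assumption about how such validation might be written, not the actual constructor body: the class LengthValidationSketch, the validateLengths helper, the exception messages, and the defensive copy are all hypothetical.

```java
import java.util.Arrays;

class LengthValidationSketch {

    // Hypothetical stand-in for the documented checks: at least one length,
    // and no length below zero.
    static int[] validateLengths(int... lengths) {
        if (lengths.length < 1) {
            throw new IllegalArgumentException("Require at least one length.");
        }
        for (int length : lengths) {
            if (length < 0) {
                throw new IllegalArgumentException(
                    "Lengths must be non-negative. Found length=" + length);
            }
        }
        // Defensive copy so later changes to the caller's array have no effect
        // (an assumption, not documented behavior).
        return Arrays.copyOf(lengths, lengths.length);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(validateLengths(0, 2, 4)));
        try {
            validateLengths(); // no lengths supplied
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the parameter is declared as varargs, callers can pass either individual int values or an existing int[] array.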
Method Detail


public Map<String,? extends Number> features(CharSequence in)
Description copied from interface: FeatureExtractor
Return the feature vector for the specified input.

Specified by:
features in interface FeatureExtractor<CharSequence>
Parameters:
in - Input object.
Returns:
The feature vector for the specified input.