com.aliasi.tokenizer
Class TokenFeatureExtractor

java.lang.Object
  extended by com.aliasi.tokenizer.TokenFeatureExtractor
All Implemented Interfaces:
FeatureExtractor<CharSequence>, Serializable

public class TokenFeatureExtractor
extends Object
implements FeatureExtractor<CharSequence>, Serializable

A TokenFeatureExtractor converts a character sequence into a feature vector of token counts.

Serialization

Token feature extractors implement the Serializable interface. An instance is actually serializable only if its underlying tokenizer factory is serializable, which requires the factory to implement either the Serializable interface or the Compilable interface. Otherwise, attempting to serialize the feature extractor will throw an exception.
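The dependency described above follows the general Java rule that a Serializable wrapper can only be written out if the objects it holds can be written too. The following is a minimal plain-Java sketch of that rule, not LingPipe code: Splitter, SerializableSplitter, PlainSplitter, Extractor and canSerialize are hypothetical stand-ins. It covers only the Serializable route, not LingPipe's Compilable route, and the exception type shown is standard Java behavior, which may differ from the exception LingPipe actually throws.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical stand-in for a tokenizer factory (not a LingPipe type).
    interface Splitter {
        String[] split(String s);
    }

    // A stand-in factory that can be serialized.
    static class SerializableSplitter implements Splitter, Serializable {
        private static final long serialVersionUID = 1L;
        public String[] split(String s) { return s.split("\\s+"); }
    }

    // A stand-in factory that cannot be serialized.
    static class PlainSplitter implements Splitter {
        public String[] split(String s) { return s.split("\\s+"); }
    }

    // Declares Serializable, but serializes only if its field does too.
    static class Extractor implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Splitter splitter;
        Extractor(Splitter splitter) { this.splitter = splitter; }
    }

    // Returns true if the object can be written to an object stream.
    static boolean canSerialize(Object o) {
        try {
            ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(o);
            out.close();
            return true;
        } catch (NotSerializableException e) {
            return false; // some reachable field is not serializable
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // prints "true"
        System.out.println(canSerialize(new Extractor(new SerializableSplitter())));
        // prints "false"
        System.out.println(canSerialize(new Extractor(new PlainSplitter())));
    }
}
```

In practice this suggests checking that the chosen tokenizer factory is serializable before relying on serialization of the feature extractor.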

Since:
LingPipe 3.1
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
TokenFeatureExtractor(TokenizerFactory factory)
          Construct a token-based feature extractor from the specified tokenizer factory.
 
Method Summary
 Map<String,Counter> features(CharSequence in)
          Return the feature vector for the specified character sequence.
 String toString()
          Returns a description of this token feature extractor including its contained tokenizer factory.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TokenFeatureExtractor

public TokenFeatureExtractor(TokenizerFactory factory)
Construct a token-based feature extractor from the specified tokenizer factory.

Parameters:
factory - Tokenizer factory to use for tokenization.
Method Detail

features

public Map<String,Counter> features(CharSequence in)
Return the feature vector for the specified character sequence. The keys are the extracted tokens, and each value is the count of that token in the input character sequence.

Specified by:
features in interface FeatureExtractor<CharSequence>
Parameters:
in - Character sequence from which to extract features.
Returns:
Mapping from tokens in the input sequence to their counts.
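
To illustrate the shape of the returned mapping, here is a minimal plain-Java sketch of token counting. It does not call LingPipe: the class TokenCountSketch and the method countTokens are hypothetical, a whitespace split stands in for the tokenizer factory, and Integer stands in for Counter. The tokens a real tokenizer factory produces may differ, for example in how punctuation is handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenCountSketch {

    // Hypothetical stand-in for features(CharSequence): splits on
    // whitespace rather than using a TokenizerFactory, and stores
    // Integer counts rather than Counter instances.
    static Map<String,Integer> countTokens(CharSequence in) {
        Map<String,Integer> counts = new LinkedHashMap<String,Integer>();
        for (String token : in.toString().trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input splits into one empty string
            }
            Integer prev = counts.get(token);
            counts.put(token, prev == null ? 1 : prev + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {the=2, cat=1, saw=1, dog=1}
        System.out.println(countTokens("the cat saw the dog"));
    }
}
```

Each distinct token appears once as a key, and a token that occurs several times in the input has a correspondingly larger count.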

toString

public String toString()
Returns a description of this token feature extractor including its contained tokenizer factory. This method calls the toString() method of the contained tokenizer factory.

Overrides:
toString in class Object
Returns:
A description of this token feature extractor and its contained tokenizer factory.