com.aliasi.tokenizer
Class TokenFeatureExtractor
java.lang.Object
com.aliasi.tokenizer.TokenFeatureExtractor
- All Implemented Interfaces:
- FeatureExtractor<CharSequence>, Serializable
public class TokenFeatureExtractor
- extends Object
- implements FeatureExtractor<CharSequence>, Serializable
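A TokenFeatureExtractor converts a character sequence into a feature
vector of token counts. Each token produced by the tokenizer is a feature,
and its value is the number of times that token occurs in the sequence.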
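The following is a minimal stand-in sketch of the counting idea. It is not LingPipe's implementation. The class `TokenCountSketch` is hypothetical, a plain `Function<CharSequence, List<String>>` stands in for LingPipe's `TokenizerFactory`, and `Integer` values stand in for LingPipe's `Counter`. Whitespace tokenization is assumed only for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for TokenFeatureExtractor; not the LingPipe class.
public class TokenCountSketch {
    // Stand-in for a tokenizer factory: maps input text to its list of tokens.
    private final Function<CharSequence, List<String>> tokenizer;

    public TokenCountSketch(Function<CharSequence, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    // Maps each token to the number of times it occurs in the input.
    // LinkedHashMap keeps tokens in order of first occurrence.
    public Map<String, Integer> features(CharSequence in) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokenizer.apply(in)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed whitespace tokenizer, used only for this illustration.
        TokenCountSketch fx = new TokenCountSketch(
            s -> Arrays.asList(s.toString().trim().split("\\s+")));
        System.out.println(fx.features("the cat saw the dog"));
        // prints {the=2, cat=1, saw=1, dog=1}
    }
}
```

Swapping in a different tokenizer function changes which features appear without changing the counting logic, which mirrors how the extractor delegates tokenization to its factory.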
Serialization
Token feature extractors implement the Serializable
interface. However, a token feature extractor is actually serializable
only if its underlying tokenizer factory is serializable, either by
implementing the Serializable interface or by implementing the Compilable interface. If it is not, attempting to serialize the
feature extractor throws an exception.
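The failure case follows the standard Java serialization rule: an object marked Serializable still fails to serialize if one of its fields refers to an object that is not serializable. The sketch below shows only that plain-Java behavior; `Holder`, `PlainFactory`, and `SerializableFactory` are hypothetical classes, not LingPipe types, and LingPipe's own handling (for example, the Compilable route) is not modeled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationSketch {
    // Hypothetical factory that does NOT implement Serializable.
    static class PlainFactory { }

    // Hypothetical factory that does implement Serializable.
    static class SerializableFactory implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // Hypothetical holder: declared Serializable, but only as serializable as its field.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object factory;
        Holder(Object factory) { this.factory = factory; }
    }

    // Returns true if obj serializes cleanly, false on NotSerializableException.
    static boolean trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySerialize(new Holder(new SerializableFactory()))); // true
        System.out.println(trySerialize(new Holder(new PlainFactory())));        // false
    }
}
```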
- Since:
- LingPipe 3.1
- Version:
- 3.1.3
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TokenFeatureExtractor
public TokenFeatureExtractor(TokenizerFactory factory)
- Construct a token-based feature extractor from the
specified tokenizer factory.
- Parameters:
factory - Tokenizer factory to use for tokenization.
features
public Map<String,Counter> features(CharSequence in)
- Return the feature vector for the specified character sequence.
The keys are the tokens extracted from the input, and each value is
the count of that token in the input character sequence.
- Specified by:
features in interface FeatureExtractor<CharSequence>
- Parameters:
in - Character sequence from which to extract features.
- Returns:
- Mapping from tokens in the input sequence to their
counts.
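Because the keys are only the tokens actually extracted, a token that does not occur in the input has no entry in the returned map rather than an entry with count zero, and the counts sum to the number of tokens produced. The sketch below illustrates both points on a hand-built `Map<String, Integer>` standing in for the returned `Map<String,Counter>`; `FeatureMapUsage`, `countOf`, and `totalTokens` are hypothetical helpers, and the map contents assume whitespace tokenization of "the cat saw the dog".

```java
import java.util.Map;
import java.util.TreeMap;

public class FeatureMapUsage {
    // Count of a token, treating a missing key as zero occurrences.
    static int countOf(Map<String, Integer> features, String token) {
        return features.getOrDefault(token, 0);
    }

    // Total number of tokens produced: the sum of all counts.
    static int totalTokens(Map<String, Integer> features) {
        int total = 0;
        for (int c : features.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for a feature map of "the cat saw the dog".
        Map<String, Integer> features = new TreeMap<>();
        features.put("the", 2);
        features.put("cat", 1);
        features.put("saw", 1);
        features.put("dog", 1);
        System.out.println(countOf(features, "the"));  // 2
        System.out.println(countOf(features, "bird")); // 0
        System.out.println(totalTokens(features));     // 5
    }
}
```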