com.aliasi.tokenizer
Class TokenLengthTokenizerFactory
java.lang.Object
com.aliasi.tokenizer.ModifiedTokenizerFactory
com.aliasi.tokenizer.ModifyTokenTokenizerFactory
com.aliasi.tokenizer.TokenLengthTokenizerFactory
- All Implemented Interfaces:
- TokenizerFactory, Serializable
public class TokenLengthTokenizerFactory
- extends ModifyTokenTokenizerFactory
- implements Serializable
A TokenLengthTokenizerFactory filters the tokens produced
by the tokenizers of a base tokenizer factory, returning only tokens
whose length lies between specified lower and upper limits (inclusive).
Thread Safety
Token-length bounded tokenizer factories are thread safe if their
base tokenizer factories are thread safe.
Serialization
Token-length bounded tokenizer factories may be serialized if their
base tokenizer factories are serializable.
- Since:
- LingPipe 3.8
- Version:
- 4.0.1
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
Constructor Summary
TokenLengthTokenizerFactory(TokenizerFactory factory,
int shortestTokenLength,
int longestTokenLength)
Construct a token-length filtered tokenizer factory from the
specified factory that removes tokens shorter than the shortest
or longer than the longest length.
Method Summary
String
modifyToken(String token)
Return the specified token if its length is within the acceptable
bounds, and null otherwise.
String
toString()
TokenLengthTokenizerFactory
public TokenLengthTokenizerFactory(TokenizerFactory factory,
int shortestTokenLength,
int longestTokenLength)
- Construct a token-length filtered tokenizer factory from the
specified factory that removes tokens shorter than the shortest
or longer than the longest length. To effectively remove
bounds, use 0 and Integer.MAX_VALUE; a negative shortest length
is rejected (see Throws below).
- Parameters:
factory - Base tokenizer factory.
shortestTokenLength - Length of shortest acceptable token.
longestTokenLength - Length of longest acceptable token.
- Throws:
IllegalArgumentException - If the shortest length is negative, or
the shortest length is greater than the longest length.
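The argument checks listed under Throws can be sketched in plain Java. This is a minimal illustration that does not use LingPipe: the class `LengthBoundsSketch` and its methods `checkBounds` and `isValid` are hypothetical names, and the exception messages are assumptions rather than the library's own text.

```java
// Hypothetical stand-alone sketch; not the LingPipe source.
final class LengthBoundsSketch {

    // Mirrors the documented constructor contract: reject a negative
    // lower bound, or a lower bound greater than the upper bound.
    static void checkBounds(int shortestTokenLength, int longestTokenLength) {
        if (shortestTokenLength < 0) {
            throw new IllegalArgumentException(
                "Shortest token length must be non-negative; found "
                + shortestTokenLength);
        }
        if (shortestTokenLength > longestTokenLength) {
            throw new IllegalArgumentException(
                "Shortest token length " + shortestTokenLength
                + " exceeds longest token length " + longestTokenLength);
        }
    }

    // Returns true if the pair of bounds would be accepted.
    static boolean isValid(int shortest, int longest) {
        try {
            checkBounds(shortest, longest);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(0, Integer.MAX_VALUE)); // true
        System.out.println(isValid(-1, 10));               // false
        System.out.println(isValid(5, 3));                 // false
    }
}
```

Under this contract, equal bounds such as (3, 3) are accepted and keep only tokens of exactly that length.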
modifyToken
public String modifyToken(String token)
- Return the specified token if it is neither shorter than the shortest
nor longer than the longest acceptable length, and null otherwise,
in which case the token is filtered out.
- Overrides:
modifyToken in class ModifyTokenTokenizerFactory
- Parameters:
token - Input token.
- Returns:
- The input token if it is an acceptable length and
null otherwise.
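The filtering rule can be illustrated without LingPipe. The sketch below is a stand-in under stated assumptions: `TokenLengthFilterSketch`, its fields, and the `filter` helper are hypothetical names; length is taken to mean `String.length()`; and both bounds are treated as inclusive, as the parameter descriptions suggest. Tokens mapped to null are dropped from the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not the LingPipe source.
final class TokenLengthFilterSketch {
    private final int shortest;
    private final int longest;

    TokenLengthFilterSketch(int shortest, int longest) {
        this.shortest = shortest;
        this.longest = longest;
    }

    // Returns the token unchanged if its length is within the
    // inclusive bounds, and null otherwise.
    String modifyToken(String token) {
        int len = token.length();
        return (len < shortest || len > longest) ? null : token;
    }

    // Applies modifyToken to each token and drops the null results.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String modified = modifyToken(token);
            if (modified != null) {
                kept.add(modified);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        TokenLengthFilterSketch f = new TokenLengthFilterSketch(2, 5);
        List<String> tokens = Arrays.asList("a", "of", "token", "tokenizer");
        System.out.println(f.filter(tokens)); // prints [of, token]
    }
}
```

With bounds 2 and 5, "a" (length 1) and "tokenizer" (length 9) are removed, while "of" and "token" sit on the bounds and are kept.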
toString
public String toString()
- Overrides:
toString in class ModifyTokenTokenizerFactory