com.aliasi.tokenizer
Class TokenLengthTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
          extended by com.aliasi.tokenizer.TokenLengthTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable

public class TokenLengthTokenizerFactory
extends ModifyTokenTokenizerFactory
implements Serializable

A TokenLengthTokenizerFactory filters the tokens produced by the tokenizers of a base tokenizer factory, returning only tokens whose length lies between specified lower and upper limits.

Thread Safety

Token-length bounded tokenizer factories are thread safe if their base tokenizer factories are thread safe.

Serialization

Token-length bounded tokenizer factories may be serialized if their base tokenizer factories are serializable.

Since:
LingPipe3.8
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
TokenLengthTokenizerFactory(TokenizerFactory factory, int shortestTokenLength, int longestTokenLength)
          Construct a token-length filtered tokenizer factory from the specified factory that removes tokens shorter than the shortest or longer than the longest length.
 
Method Summary
 String modifyToken(String token)
          Return the specified token if its length is between the shortest and longest acceptable lengths, and null otherwise.
 
Methods inherited from class com.aliasi.tokenizer.ModifyTokenTokenizerFactory
modify, modifyWhitespace
 
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenLengthTokenizerFactory

public TokenLengthTokenizerFactory(TokenizerFactory factory,
                                   int shortestTokenLength,
                                   int longestTokenLength)
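Construct a token-length filtered tokenizer factory from the specified factory that removes tokens shorter than the shortest or longer than the longest length. To effectively remove the bounds, use 0 as the shortest length and Integer.MAX_VALUE as the longest. Integer.MIN_VALUE cannot serve as the lower bound, because a negative shortest length is rejected.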

Parameters:
factory - Base tokenizer factory.
shortestTokenLength - Length of shortest acceptable token.
longestTokenLength - Length of longest acceptable token.
Throws:
IllegalArgumentException - If the shortest length is negative, or the shortest length is greater than the longest length.
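
The argument check described in the Throws clause can be sketched in isolation. The following is a minimal stand-alone illustration, not LingPipe's source: the class LengthBoundsSketch and its methods checkBounds and accepts are hypothetical names, and only the two documented conditions are modeled.

```java
// Hypothetical stand-alone sketch; not part of com.aliasi.tokenizer.
final class LengthBoundsSketch {

    private LengthBoundsSketch() { }

    // Mirrors the two documented failure conditions of the constructor:
    // a negative shortest length, or shortest greater than longest.
    static void checkBounds(int shortestTokenLength, int longestTokenLength) {
        if (shortestTokenLength < 0) {
            throw new IllegalArgumentException(
                "Shortest token length must be non-negative, found "
                + shortestTokenLength);
        }
        if (shortestTokenLength > longestTokenLength) {
            throw new IllegalArgumentException(
                "Shortest token length " + shortestTokenLength
                + " exceeds longest token length " + longestTokenLength);
        }
    }

    // Convenience wrapper: true if the pair of bounds passes the check.
    static boolean accepts(int shortestTokenLength, int longestTokenLength) {
        try {
            checkBounds(shortestTokenLength, longestTokenLength);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(2, 10));                               // true
        System.out.println(accepts(0, Integer.MAX_VALUE));                // true
        System.out.println(accepts(-1, 10));                              // false
        System.out.println(accepts(5, 3));                                // false
        System.out.println(accepts(Integer.MIN_VALUE, Integer.MAX_VALUE)); // false
    }
}
```

Under this reading of the Throws clause, equal bounds such as (3, 3) are accepted and select tokens of exactly that length.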
Method Detail

modifyToken

public String modifyToken(String token)
Return the specified token if its length is between the shortest and longest acceptable lengths, and null otherwise.

Overrides:
modifyToken in class ModifyTokenTokenizerFactory
Parameters:
token - Input token.
Returns:
The input token if it has an acceptable length, and null otherwise.
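
The return contract above can be illustrated with a small stand-alone sketch. It is not LingPipe's implementation: the class TokenLengthFilterSketch and its filter method are hypothetical, and the sketch assumes that length means String.length(), that both bounds are inclusive, and that a null result causes the token to be dropped from the output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of com.aliasi.tokenizer.
final class TokenLengthFilterSketch {

    private final int shortestTokenLength;
    private final int longestTokenLength;

    TokenLengthFilterSketch(int shortestTokenLength, int longestTokenLength) {
        this.shortestTokenLength = shortestTokenLength;
        this.longestTokenLength = longestTokenLength;
    }

    // Same contract as documented for modifyToken: the token itself if its
    // length is acceptable, null otherwise. Bounds are assumed inclusive.
    String modifyToken(String token) {
        int length = token.length();
        return (length < shortestTokenLength || length > longestTokenLength)
            ? null
            : token;
    }

    // Assumed handling of null results: the token is removed from the output.
    List<String> filter(String[] tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String modified = modifyToken(token);
            if (modified != null) {
                kept.add(modified);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        TokenLengthFilterSketch sketch = new TokenLengthFilterSketch(2, 5);
        String[] tokens = {"a", "of", "token", "tokenizer"};
        System.out.println(sketch.filter(tokens)); // prints [of, token]
    }
}
```

With bounds 2 and 5, "a" (length 1) and "tokenizer" (length 9) are rejected, while "of" and "token" sit exactly on the bounds and are kept.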