com.aliasi.tokenizer
Class ModifyTokenTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable
Direct Known Subclasses:
LowerCaseTokenizerFactory, PorterStemmerTokenizerFactory, RegExFilteredTokenizerFactory, SoundexTokenizerFactory, StopTokenizerFactory, TokenLengthTokenizerFactory, WhitespaceNormTokenizerFactory

public abstract class ModifyTokenTokenizerFactory
extends ModifiedTokenizerFactory
implements Serializable

The abstract base class ModifyTokenTokenizerFactory adapts token and whitespace modifiers into a tokenizer factory that modifies the tokenizers produced by a base tokenizer factory.

Subclasses may override the method modifyToken(String) to modify tokens or, by returning null, to remove them from the tokenizer's output. They may override the method modifyWhitespace(String) to modify the whitespace returned by the tokenizer. Both methods have pass-through implementations in this class.
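
The contract described above can be illustrated without LingPipe on the classpath. The following is a minimal, self-contained model and not the library's implementation: TokenModifierSketch, its apply helper, and LowerCaseStopSketch are hypothetical stand-ins for this class, for the tokenizer returned by modify(Tokenizer), and for a subclass. Only the method names modifyToken and modifyWhitespace, the pass-through defaults, and the rule that null removes a token are taken from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for ModifyTokenTokenizerFactory; not part of LingPipe.
class TokenModifierSketch {

    // Pass-through default, mirroring the documented base behavior.
    public String modifyToken(String token) {
        return token;
    }

    // Pass-through default, mirroring the documented base behavior.
    public String modifyWhitespace(String whitespace) {
        return whitespace;
    }

    // Assumed model of the modified tokenizer: every token goes through
    // modifyToken, and tokens mapped to null are dropped from the output.
    public final List<String> apply(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String modified = modifyToken(token);
            if (modified != null) {
                result.add(modified);
            }
        }
        return result;
    }
}

// Example subclass: lowercases tokens and removes a small, illustrative stop list.
class LowerCaseStopSketch extends TokenModifierSketch {
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "a", "of"));

    @Override
    public String modifyToken(String token) {
        String lower = token.toLowerCase(Locale.ENGLISH);
        return STOP_WORDS.contains(lower) ? null : lower;
    }
}
```

In this sketch, applying LowerCaseStopSketch to the tokens The, Art, of, War keeps only art and war, while the base class returns its input unchanged. With LingPipe itself, the same override would presumably go in a subclass of ModifyTokenTokenizerFactory constructed over a base factory.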

Thread Safety

This tokenizer factory is thread safe if the implementations of modifyToken(String) and modifyWhitespace(String) are thread safe. The pass-through implementations provided in this class are thread safe.
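
As an illustration of the condition above, a modifier that keeps shared state has to protect that state itself. The sketch below is a hypothetical, self-contained example and does not use LingPipe: CountingModifierSketch stands in for a subclass whose modifyToken override counts the tokens it removes, and the demo helper exists only to exercise it from several threads. An AtomicLong is one assumed way to keep the counter correct; a plain long field incremented in the same place would be a data race.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical modifier standing in for a subclass's modifyToken override.
class CountingModifierSketch {

    // Shared mutable state, made safe for concurrent calls.
    private final AtomicLong removed = new AtomicLong();

    // Removes single-character tokens and counts each removal.
    public String modifyToken(String token) {
        if (token.length() < 2) {
            removed.incrementAndGet();
            return null; // null removes the token
        }
        return token;
    }

    public long removedCount() {
        return removed.get();
    }

    // Calls modifyToken from several threads at once and returns the final count.
    static long demo(int threads, int callsPerThread) {
        CountingModifierSketch modifier = new CountingModifierSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; ++i) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; ++j) {
                    modifier.modifyToken("a");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return modifier.removedCount();
    }
}
```

A modifier with no fields at all, such as one that only lowercases its argument, needs no such care because it has no state to share between threads.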

Since:
LingPipe 3.8
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
ModifyTokenTokenizerFactory(TokenizerFactory factory)
          Construct a token-modifying tokenizer factory with the specified base factory.
 
Method Summary
 Tokenizer modify(Tokenizer tokenizer)
          Return a modified version of the specified tokenizer that modifies tokens and whitespace as specified by modifyToken(String) and modifyWhitespace(String).
 String modifyToken(String token)
          Return a modified form of the specified token, or null to remove it.
 String modifyWhitespace(String whitespace)
          Return the modified form of the specified whitespace.
 
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ModifyTokenTokenizerFactory

public ModifyTokenTokenizerFactory(TokenizerFactory factory)
Construct a token-modifying tokenizer factory with the specified base factory.

Parameters:
factory - Base tokenizer factory.

Method Detail

modify

public final Tokenizer modify(Tokenizer tokenizer)
Return a modified version of the specified tokenizer that modifies tokens and whitespace as specified by modifyToken(String) and modifyWhitespace(String).

Specified by:
modify in class ModifiedTokenizerFactory
Parameters:
tokenizer - Tokenizer to modify.
Returns:
The modified tokenizer.

modifyToken

public String modifyToken(String token)
Return a modified form of the specified token, or null to remove it.

The base implementation in this class returns the specified token unchanged.

Parameters:
token - Token to modify.
Returns:
Modified token or null to remove it.

modifyWhitespace

public String modifyWhitespace(String whitespace)
Return the modified form of the specified whitespace.

The base implementation in this class returns the specified whitespace unchanged.

Parameters:
whitespace - Whitespace to modify.
Returns:
The modified whitespace.
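
A typical reason to override modifyWhitespace is to normalize the whitespace between tokens. The following is a minimal sketch under that assumption and is not taken from the library source: SingleSpaceWhitespaceSketch is a hypothetical stand-in for a subclass, and the rule of collapsing any non-empty whitespace to a single space is an illustrative choice.

```java
// Hypothetical stand-in for a subclass overriding modifyWhitespace; not part of LingPipe.
class SingleSpaceWhitespaceSketch {

    // Collapses any non-empty whitespace to one space; empty whitespace stays empty.
    public String modifyWhitespace(String whitespace) {
        return whitespace.isEmpty() ? "" : " ";
    }
}
```

Keeping empty whitespace empty preserves the distinction between tokens that were adjacent in the input and tokens that were separated.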