com.aliasi.tokenizer
Class StopTokenizerFactory
java.lang.Object
com.aliasi.tokenizer.ModifiedTokenizerFactory
com.aliasi.tokenizer.ModifyTokenTokenizerFactory
com.aliasi.tokenizer.StopTokenizerFactory
- All Implemented Interfaces:
- TokenizerFactory, Serializable
- Direct Known Subclasses:
- EnglishStopTokenizerFactory
public class StopTokenizerFactory
- extends ModifyTokenTokenizerFactory
- implements Serializable
A StopTokenizerFactory modifies a base tokenizer factory
by removing tokens in a specified stop set. When a token is
removed from the output of a tokenizer, so is the whitespace
immediately following it.
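The removal rule above can be sketched without LingPipe on the classpath. The class `StopFilterSketch` and its `filter` method below are hypothetical names introduced for illustration only and are not part of `com.aliasi.tokenizer`. The sketch assumes each token is paired with the whitespace that immediately follows it, and drops both when the token is in the stop set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not a LingPipe class.
public class StopFilterSketch {

    // whitespaces.get(i) is the whitespace immediately following tokens.get(i).
    // Returns the surviving items interleaved as: token, whitespace, token, ...
    public static List<String> filter(List<String> tokens,
                                      List<String> whitespaces,
                                      Set<String> stopSet) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (stopSet.contains(tokens.get(i))) {
                continue; // drop the stop token and the whitespace after it
            }
            out.add(tokens.get(i));
            out.add(whitespaces.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stopSet = new HashSet<>(Arrays.asList("the", "of"));
        List<String> tokens = Arrays.asList("the", "king", "of", "Spain");
        List<String> whitespaces = Arrays.asList(" ", "  ", " ", "");
        System.out.println(filter(tokens, whitespaces, stopSet));
    }
}
```

Membership in this sketch is exact string equality, as given by `Set.contains`, so `"The"` would not match a stop set containing only `"the"`.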
Thread Safety
A stopped tokenizer factory is thread safe if its base
tokenizer factory is thread safe.
Serialization
A stopped tokenizer factory is serializable if its base
tokenizer factory is serializable.
- Since:
- LingPipe 3.8
- Version:
- 4.0.1
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
StopTokenizerFactory
public StopTokenizerFactory(TokenizerFactory factory,
Set<String> stopSet)
- Construct a tokenizer factory that removes tokens
in the specified stop set from tokenizers produced
by the specified base factory.
- Parameters:
factory - Base tokenizer factory.
stopSet - Set of stop tokens.
stopSet
public Set<String> stopSet()
- Returns an unmodifiable view of the stop set
underlying this stop tokenizer factory.
- Returns:
- The stop set for this factory.
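What "unmodifiable view" means for a caller can be illustrated with the standard `java.util.Collections.unmodifiableSet` wrapper. Whether `StopTokenizerFactory` uses that exact call internally is an assumption. The class `StopSetViewSketch`, its defensive copy in the constructor, and the helper `rejectsWrites` are hypothetical and exist only for this sketch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not a LingPipe class.
public class StopSetViewSketch {
    private final Set<String> stopSet;

    public StopSetViewSketch(Set<String> stopSet) {
        // Copying the argument is an assumption of this sketch.
        this.stopSet = new HashSet<>(stopSet);
    }

    // Read-only view: lookups work, mutations throw.
    public Set<String> stopSet() {
        return Collections.unmodifiableSet(stopSet);
    }

    // Returns true if an attempt to add through the view is rejected.
    public static boolean rejectsWrites(Set<String> view) {
        try {
            view.add("a");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Callers that need a modifiable set would copy the view first, for example with `new HashSet<>(factory.stopSet())`.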
modifyToken
public String modifyToken(String token)
- Description copied from class:
ModifyTokenTokenizerFactory
- Return a modified form of the specified token, or
null to remove it.
The base implementation in ModifyTokenTokenizerFactory simply
returns the specified token.
- Overrides:
modifyToken in class ModifyTokenTokenizerFactory
- Parameters:
token - Token to modify.
- Returns:
- Modified token, or null to remove it.
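The null-means-remove contract can be sketched as a small hook-and-override pattern. All names below (`ModifyTokenSketch`, `PassThrough`, `Stop`, `apply`) are hypothetical and independent of LingPipe. The `Stop` override is this sketch's reading of the class description (tokens in the stop set are removed), not a copy of the library source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not a LingPipe class.
public class ModifyTokenSketch {

    // Base behavior: the hook returns every token unchanged.
    public static class PassThrough {
        public String modifyToken(String token) {
            return token;
        }

        // Applies the hook to each token, skipping those mapped to null.
        public final List<String> apply(List<String> tokens) {
            List<String> out = new ArrayList<>();
            for (String token : tokens) {
                String modified = modifyToken(token);
                if (modified != null) {
                    out.add(modified);
                }
            }
            return out;
        }
    }

    // Stop-list behavior: the hook returns null for tokens in the stop set.
    public static class Stop extends PassThrough {
        private final Set<String> stopSet;

        public Stop(Set<String> stopSet) {
            this.stopSet = stopSet;
        }

        @Override
        public String modifyToken(String token) {
            return stopSet.contains(token) ? null : token;
        }
    }
}
```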
toString
public String toString()
- Overrides:
toString in class ModifyTokenTokenizerFactory