com.aliasi.tokenizer
Class StopTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
          extended by com.aliasi.tokenizer.StopTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable
Direct Known Subclasses:
EnglishStopTokenizerFactory

public class StopTokenizerFactory
extends ModifyTokenTokenizerFactory
implements Serializable

A StopTokenizerFactory modifies a base tokenizer factory by removing tokens in a specified stop set. When a token is removed from the output of a tokenizer, so is the whitespace immediately following it.
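The whitespace rule above can be easier to follow with a small model. The sketch below does not use LingPipe. It assumes a tokenization is given as two parallel lists, where `whitespaces.get(i)` precedes `tokens.get(i)` and the last whitespace entry trails the final token. The class name `StopFilterSketch` and its `filter` method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of stop filtering; not part of LingPipe.
class StopFilterSketch {

    /** Tokens and whitespace strings that survive filtering. */
    static final class Result {
        final List<String> tokens = new ArrayList<>();
        final List<String> whitespaces = new ArrayList<>();
    }

    /**
     * Assumed layout: whitespaces.get(i) precedes tokens.get(i), and the
     * final whitespace trails the last token, so
     * whitespaces.size() == tokens.size() + 1.
     */
    static Result filter(List<String> tokens,
                         List<String> whitespaces,
                         Set<String> stopSet) {
        if (whitespaces.size() != tokens.size() + 1) {
            throw new IllegalArgumentException(
                "expected one more whitespace than tokens");
        }
        Result result = new Result();
        // The whitespace before the first token is always kept.
        result.whitespaces.add(whitespaces.get(0));
        for (int i = 0; i < tokens.size(); i++) {
            if (stopSet.contains(tokens.get(i))) {
                // Drop the stop token and the whitespace right after it.
                continue;
            }
            result.tokens.add(tokens.get(i));
            result.whitespaces.add(whitespaces.get(i + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = filter(
            List.of("the", "cat", "of", "war"),
            List.of("", " ", " ", "  ", ""),
            Set.of("the", "of"));
        System.out.println(r.tokens); // [cat, war]
    }
}
```

In this model, removing "the" also discards the single space after it, and removing "of" discards the double space after it, so three whitespace strings remain for the two surviving tokens.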

Thread Safety

A stopped tokenizer factory is thread safe if its base tokenizer factory is thread safe.

Serialization

A stopped tokenizer factory is serializable if its base tokenizer factory is serializable.

Since:
LingPipe 3.8
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
StopTokenizerFactory(TokenizerFactory factory, Set<String> stopSet)
          Construct a tokenizer factory that removes tokens in the specified stop set from tokenizers produced by the specified base factory.
 
Method Summary
 String modifyToken(String token)
          Return a modified form of the specified token, or null to remove it.
 Set<String> stopSet()
          Returns an unmodifiable view of the stop set underlying this stop tokenizer factory.
 
Methods inherited from class com.aliasi.tokenizer.ModifyTokenTokenizerFactory
modify, modifyWhitespace
 
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StopTokenizerFactory

public StopTokenizerFactory(TokenizerFactory factory,
                            Set<String> stopSet)
Construct a tokenizer factory that removes tokens in the specified stop set from tokenizers produced by the specified base factory.

Parameters:
factory - Base tokenizer factory.
stopSet - Set of stop tokens.

Method Detail

stopSet

public Set<String> stopSet()
Returns an unmodifiable view of the stop set underlying this stop tokenizer factory.

Returns:
The stop set for this factory.
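As a rough illustration of what an unmodifiable view means for callers, the sketch below wraps a private set with `Collections.unmodifiableSet` from the Java standard library. The class `StopSetViewSketch` is hypothetical and not part of LingPipe. The defensive copy in its constructor is an assumption of the sketch; this page does not say whether the real factory copies the set it is given.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder of a stop set; not part of LingPipe.
class StopSetViewSketch {
    private final Set<String> stopSet;

    StopSetViewSketch(Set<String> stopSet) {
        // Assumption of this sketch: copy the caller's set.
        this.stopSet = new HashSet<>(stopSet);
    }

    // Read-only view: lookups work, mutators throw.
    Set<String> stopSet() {
        return Collections.unmodifiableSet(stopSet);
    }

    public static void main(String[] args) {
        StopSetViewSketch sketch =
            new StopSetViewSketch(Set.of("the", "of"));
        Set<String> view = sketch.stopSet();
        System.out.println(view.contains("the")); // true
        try {
            view.add("a");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only"); // read-only
        }
    }
}
```

Callers that need a different stop set would therefore construct a new factory instead of editing the returned set.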

modifyToken

public String modifyToken(String token)
Description copied from class: ModifyTokenTokenizerFactory
Return a modified form of the specified token, or null to remove it.

The base implementation in ModifyTokenTokenizerFactory simply returns the specified token. StopTokenizerFactory overrides it so that tokens in the stop set are removed.

Overrides:
modifyToken in class ModifyTokenTokenizerFactory
Parameters:
token - Token to modify.
Returns:
Modified token or null to remove it.
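The return-null-to-remove contract can be sketched without LingPipe. The two classes below are hypothetical stand-ins: `TokenHookSketch` plays the role of a base class whose hook returns each token unchanged, and `StopHookSketch` overrides the hook to return null for stop tokens. How the real StopTokenizerFactory implements its override is not shown on this page, so the body of `modifyToken` here is an assumption consistent with the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a base class with a modifyToken hook.
class TokenHookSketch {

    // Default hook: return the token unchanged.
    String modifyToken(String token) {
        return token;
    }

    // Run every token through the hook, dropping those mapped to null.
    List<String> apply(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String modified = modifyToken(token);
            if (modified != null) {
                kept.add(modified);
            }
        }
        return kept;
    }
}

// Hypothetical override that removes stop tokens by returning null.
class StopHookSketch extends TokenHookSketch {
    private final Set<String> stopSet;

    StopHookSketch(Set<String> stopSet) {
        this.stopSet = Set.copyOf(stopSet);
    }

    @Override
    String modifyToken(String token) {
        return stopSet.contains(token) ? null : token;
    }
}
```

With these stand-ins, `new StopHookSketch(Set.of("a")).apply(List.of("a", "b", "a", "c"))` yields a list holding "b" and "c", while the base class returns its input tokens unchanged.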