com.aliasi.tokenizer
Class RegExFilteredTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
          extended by com.aliasi.tokenizer.RegExFilteredTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable

public class RegExFilteredTokenizerFactory
extends ModifyTokenTokenizerFactory
implements Serializable

A RegExFilteredTokenizerFactory modifies the tokens returned by a base tokenizer factory's tokenizer by removing those that do not match a regular expression pattern.

Thread Safety

A regular expression filtered tokenizer factory is thread safe if its base tokenizer factory is thread safe. The pattern for this filter is used to create a Matcher for each token. If Matcher.matches() returns true, that is, if the pattern matches the entire token rather than just part of it, the token is kept; otherwise, the token is removed.
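The difference between a whole-token match and a partial match is easy to overlook when choosing a pattern. The following is a minimal sketch using only java.util.regex; the class and method names (MatchesVersusFind, wholeTokenMatches, containsMatch) are hypothetical and are not part of LingPipe. It contrasts Matcher.matches(), the test named above, with Matcher.find(), which this filter does not use.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of LingPipe.
final class MatchesVersusFind {

    // Whole-token test: true only if the pattern matches the entire token.
    static boolean wholeTokenMatches(Pattern pattern, String token) {
        return pattern.matcher(token).matches();
    }

    // Substring test, shown only for contrast: true if any part matches.
    static boolean containsMatch(Pattern pattern, String token) {
        return pattern.matcher(token).find();
    }

    public static void main(String[] args) {
        Pattern letters = Pattern.compile("[a-z]+");
        for (String token : new String[] {"abc", "abc123", "123"}) {
            System.out.println(token
                + " matches=" + wholeTokenMatches(letters, token)
                + " find=" + containsMatch(letters, token));
        }
    }
}
```

With the pattern [a-z]+, the token abc123 fails the matches() test even though it contains letters, so a filter built on that pattern would remove it. To keep tokens that merely contain a match, the pattern itself would need to allow the surrounding characters, for example .*[a-z]+.*.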

Serialization

A regular expression filtered tokenizer factory is serializable if its base tokenizer factory is serializable.

Since:
LingPipe 3.8
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
RegExFilteredTokenizerFactory(TokenizerFactory factory, Pattern pattern)
          Construct a regular-expression filtered tokenizer factory from the specified base factory and regular expression pattern that accepted tokens must match.
 
Method Summary
 String modifyToken(String token)
          Returns the specified token if it matches this filter's pattern and null otherwise.
 
Methods inherited from class com.aliasi.tokenizer.ModifyTokenTokenizerFactory
modify, modifyWhitespace
 
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegExFilteredTokenizerFactory

public RegExFilteredTokenizerFactory(TokenizerFactory factory,
                                     Pattern pattern)
Construct a regular-expression filtered tokenizer factory from the specified base factory and regular expression pattern that accepted tokens must match.

Parameters:
factory - Base tokenizer factory.
pattern - Pattern to match against tokens.

Method Detail

modifyToken

public String modifyToken(String token)
Returns the specified token if it matches this filter's pattern and null otherwise.

Overrides:
modifyToken in class ModifyTokenTokenizerFactory
Parameters:
token - Input token.
Returns:
The input token if it matches, and null otherwise.
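
The return contract of modifyToken can be sketched without LingPipe on the classpath. This is a stand-in using only the JDK, not the library's implementation: the names ModifyTokenSketch, keepOrNull, and filter are hypothetical, and the sketch assumes that a null return is how the token-modifying superclass is told to drop a token, which this page implies but does not state outright.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the documented contract; not LingPipe code.
final class ModifyTokenSketch {

    // Mirrors the documented contract: the token itself if the pattern
    // matches the whole token, and null otherwise.
    static String keepOrNull(Pattern pattern, String token) {
        return pattern.matcher(token).matches() ? token : null;
    }

    // Assumed behavior of the surrounding tokenizer: tokens mapped to
    // null are removed from the output; the rest keep their order.
    static List<String> filter(Pattern pattern, List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String result = keepOrNull(pattern, token);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }
}
```

For example, filtering the tokens The, 3, cats, and x9 with the pattern [A-Za-z]+ in this sketch keeps only The and cats.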