com.aliasi.tokenizer
Class RegExFilteredTokenizerFactory
java.lang.Object
com.aliasi.tokenizer.ModifiedTokenizerFactory
com.aliasi.tokenizer.ModifyTokenTokenizerFactory
com.aliasi.tokenizer.RegExFilteredTokenizerFactory
- All Implemented Interfaces:
- TokenizerFactory, Serializable
public class RegExFilteredTokenizerFactory
- extends ModifyTokenTokenizerFactory
- implements Serializable
A RegExFilteredTokenizerFactory modifies the tokens
returned by a base tokenizer factory's tokenizer by removing
those that do not match a regular expression pattern.
Thread Safety
A regular expression filtered tokenizer factory is thread safe if
its base tokenizer factory is thread safe. The pattern for
this filter is used to create a Matcher
for each token. If the matcher matches, that is, if
Matcher.matches() returns true,
then the token is kept; otherwise, the token is removed.
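The keep-or-remove test described above can be sketched with the standard java.util.regex classes alone. This is a minimal illustration, not LingPipe's source: the class name MatchesDemo and the helper keeps are hypothetical. It shows that Matcher.matches() requires the whole token to match, whereas Matcher.find() would also accept a partial match. A fresh Matcher is created per token because Pattern is immutable and safe to share between threads, while Matcher is not.

```java
import java.util.regex.Pattern;

public class MatchesDemo {

    // Hypothetical helper: true if the ENTIRE token matches the pattern,
    // which is the test Matcher.matches() performs.
    static boolean keeps(Pattern pattern, String token) {
        // A new Matcher per call; Matcher instances are not thread safe.
        return pattern.matcher(token).matches();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");

        System.out.println(keeps(digits, "2024"));          // true: whole token is digits
        System.out.println(keeps(digits, "b2b"));           // false: only part of it is digits
        System.out.println(digits.matcher("b2b").find());   // true: find() accepts a partial match
    }
}
```

To keep tokens that merely contain a match, the pattern itself has to allow surrounding text, for example ".*[0-9]+.*".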
Serialization
A regular expression filtered tokenizer factory is serializable if its
base tokenizer factory is serializable.
- Since:
- LingPipe 3.8
- Version:
- 4.0.1
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
RegExFilteredTokenizerFactory
public RegExFilteredTokenizerFactory(TokenizerFactory factory,
Pattern pattern)
- Construct a regular-expression filtered tokenizer factory from
the specified base factory and regular expression pattern that
accepted tokens must match.
- Parameters:
factory - Base tokenizer factory.
pattern - Pattern to match against tokens.
getPattern
public Pattern getPattern()
- Returns the pattern for this regex-filtered tokenizer.
- Returns:
- The pattern for this regex-filtered tokenizer.
modifyToken
public String modifyToken(String token)
- Returns the specified token if it matches this filter's pattern and null otherwise.
- Overrides:
modifyToken in class ModifyTokenTokenizerFactory
- Parameters:
token - Input token.
- Returns:
- The input token if it matches, and null otherwise.
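The token-or-null contract can be illustrated with a stand-alone sketch that does not depend on LingPipe. The class RegexFilterSketch and its static methods are hypothetical stand-ins: modifyToken mirrors the documented return values, and filter assumes, consistent with the class description, that a null result means the token is dropped from the output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexFilterSketch {

    // Stand-in for the documented contract: the token itself if it
    // matches the pattern in full, and null otherwise.
    static String modifyToken(Pattern pattern, String token) {
        return pattern.matcher(token).matches() ? token : null;
    }

    // Assumption: a null result removes the token from the stream.
    static List<String> filter(Pattern pattern, List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String result = modifyToken(pattern, token);
            if (result != null) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Pattern alpha = Pattern.compile("[a-zA-Z]+");
        List<String> tokens = Arrays.asList("LingPipe", "4", ".", "tokens");
        System.out.println(filter(alpha, tokens)); // prints [LingPipe, tokens]
    }
}
```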
toString
public String toString()
- Overrides:
toString in class ModifyTokenTokenizerFactory