com.aliasi.tokenizer
Class RegExFilteredTokenizerFactory
java.lang.Object
com.aliasi.tokenizer.ModifiedTokenizerFactory
com.aliasi.tokenizer.ModifyTokenTokenizerFactory
com.aliasi.tokenizer.RegExFilteredTokenizerFactory
- All Implemented Interfaces:
- TokenizerFactory, Serializable
public class RegExFilteredTokenizerFactory
- extends ModifyTokenTokenizerFactory
- implements Serializable
A RegExFilteredTokenizerFactory modifies the tokens
returned by a base tokenizer factory's tokenizer by removing
those that do not match a regular expression pattern.
Thread Safety
A regular expression filtered tokenizer factory is thread safe if
its base tokenizer factory is thread safe. The pattern for
this filter is used to create a Matcher
for each token. If the matcher matches the entire token, that is, if
Matcher.matches() returns true,
then the token is kept; otherwise, the token is removed.
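The whole-token requirement of Matcher.matches() is easy to overlook, so the following minimal sketch contrasts it with Matcher.find() using only java.util.regex. The class name MatchesVersusFind and the helper keeps are hypothetical names introduced for illustration; they are not part of LingPipe.

```java
import java.util.regex.Pattern;

// Illustration only: MatchesVersusFind and keeps() are hypothetical names,
// not LingPipe classes or methods.
public class MatchesVersusFind {

    // True only if the pattern matches the entire token, which is the
    // condition described above for keeping a token.
    static boolean keeps(Pattern pattern, String token) {
        return pattern.matcher(token).matches();
    }

    public static void main(String[] args) {
        Pattern letters = Pattern.compile("[a-z]+");

        // The whole token consists of lowercase letters, so it is kept.
        System.out.println(keeps(letters, "hello"));   // true

        // Trailing digits mean the pattern does not cover the whole token.
        System.out.println(keeps(letters, "hello42")); // false

        // find() succeeds on a substring, which is not the test used here.
        System.out.println(letters.matcher("hello42").find()); // true
    }
}
```

A pattern intended to accept tokens that merely contain a match would therefore need explicit wildcards, for example `.*[a-z]+.*`.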
Serialization
A regular expression filtered tokenizer factory is serializable if its
base tokenizer factory is serializable.
- Since:
- LingPipe 3.8
- Version:
- 3.8
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
Method Summary
String modifyToken(String token)
- Returns the specified token if it matches this
filter's pattern and null otherwise.
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RegExFilteredTokenizerFactory
public RegExFilteredTokenizerFactory(TokenizerFactory factory,
Pattern pattern)
- Construct a regular-expression filtered tokenizer factory from
the specified base factory and regular expression pattern that
accepted tokens must match.
- Parameters:
factory - Base tokenizer factory.
pattern - Pattern to match against tokens.
modifyToken
public String modifyToken(String token)
- Returns the specified token if it matches this
filter's pattern and
null otherwise.
- Overrides:
modifyToken in class ModifyTokenTokenizerFactory
- Parameters:
token - Input token.
- Returns:
- The input token if it matches, and null otherwise.
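As a rough illustration of this contract, the sketch below mimics the keep-or-null behavior with the standard library alone. It is not the LingPipe source: RegExFilterSketch and its filter method are hypothetical, and the sketch assumes that a null return from modifyToken signals removal of the token, as the class description implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for the filtering behavior described above;
// not part of LingPipe.
public class RegExFilterSketch {

    private final Pattern pattern;

    RegExFilterSketch(Pattern pattern) {
        this.pattern = pattern;
    }

    // Mirrors the documented contract: the token itself if the pattern
    // matches the whole token, null otherwise.
    String modifyToken(String token) {
        return pattern.matcher(token).matches() ? token : null;
    }

    // Assumption: a null result means the token is dropped from the output.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String modified = modifyToken(token);
            if (modified != null) {
                kept.add(modified);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        RegExFilterSketch alphaOnly =
            new RegExFilterSketch(Pattern.compile("\\p{Alpha}+"));
        List<String> tokens = Arrays.asList("The", "year", "2009", ",", "indeed");
        System.out.println(alphaOnly.filter(tokens)); // [The, year, indeed]
    }
}
```

In actual use, the tokens would come from the tokenizer produced by the base factory passed to the constructor rather than from a fixed list.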