com.aliasi.tokenizer
Class WhitespaceNormTokenizerFactory
java.lang.Object
com.aliasi.tokenizer.ModifiedTokenizerFactory
com.aliasi.tokenizer.ModifyTokenTokenizerFactory
com.aliasi.tokenizer.WhitespaceNormTokenizerFactory
- All Implemented Interfaces:
- TokenizerFactory, Serializable
public class WhitespaceNormTokenizerFactory
- extends ModifyTokenTokenizerFactory
- implements Serializable
A WhitespaceNormTokenizerFactory wraps the tokenizers produced
by a base tokenizer factory so that every non-empty whitespace is replaced
by a single space, while empty (zero-length) whitespaces are left unchanged.
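The wrapping behavior described above can be illustrated with a small stand-alone sketch. It does not use the LingPipe classes: the names ToyTokenizer, ToyTokenizerFactory, ToyBaseFactory, ToyWhitespaceNormFactory, and rebuild are hypothetical stand-ins invented for this example, and the base tokenizer simply splits on whitespace. The sketch only shows the idea of delegating to a base factory and normalizing the whitespace it reports.

```java
// Stand-alone illustration only; none of these types are part of LingPipe.
final class ToyWhitespaceNormDemo {

    // Hypothetical tokenizer: callers alternate nextWhitespace() and nextToken().
    interface ToyTokenizer {
        String nextWhitespace(); // whitespace before the next token, "" if none
        String nextToken();      // next token, or null when the input is exhausted
    }

    // Hypothetical factory producing tokenizers over a string.
    interface ToyTokenizerFactory {
        ToyTokenizer tokenizer(String text);
    }

    // Assumed base factory: tokens are maximal runs of non-whitespace characters.
    static final class ToyBaseFactory implements ToyTokenizerFactory {
        public ToyTokenizer tokenizer(String text) {
            return new ToyTokenizer() {
                private int pos = 0;

                public String nextWhitespace() {
                    int start = pos;
                    while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
                        pos++;
                    }
                    return text.substring(start, pos);
                }

                public String nextToken() {
                    nextWhitespace(); // skip any whitespace the caller did not consume
                    if (pos == text.length()) {
                        return null;
                    }
                    int start = pos;
                    while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
                        pos++;
                    }
                    return text.substring(start, pos);
                }
            };
        }
    }

    // Wrapper factory: delegates to the base and normalizes only the whitespace.
    static final class ToyWhitespaceNormFactory implements ToyTokenizerFactory {
        private final ToyTokenizerFactory base;

        ToyWhitespaceNormFactory(ToyTokenizerFactory base) {
            this.base = base;
        }

        public ToyTokenizer tokenizer(String text) {
            ToyTokenizer inner = base.tokenizer(text);
            return new ToyTokenizer() {
                public String nextWhitespace() {
                    // Non-empty whitespace becomes one space; empty stays empty.
                    return inner.nextWhitespace().isEmpty() ? "" : " ";
                }

                public String nextToken() {
                    return inner.nextToken(); // tokens pass through unchanged
                }
            };
        }
    }

    // Concatenates whitespaces and tokens in order to show the combined effect.
    static String rebuild(ToyTokenizerFactory factory, String text) {
        ToyTokenizer tokenizer = factory.tokenizer(text);
        StringBuilder sb = new StringBuilder();
        while (true) {
            sb.append(tokenizer.nextWhitespace());
            String token = tokenizer.nextToken();
            if (token == null) {
                break;
            }
            sb.append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ToyTokenizerFactory factory = new ToyWhitespaceNormFactory(new ToyBaseFactory());
        System.out.println("[" + rebuild(factory, "a \t b\n\nc") + "]"); // prints "[a b c]"
    }
}
```

In this sketch the wrapper holds no state beyond a reference to the base factory, which mirrors why thread safety and serializability depend on the base factory.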
Thread Safety
A whitespace normalizing tokenizer factory is thread
safe if its base tokenizer factory is thread safe.
Serialization
A whitespace normalizing tokenizer factory is serializable if its
base tokenizer factory is serializable.
- Since:
- LingPipe 3.8
- Version:
- 4.0.1
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
WhitespaceNormTokenizerFactory
public WhitespaceNormTokenizerFactory(TokenizerFactory factory)
- Construct a whitespace normalizing tokenizer factory from the
specified base factory.
- Parameters:
factory - Base tokenizer factory.
modifyWhitespace
public String modifyWhitespace(String whitespace)
- Returns the normalized form of the specified whitespace: a single
space if the whitespace is non-empty, and the empty string otherwise.
- Overrides:
modifyWhitespace in class ModifyTokenTokenizerFactory
- Parameters:
whitespace - Input whitespace.
- Returns:
- Normalized whitespace.
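The normalization rule can be sketched in isolation as follows. This is a minimal stand-alone illustration based on the class description, not the LingPipe source; the class name WhitespaceNormSketch is hypothetical, and the static method only borrows the name modifyWhitespace for readability.

```java
// Stand-alone sketch of the normalization rule; not the LingPipe implementation.
final class WhitespaceNormSketch {

    // Any non-empty whitespace maps to one space; the empty string maps to itself.
    static String modifyWhitespace(String whitespace) {
        return whitespace.isEmpty() ? "" : " ";
    }

    public static void main(String[] args) {
        System.out.println("[" + modifyWhitespace(" \t\n ") + "]"); // prints "[ ]"
        System.out.println("[" + modifyWhitespace("") + "]");       // prints "[]"
    }
}
```

Under this rule the result depends only on whether the input is empty, so the characters that make up the whitespace (spaces, tabs, newlines) do not affect the output.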
toString
public String toString()
- Overrides:
toString in class ModifyTokenTokenizerFactory