com.aliasi.tokenizer
Class WhitespaceNormTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
          extended by com.aliasi.tokenizer.WhitespaceNormTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable

public class WhitespaceNormTokenizerFactory
extends ModifyTokenTokenizerFactory
implements Serializable

A WhitespaceNormTokenizerFactory wraps a base tokenizer factory and normalizes the whitespace returned by the tokenizers it produces: every non-empty whitespace string is converted to a single space, and empty (zero-length) whitespace strings are left unchanged.

Thread Safety

A whitespace normalizing tokenizer factory is thread safe if its base tokenizer factory is thread safe.

Serialization

A whitespace normalizing tokenizer factory is serializable if its base tokenizer factory is serializable.

Since:
LingPipe 3.8
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
WhitespaceNormTokenizerFactory(TokenizerFactory factory)
          Construct a whitespace normalizing tokenizer factory from the specified base factory.
 
Method Summary
 String modifyWhitespace(String whitespace)
          Return the normalized form of the specified whitespace.
 
Methods inherited from class com.aliasi.tokenizer.ModifyTokenTokenizerFactory
modify, modifyToken
 
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WhitespaceNormTokenizerFactory

public WhitespaceNormTokenizerFactory(TokenizerFactory factory)
Construct a whitespace normalizing tokenizer factory from the specified base factory.

Parameters:
factory - Base tokenizer factory.

Method Detail

modifyWhitespace

public String modifyWhitespace(String whitespace)
Return the normalized form of the specified whitespace.

Overrides:
modifyWhitespace in class ModifyTokenTokenizerFactory
Parameters:
whitespace - Input whitespace.
Returns:
Normalized whitespace.
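
The normalization rule described for this class can be illustrated with a small standalone sketch. It does not use LingPipe and is not the library's source; the class name WhitespaceNormSketch and the method normalizeWhitespace are hypothetical names chosen for illustration. It only assumes the documented behavior: non-empty whitespace becomes a single space, and zero-length whitespace is returned unchanged.

```java
// Standalone illustration of the documented rule; not LingPipe code.
// WhitespaceNormSketch and normalizeWhitespace are hypothetical names.
class WhitespaceNormSketch {

    // Mirrors the documented contract of modifyWhitespace(String):
    // empty input stays empty, anything else collapses to one space.
    static String normalizeWhitespace(String whitespace) {
        return whitespace.isEmpty() ? "" : " ";
    }

    public static void main(String[] args) {
        // Sample whitespace strings as a tokenizer might report them
        // between tokens: none, one space, several spaces, mixed.
        String[] samples = { "", " ", "   ", "\t\n  " };
        for (String s : samples) {
            // Brackets make the (possibly empty) result visible.
            System.out.println("[" + normalizeWhitespace(s) + "]");
        }
    }
}
```

In the sketch, the empty-string case is kept distinct so that tokens with no whitespace between them in the input stay adjacent after normalization, which matches the description of leaving zero-length whitespace alone.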