Class LineTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.RegExTokenizerFactory
      extended by com.aliasi.tokenizer.LineTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Compilable, Serializable

public class LineTokenizerFactory
extends RegExTokenizerFactory

A LineTokenizerFactory treats each line of an input as a token. The whitespace separating tokens consists only of line terminators. This is useful for decoders that operate at the line level.

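Line terminators are as defined in Pattern (java.util.regex.Pattern). They include the Windows (\r\n), Unix (\n), and classic Macintosh (\r) conventions, as well as the Unicode next-line (\u0085), line-separator (\u2028), and paragraph-separator (\u2029) characters.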
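As a quick check of which characters end a line, the following sketch calls java.util.regex directly rather than LingPipe. It assumes the token pattern is .+ compiled with default flags; the class LineTerminatorDemo and its lines method are hypothetical. Each of the six terminators recognized by Pattern splits "abc" + terminator + "def" into the same two tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: calls java.util.regex directly, not LingPipe.
class LineTerminatorDemo {

    // Assumed token pattern: one or more characters other than line terminators
    // (default flags, so '.' never matches a terminator).
    private static final Pattern LINE = Pattern.compile(".+");

    // Returns every maximal run of non-terminator characters, in order.
    static List<String> lines(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unix, Windows, classic Mac, then the three Unicode terminators.
        String[] terminators = {"\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"};
        for (String t : terminators) {
            System.out.println(lines("abc" + t + "def")); // prints [abc, def]
        }
    }
}
```

Note that "\r\n" yields no empty token between "\r" and "\n": neither character can start a match, so the next match begins at "def".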

Each whitespace is either the empty string or a sequence of one or more line terminators.

A token may consist entirely of whitespace characters if whitespace is the only content on its line, but a token never contains a line terminator. Every token contains at least one character, so empty lines produce no tokens.


Input String         Tokens               Whitespaces
""                   {}                   { "" }
"abc"                { "abc" }            { "", "" }
"abc\ndef"           { "abc", "def" }     { "", "\n", "" }
"abc\r\ndef"         { "abc", "def" }     { "", "\r\n", "" }
" abc\n def \n"      { " abc", " def " }  { "", "\n", "\n" }
" \n"                { " " }              { "", "\n" }
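The rows of the table can be reproduced with a short sketch that uses java.util.regex directly instead of LingPipe, assuming the token pattern is .+ with default flags. The class LineSplitSketch and its split method are hypothetical. Whitespaces are taken as the substrings before, between, and after the matches, so there is always exactly one more whitespace than there are tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: mimics the token/whitespace split with java.util.regex,
// assuming the token pattern ".+" (this is not a call into LingPipe).
class LineSplitSketch {

    private static final Pattern LINE = Pattern.compile(".+");

    // Holds the n tokens and the n + 1 whitespaces around and between them.
    static final class Result {
        final List<String> tokens = new ArrayList<>();
        final List<String> whitespaces = new ArrayList<>();
    }

    static Result split(String input) {
        Result r = new Result();
        Matcher m = LINE.matcher(input);
        int end = 0; // end offset of the previous token
        while (m.find()) {
            r.whitespaces.add(input.substring(end, m.start())); // text before this token
            r.tokens.add(m.group());
            end = m.end();
        }
        r.whitespaces.add(input.substring(end)); // text after the last token
        return r;
    }

    public static void main(String[] args) {
        Result r = split("abc\r\ndef");
        System.out.println(r.tokens);             // prints [abc, def]
        System.out.println(r.whitespaces.size()); // prints 3
    }
}
```

Running split on "a\n\n \nb" gives tokens { "a", " ", "b" } and whitespaces { "", "\n\n", "\n", "" }: the empty line is folded into a two-newline whitespace, while the line holding a single space becomes a token.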

Thread Safety

Line tokenizer factories are completely thread safe.
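One way to see why a regex-based factory can be shared freely is that java.util.regex.Pattern objects are immutable and documented as safe for concurrent use, while Matcher objects are not. The hypothetical SharedLineSplitter below (not LingPipe code) keeps only a compiled Pattern as shared state and creates a fresh Matcher on every call, so a single INSTANCE can serve many threads. That LingPipe's factories follow the same design is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a stateless line splitter shared by many threads.
final class SharedLineSplitter {

    // One shared instance, analogous to a singleton factory field.
    static final SharedLineSplitter INSTANCE = new SharedLineSplitter();

    // Pattern objects are immutable and safe for concurrent use.
    private final Pattern line = Pattern.compile(".+");

    private SharedLineSplitter() { }

    // A fresh Matcher per call, because Matcher objects are not thread safe.
    List<String> lines(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = line.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Splits all inputs concurrently through the single shared INSTANCE.
    static List<List<String>> splitAll(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String s : inputs) {
                futures.add(pool.submit(() -> INSTANCE.lines(s)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get()); // results stay in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(splitAll(Arrays.asList("a\nb", "c"))); // prints [[a, b], [c]]
    }
}
```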


Serialization

A line tokenizer factory may be serialized. Upon deserialization, the resulting object will be the singleton INSTANCE.
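A common Java idiom for making deserialization yield a singleton is a readResolve method, which ObjectInputStream calls after reading an object and whose return value replaces the freshly deserialized copy. The class SingletonFactorySketch below is hypothetical and only illustrates that idiom; LingPipe may implement the documented behavior by other means.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

// Hypothetical class illustrating the readResolve idiom (not LingPipe code).
final class SingletonFactorySketch implements Serializable {

    private static final long serialVersionUID = 1L;

    static final SingletonFactorySketch INSTANCE = new SingletonFactorySketch();

    private SingletonFactorySketch() { }

    // Invoked during deserialization; discards the new copy in favor of INSTANCE.
    private Object readResolve() throws ObjectStreamException {
        return INSTANCE;
    }

    // Serializes an object to a byte array and reads it back.
    static Object roundTrip(Object o) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(o);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(INSTANCE);
        System.out.println(copy == INSTANCE); // prints true
    }
}
```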

Implementation Note

This tokenizer factory is nothing more than a convenience wrapper around a very simple RegExTokenizerFactory, constructed with the simplest possible regular expression:

    .+

Because the regular expression tokenizer factory uses the default regular expression flags (see Pattern), the period (.) matches any character except a line terminator.
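The wrapper relationship and the effect of the default flags can be sketched with two hypothetical classes, RegexFactorySketch and LineFactorySketch, which stand in for the LingPipe classes and call java.util.regex directly. The pattern .+ is assumed here. Compiling the same pattern with Pattern.DOTALL shows the contrast: the period then also matches line terminators, and the whole input becomes a single token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a regex-driven tokenizer factory (not LingPipe's class).
class RegexFactorySketch {
    private final Pattern pattern;

    RegexFactorySketch(String regex, int flags) {
        this.pattern = Pattern.compile(regex, flags);
    }

    RegexFactorySketch(String regex) {
        this(regex, 0); // 0 means default flags: '.' excludes line terminators
    }

    // Every match of the pattern becomes one token.
    List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }
}

// The line factory is only the base class with a fixed pattern (assumed ".+").
class LineFactorySketch extends RegexFactorySketch {
    LineFactorySketch() {
        super(".+");
    }
}

class DotDemo {
    public static void main(String[] args) {
        System.out.println(new LineFactorySketch().tokens("abc\ndef").size()); // prints 2
        // With DOTALL, '.' also matches terminators, so the input is one token.
        System.out.println(
            new RegexFactorySketch(".+", Pattern.DOTALL).tokens("abc\ndef").size()); // prints 1
    }
}
```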

Author:
Bob Carpenter
See Also:
Serialized Form

Field Summary
static LineTokenizerFactory INSTANCE
          A reusable instance of this class.
Constructor Summary
LineTokenizerFactory()
          Deprecated. Use singleton instance INSTANCE instead.
Method Summary
 String toString()
          Returns a string representation of this factory, consisting of its name.
Methods inherited from class com.aliasi.tokenizer.RegExTokenizerFactory
compileTo, pattern, tokenizer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Detail


public static final LineTokenizerFactory INSTANCE
A reusable instance of this class. Because line tokenizer factories are thread safe, this instance may be used everywhere.

Constructor Detail


public LineTokenizerFactory()
Deprecated. Use singleton instance INSTANCE instead.

Constructs a line tokenizer factory. See the class documentation above for a description of its behavior.

Method Detail


public String toString()
Returns a string representation of this factory, consisting of its name.

Overrides:
toString in class RegExTokenizerFactory
Returns:
The name of this class.