com.aliasi.tokenizer
Class TokenChunker

java.lang.Object
  extended by com.aliasi.tokenizer.TokenChunker
All Implemented Interfaces:
Chunker, Serializable

public class TokenChunker
extends Object
implements Chunker, Serializable

A TokenChunker provides an implementation of the Chunker interface based on an underlying tokenizer factory.

Each chunking produced has one chunk per token produced by the underlying tokenizer factory, with start and end positions taken from the tokenizer's start and end position methods. The type of each chunk is the actual string yield of the token. For modifying tokenizers such as stemmers, this string is not necessarily the same as the underlying text span.
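The one-chunk-per-token behavior described above can be illustrated with a minimal, self-contained sketch. It does not use the LingPipe classes: SketchChunk and TokenChunkSketch are hypothetical stand-ins, and whitespace splitting plus lowercasing is only an assumed substitute for a modifying tokenizer, chosen to show how a chunk's type can differ from the text it spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a chunk: a typed span over the input text.
final class SketchChunk {
    final int start;   // index of the first character of the token
    final int end;     // index of one past the last character of the token
    final String type; // the token string, used as the chunk type

    SketchChunk(int start, int end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}

final class TokenChunkSketch {
    // Emits one chunk per whitespace-separated token. Each token is
    // lowercased to imitate a modifying tokenizer, so the chunk type may
    // differ from the characters in text[start, end).
    static List<SketchChunk> chunk(CharSequence text) {
        List<SketchChunk> chunks = new ArrayList<SketchChunk>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume the token up to the next whitespace character.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String token = text.subSequence(start, i).toString()
                               .toLowerCase(Locale.ROOT);
            chunks.add(new SketchChunk(start, i, token));
        }
        return chunks;
    }
}
```

For the input "Hello  World" (two spaces), this sketch yields two chunks: one spanning 0 to 5 with type "hello" and one spanning 7 to 12 with type "world", while the underlying spans read "Hello" and "World".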

Serialization

The token chunker is serializable if the underlying tokenizer factory is serializable. If it is not, attempting to serialize the chunker throws a java.io.NotSerializableException. The object read back in will be an instance of TokenChunker constructed with the reconstituted tokenizer factory.
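The dependence on the wrapped factory can be sketched with plain Java serialization. The classes below (SketchFactory, SerializableFactory, PlainFactory, SketchChunker, SerializationSketch) are hypothetical stand-ins, not LingPipe types, and the sketch assumes default field-by-field serialization, which may differ from the mechanism TokenChunker actually uses. It only shows the general pattern: a serializable wrapper fails with NotSerializableException when the object it holds is not serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a tokenizer factory.
interface SketchFactory {
    String name();
}

// A factory that can be serialized.
final class SerializableFactory implements SketchFactory, Serializable {
    private static final long serialVersionUID = 1L;
    public String name() { return "serializable"; }
}

// A factory that cannot be serialized.
final class PlainFactory implements SketchFactory {
    public String name() { return "plain"; }
}

// Stand-in for the chunker: serializable only as far as its factory is.
final class SketchChunker implements Serializable {
    private static final long serialVersionUID = 1L;
    final SketchFactory factory;
    SketchChunker(SketchFactory factory) { this.factory = factory; }
}

final class SerializationSketch {
    // Writes the chunker to a byte array and reads it back. Returns null
    // if the wrapped factory is not serializable.
    static SketchChunker roundTrip(SketchChunker chunker) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(chunker);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchChunker) in.readObject();
            }
        } catch (NotSerializableException e) {
            return null; // the factory field could not be written
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, a SketchChunker built on SerializableFactory survives the round trip with a reconstituted factory, while one built on PlainFactory does not.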

Since:
LingPipe 3.9
Version:
3.8.1
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
TokenChunker(TokenizerFactory tokenizerFactory)
          Construct a chunker from the specified tokenizer factory.
 
Method Summary
 Chunking chunk(char[] cs, int start, int end)
          Return the chunking produced by tokenizing the specified character array slice.
 Chunking chunk(CharSequence cSeq)
          Return the chunking produced by tokenizing the specified character sequence.
 TokenizerFactory tokenizerFactory()
          Return the tokenizer factory for this token chunker.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenChunker

public TokenChunker(TokenizerFactory tokenizerFactory)
Construct a chunker from the specified tokenizer factory.

Parameters:
tokenizerFactory - Tokenizer factory for this chunker.

Method Detail

tokenizerFactory

public TokenizerFactory tokenizerFactory()
Return the tokenizer factory for this token chunker.

Returns:
The tokenizer factory for this chunker.

chunk

public Chunking chunk(CharSequence cSeq)
Return the chunking produced by tokenizing the specified character sequence.

Specified by:
chunk in interface Chunker
Parameters:
cSeq - Character sequence to chunk.
Returns:
The chunking corresponding to tokens produced by the tokenizer factory.

chunk

public Chunking chunk(char[] cs,
                      int start,
                      int end)
Return the chunking produced by tokenizing the specified character array slice.

Specified by:
chunk in interface Chunker
Parameters:
cs - Underlying characters for slice.
start - Index of first character in slice.
end - Index of one past the last character in the slice.
Returns:
The chunking corresponding to tokens produced by the tokenizer factory.
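
The slice parameters follow the usual Java convention of an inclusive start and an exclusive end, so the slice has length end - start. A minimal sketch of that convention, using a hypothetical helper SliceSketch that is not part of LingPipe:

```java
// Hypothetical helper illustrating the (cs, start, end) slice convention.
final class SliceSketch {
    // Returns the characters cs[start] through cs[end - 1] as a string.
    // start is inclusive and end is exclusive, so the length is end - start.
    static String slice(char[] cs, int start, int end) {
        return new String(cs, start, end - start);
    }
}
```

For example, slicing the characters of "abcdef" with start 2 and end 5 selects "cde".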