com.aliasi.sentences
Class SentenceChunker

java.lang.Object
  extended by com.aliasi.sentences.SentenceChunker
All Implemented Interfaces:
Chunker, Serializable

public class SentenceChunker
extends Object
implements Chunker, Serializable

The SentenceChunker class uses a SentenceModel to implement sentence detection through the chunk.Chunker interface. A sentence chunker is constructed from a tokenizer factory and a sentence model. The tokenizer factory splits the input text into tokens, and the sentence model uses those tokens to locate sentence boundaries. Every chunk produced has the type given by the constant SENTENCE_CHUNK_TYPE.
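
The pipeline described above can be sketched in plain Java. This is a hypothetical, self-contained illustration, not LingPipe's implementation. The toy tokenizer, the punctuation-only sentence model, and all class and method names are assumptions. The sketch also assumes the tokenizer reports the whitespace before each token, so that token-level boundary indices can be mapped back to character offsets by summing lengths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tokenize -> find boundaries -> build chunks.
// None of these names are LingPipe API.
class SentencePipelineSketch {

    // A chunk is a half-open character span [start, end) with a type label.
    record Chunk(int start, int end, String type) { }

    static final String SENTENCE_CHUNK_TYPE = "S";

    // Toy tokenizer: '.', '?' and '!' are single-character tokens; any other
    // run of non-whitespace characters is one token. whitespaces.get(i) is the
    // text before tokens.get(i); the final entry is the trailing whitespace.
    static void tokenize(String text, List<String> tokens, List<String> whitespaces) {
        int i = 0;
        int n = text.length();
        while (true) {
            int wsStart = i;
            while (i < n && Character.isWhitespace(text.charAt(i))) i++;
            whitespaces.add(text.substring(wsStart, i));
            if (i == n) return;
            int tokStart = i;
            if (isTerminal(text.charAt(i))) {
                i++;
            } else {
                while (i < n && !Character.isWhitespace(text.charAt(i))
                        && !isTerminal(text.charAt(i))) i++;
            }
            tokens.add(text.substring(tokStart, i));
        }
    }

    static boolean isTerminal(char c) {
        return c == '.' || c == '?' || c == '!';
    }

    // Toy sentence model: every terminal-punctuation token ends a sentence.
    static List<Integer> boundaryIndices(List<String> tokens) {
        List<Integer> boundaries = new ArrayList<>();
        for (int k = 0; k < tokens.size(); k++) {
            String t = tokens.get(k);
            if (t.length() == 1 && isTerminal(t.charAt(0))) boundaries.add(k);
        }
        return boundaries;
    }

    // Maps token-level boundaries to character-level sentence chunks by
    // summing whitespace and token lengths.
    static List<Chunk> chunk(String text) {
        List<String> tokens = new ArrayList<>();
        List<String> whitespaces = new ArrayList<>();
        tokenize(text, tokens, whitespaces);
        int[] starts = new int[tokens.size()];
        int[] ends = new int[tokens.size()];
        int pos = 0;
        for (int k = 0; k < tokens.size(); k++) {
            pos += whitespaces.get(k).length();
            starts[k] = pos;
            pos += tokens.get(k).length();
            ends[k] = pos;
        }
        List<Chunk> chunks = new ArrayList<>();
        int firstToken = 0;
        for (int b : boundaryIndices(tokens)) {
            chunks.add(new Chunk(starts[firstToken], ends[b], SENTENCE_CHUNK_TYPE));
            firstToken = b + 1;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "The cat sat. Did it stay?  Yes!";
        for (Chunk c : chunk(text)) {
            System.out.println(c + " -> " + text.substring(c.start(), c.end()));
        }
    }
}
```

On the sample text, the sketch yields three chunks of type "S" covering "The cat sat.", "Did it stay?" and "Yes!". Text after the last boundary is left unchunked here; that is a choice of the sketch, not documented behavior of SentenceChunker.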

Thread Safety

A sentence chunker is thread safe if its tokenizer factory and sentence model are thread safe. Typical LingPipe sentence models and tokenizer factories are thread safe for reads.
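
Why thread safety reduces to the components can be sketched as follows, assuming the chunker's only state is final references to its tokenizer factory and sentence model. The class, the lambda stand-ins, and the method names are hypothetical, not LingPipe API. The point is that one shared instance can serve many threads when nothing it references is mutated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: a chunker whose only state is two final references.
class SharedChunkerSketch {
    private final Function<String, String[]> tokenizer;        // stand-in for a tokenizer factory
    private final Function<String[], Integer> sentenceCounter; // stand-in for a sentence model

    SharedChunkerSketch(Function<String, String[]> tokenizer,
                        Function<String[], Integer> sentenceCounter) {
        this.tokenizer = tokenizer;
        this.sentenceCounter = sentenceCounter;
    }

    // Writes no fields, so any interference between threads would have to
    // come from the tokenizer or the sentence model.
    int countSentences(String text) {
        return sentenceCounter.apply(tokenizer.apply(text));
    }

    // Submits the same request from many tasks against one shared instance.
    static List<Integer> runConcurrently(SharedChunkerSketch chunker, String text, int tasks)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(() -> chunker.countSentences(text)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Both stand-ins are stateless lambdas, hence safe to share.
        SharedChunkerSketch chunker = new SharedChunkerSketch(
            text -> text.trim().split("\\s+"),
            tokens -> {
                int n = 0;
                for (String t : tokens) if (t.endsWith(".")) n++;
                return n;
            });
        System.out.println(runConcurrently(chunker, "One. Two. Three.", 100));
    }
}
```

If either component kept mutable per-call state (for example, a reused buffer), the same shared instance would no longer be safe, even though the chunker itself is unchanged.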

Serialization

A sentence chunker is serializable if both its tokenizer factory and its sentence model are serializable. The deserialized object will be an instance of SentenceChunker constructed from the deserialized tokenizer factory and sentence model.
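
The serialization behavior described above resembles the standard Java serialization proxy pattern (writeReplace on the outer class, readResolve on the proxy). The sketch below is an assumption about how such behavior can be achieved with the JDK alone; it is not LingPipe's code, and the class names and plain Object stand-ins for the two components are hypothetical. Writing a chunker whose component is not serializable fails with NotSerializableException, and reading one back always goes through the constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of a serialization proxy for a two-component chunker.
class ProxyChunkerSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Counts constructor calls, to show that deserialization uses the constructor.
    static int constructorCalls = 0;

    // Stand-ins for the tokenizer factory and sentence model.
    final Object tokenizerFactory;
    final Object sentenceModel;

    ProxyChunkerSketch(Object tokenizerFactory, Object sentenceModel) {
        this.tokenizerFactory = tokenizerFactory;
        this.sentenceModel = sentenceModel;
        constructorCalls++;
    }

    // Serialization writes the proxy instead of this object.
    private Object writeReplace() {
        return new Proxy(tokenizerFactory, sentenceModel);
    }

    // Rejects streams that try to deserialize the chunker directly.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    private static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Object tokenizerFactory; // must itself be serializable
        private final Object sentenceModel;    // must itself be serializable

        Proxy(Object tokenizerFactory, Object sentenceModel) {
            this.tokenizerFactory = tokenizerFactory;
            this.sentenceModel = sentenceModel;
        }

        // Once the components are read back, rebuild the chunker from them.
        private Object readResolve() {
            return new ProxyChunkerSketch(tokenizerFactory, sentenceModel);
        }
    }

    // Serializes to bytes and reads the result back.
    static ProxyChunkerSketch roundTrip(ProxyChunkerSketch chunker)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(chunker);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ProxyChunkerSketch) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ProxyChunkerSketch copy =
            roundTrip(new ProxyChunkerSketch("toy-tokenizer", "toy-model"));
        System.out.println(copy.tokenizerFactory + " / " + copy.sentenceModel);
        System.out.println("constructor calls: " + constructorCalls);
    }
}
```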

Since:
LingPipe 2.1
Version:
3.9
Author:
Mitzi Morris, Bob Carpenter
See Also:
Serialized Form

Field Summary
static String SENTENCE_CHUNK_TYPE
          The type assigned to sentence chunks, namely "S".
 
Constructor Summary
SentenceChunker(TokenizerFactory tf, SentenceModel sm)
          Construct a sentence chunker from the specified tokenizer factory and sentence model.
 
Method Summary
 Chunking chunk(char[] cs, int start, int end)
          Return the chunking derived from the underlying sentence model over the tokenization of the specified character slice.
 Chunking chunk(CharSequence cSeq)
          Return the chunking derived from the underlying sentence model over the tokenization of the specified character sequence.
 SentenceModel sentenceModel()
          Returns the sentence model for this chunker.
 TokenizerFactory tokenizerFactory()
          Returns the tokenizer factory for this chunker.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SENTENCE_CHUNK_TYPE

public static final String SENTENCE_CHUNK_TYPE
The type assigned to sentence chunks, namely "S".

See Also:
Constant Field Values

Constructor Detail

SentenceChunker

public SentenceChunker(TokenizerFactory tf,
                       SentenceModel sm)
Construct a sentence chunker from the specified tokenizer factory and sentence model.

Parameters:
tf - Tokenizer factory for chunker.
sm - Sentence model for chunker.

Method Detail

tokenizerFactory

public TokenizerFactory tokenizerFactory()
Returns the tokenizer factory for this chunker.

Returns:
The tokenizer factory for this chunker.

sentenceModel

public SentenceModel sentenceModel()
Returns the sentence model for this chunker.

Returns:
The sentence model for this chunker.

chunk

public Chunking chunk(CharSequence cSeq)
Return the chunking derived from the underlying sentence model over the tokenization of the specified character sequence. Iterating over the returned chunking's set of chunks is guaranteed to return the sentence chunks in their original textual order.
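
One way to meet an ordering guarantee like this is to keep chunks in a set sorted by character offset. The following is a hypothetical sketch of that idea, not a description of LingPipe's internals; the Chunk record and the TEXT_ORDER comparator are assumptions.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a chunk set whose iteration order follows the text.
class OrderedChunkSetSketch {

    record Chunk(int start, int end, String type) { }

    // Orders chunks by start offset, then by end offset.
    static final Comparator<Chunk> TEXT_ORDER =
        Comparator.comparingInt(Chunk::start).thenComparingInt(Chunk::end);

    static Set<Chunk> newChunkSet() {
        return new TreeSet<>(TEXT_ORDER);
    }

    public static void main(String[] args) {
        String text = "First one. Second one. Third one.";
        Set<Chunk> chunks = newChunkSet();
        // Added out of order on purpose; iteration still follows the text.
        chunks.add(new Chunk(23, 33, "S"));
        chunks.add(new Chunk(0, 10, "S"));
        chunks.add(new Chunk(11, 22, "S"));
        for (Chunk c : chunks) {
            System.out.println(text.substring(c.start(), c.end()));
        }
    }
}
```

Running main prints "First one.", "Second one." and "Third one." in that order, regardless of insertion order. Note that a comparator on offsets alone treats two chunks with the same span as duplicates, which is harmless when every chunk has the same type.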

Warning: A tokenizer factory whose tokenizers do not reproduce the original character sequence may cause the underlying character slice for the chunks to differ from the slice provided as an argument.
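
The warning can be made concrete with a small check, assuming chunk offsets are derived by accumulating the lengths of the whitespace and tokens that the tokenizer reports. The helper names below are hypothetical. If the rebuilt text differs from the input, offsets computed from the tokens index into the rebuilt text rather than the original.

```java
import java.util.List;

// Hypothetical check that a tokenization reproduces its input text.
class TokenizationCheckSketch {

    // Rebuilds text as ws[0] tok[0] ws[1] tok[1] ... tok[n-1] ws[n].
    static String rebuild(List<String> tokens, List<String> whitespaces) {
        if (whitespaces.size() != tokens.size() + 1) {
            throw new IllegalArgumentException("need one more whitespace than tokens");
        }
        StringBuilder sb = new StringBuilder(whitespaces.get(0));
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i)).append(whitespaces.get(i + 1));
        }
        return sb.toString();
    }

    // Offsets derived from the tokenization line up with the input only if this holds.
    static boolean reproduces(String text, List<String> tokens, List<String> whitespaces) {
        return rebuild(tokens, whitespaces).equals(text);
    }

    public static void main(String[] args) {
        String text = "He said \"Hi.\" Then he left.";
        // A faithful tokenization keeps every character.
        List<String> tokens = List.of("He", "said", "\"", "Hi", ".", "\"", "Then", "he", "left", ".");
        List<String> ws = List.of("", " ", " ", "", "", "", " ", " ", " ", "", "");
        // A normalizing tokenization that drops the quote characters.
        List<String> lossy = List.of("He", "said", "Hi", ".", "Then", "he", "left", ".");
        List<String> lossyWs = List.of("", " ", " ", "", " ", " ", " ", "", "");
        System.out.println(reproduces(text, tokens, ws));
        System.out.println(reproduces(text, lossy, lossyWs));
        System.out.println(text.indexOf("Then") + " vs " + rebuild(lossy, lossyWs).indexOf("Then"));
    }
}
```

With the quote-dropping tokenization, "Then" sits at offset 12 in the rebuilt text but at offset 14 in the original, so any chunk that starts after the dropped quotes would be shifted by two characters.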

Specified by:
chunk in interface Chunker
Parameters:
cSeq - Character sequence to chunk.
Returns:
The sentence chunking of the specified character sequence.

chunk

public Chunking chunk(char[] cs,
                      int start,
                      int end)
Return the chunking derived from the underlying sentence model over the tokenization of the specified character slice. See chunk(CharSequence) for more information.

Specified by:
chunk in interface Chunker
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
end - Index of one past the last character in the slice.
Returns:
The sentence chunking of the specified character slice.
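
The slice convention (start inclusive, end one past the last character) can be sketched as a bounds check followed by delegation to a CharSequence version. This is a hypothetical illustration: the toy period-based chunker, the Span record, and the choice to report offsets relative to start are assumptions of the sketch, not documented behavior of SentenceChunker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a char[] slice overload delegating to a CharSequence method.
class SliceChunkSketch {

    record Span(int start, int end) { }

    // Toy CharSequence chunker: each sentence ends at a '.', spans are [start, end).
    static List<Span> chunk(CharSequence cSeq) {
        List<Span> spans = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < cSeq.length(); i++) {
            char c = cSeq.charAt(i);
            // Skip leading whitespace before a sentence.
            if (start == i && Character.isWhitespace(c)) {
                start = i + 1;
                continue;
            }
            if (c == '.') {
                spans.add(new Span(start, i + 1));
                start = i + 1;
            }
        }
        return spans;
    }

    // 'end' is one past the last character, so the slice holds end - start chars.
    // Offsets in the result are relative to 'start' in this sketch.
    static List<Span> chunk(char[] cs, int start, int end) {
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException("bad slice [" + start + ", " + end + ")");
        }
        return chunk(new String(cs, start, end - start));
    }

    public static void main(String[] args) {
        char[] cs = "xx A. B. yy".toCharArray();
        for (Span s : chunk(cs, 3, 8)) {
            System.out.println(s);
        }
    }
}
```

Copying the slice with new String(cs, start, end - start) keeps the sketch simple; a wrapper such as java.nio.CharBuffer would avoid the copy at the cost of a mutable view.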