java.lang.Object
  com.aliasi.sentences.SentenceChunker
public class SentenceChunker
The SentenceChunker class uses a
SentenceModel to implement sentence detection through
the chunk.Chunker interface. A sentence chunker is
constructed from a tokenizer factory and a sentence model. The
tokenizer factory creates tokens that it sends to the sentence
model. The types of the chunks produced are given by the constant
SENTENCE_CHUNK_TYPE.
Deserializing a sentence chunker yields a SentenceChunker constructed
from the deserialized tokenizer factory and sentence model.
| Field Summary | |
|---|---|
| static String | SENTENCE_CHUNK_TYPE: The type assigned to sentence chunks, namely "S". |
| Constructor Summary | |
|---|---|
| SentenceChunker(TokenizerFactory tf, SentenceModel sm) | Construct a sentence chunker from the specified tokenizer factory and sentence model. |
| Method Summary | |
|---|---|
| Chunking | chunk(char[] cs, int start, int end): Return the chunking derived from the underlying sentence model over the tokenization of the specified character slice. |
| Chunking | chunk(CharSequence cSeq): Return the chunking derived from the underlying sentence model over the tokenization of the specified character sequence. |
| SentenceModel | sentenceModel(): Returns the sentence model for this chunker. |
| TokenizerFactory | tokenizerFactory(): Returns the tokenizer factory for this chunker. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String SENTENCE_CHUNK_TYPE
The type assigned to sentence chunks, namely "S".
| Constructor Detail |
|---|
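public SentenceChunker(TokenizerFactory tf,
SentenceModel sm)
Construct a sentence chunker from the specified tokenizer factory and sentence model.
Parameters:
tf - Tokenizer factory for chunker.
sm - Sentence model for chunker.
| Method Detail |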
|---|
public TokenizerFactory tokenizerFactory()
Returns the tokenizer factory for this chunker.
public SentenceModel sentenceModel()
Returns the sentence model for this chunker.
public Chunking chunk(CharSequence cSeq)
Return the chunking derived from the underlying sentence model over the tokenization of the specified character sequence.
Warning: As described in the class documentation above, a tokenizer factory whose tokenizers do not reproduce the original sequence may cause the character sequence underlying the chunks to differ from the sequence provided as an argument.
Specified by: chunk in interface Chunker
Parameters:
cSeq - Character sequence underlying the slice.
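The warning above can be illustrated without any LingPipe classes. The following is a minimal sketch in plain Java, assuming (as the warning implies, but this page does not spell out) that the text behind the chunks is rebuilt from the tokens and whitespace a tokenizer reports. The class `RoundTripDemo`, its methods, and the toy whitespace tokenizer are hypothetical and are not LingPipe's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RoundTripDemo {

    // Rebuild text as whites[0] + tokens[0] + whites[1] + ... + tokens[n-1] + whites[n].
    public static String rebuild(List<String> tokens, List<String> whites) {
        StringBuilder sb = new StringBuilder(whites.get(0));
        for (int k = 0; k < tokens.size(); k++) {
            sb.append(tokens.get(k)).append(whites.get(k + 1));
        }
        return sb.toString();
    }

    // Hypothetical toy tokenizer: splits text into alternating runs of
    // whitespace and non-whitespace, optionally lowercasing each token,
    // then rebuilds the text from the pieces.
    public static String tokenizeAndRebuild(String text, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        List<String> whites = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (true) {
            int wsStart = i;
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            whites.add(text.substring(wsStart, i)); // possibly empty run
            if (i == n) {
                break;
            }
            int tokStart = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String tok = text.substring(tokStart, i);
            tokens.add(lowerCase ? tok.toLowerCase(Locale.ROOT) : tok);
        }
        return rebuild(tokens, whites);
    }

    public static void main(String[] args) {
        String text = "Hello World. Bye.";
        // A tokenizer that keeps every character reproduces the input.
        System.out.println(tokenizeAndRebuild(text, false).equals(text)); // prints true
        // A tokenizer that modifies tokens does not.
        System.out.println(tokenizeAndRebuild(text, true).equals(text));  // prints false
    }
}
```

Under that assumption, a factory that lowercases, stems, or drops tokens would leave the chunks pointing into text that no longer matches the argument character for character.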
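public Chunking chunk(char[] cs,
int start,
int end)
Return the chunking derived from the underlying sentence model over the tokenization of the specified character slice. See chunk(CharSequence) for more information.
Specified by: chunk in interface Chunker
Parameters:
cs - Underlying character sequence.
start - Index of first character in slice.
end - Index of one past the last character in the slice.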
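The start and end parameters describe a half-open slice: start is included and end is one past the last character. The sketch below, in plain Java with no LingPipe classes, shows which characters such a slice selects. The class `SliceDemo` and the helper `sliceToString` are hypothetical names used only for illustration.

```java
public class SliceDemo {

    // Copy the half-open slice [start, end) of cs into a String.
    public static String sliceToString(char[] cs, int start, int end) {
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException(
                "start=" + start + " end=" + end + " length=" + cs.length);
        }
        // String(char[], offset, count) takes a length, not an end index.
        return new String(cs, start, end - start);
    }

    public static void main(String[] args) {
        char[] cs = "xx Hello. Bye. yy".toCharArray();
        // Index 3 is 'H'; index 14 is the space after the final period.
        System.out.println(sliceToString(cs, 3, 14)); // prints Hello. Bye.
    }
}
```

With this convention the slice length is end - start, and an empty slice has start equal to end.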