public interface SentenceModel
The SentenceModel interface specifies a means of doing
sentence segmentation from arrays of tokens and whitespaces.
The sentence model operates over aligned arrays of tokens and
whitespaces, as derived from a Tokenizer. There are two methods in the
interface. The standard external interface is boundaryIndices(String[],String[]), which returns an array of
the indices of sentence-final tokens. For instance, with tokens
{"John", "ran", ".", "He", "also", "jumped", "!"} and
whitespaces {"", " ", "", " ", " ", " ", "", ""},
the result from the Indo-European model would be
{2,6}, because the token at index 2 is a period
(.) and the token at index 6 is an exclamation point
(!). The result will often depend on the
whitespaces as well as the tokens.
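The contract described above can be illustrated with a minimal, self-contained sketch. ToySentenceModel is a hypothetical stand-in and not a class from the library; its rule (a token is sentence-final when it is ".", "!" or "?") is an assumption made only for illustration and is far simpler than the Indo-European model, which also consults the whitespaces. The example uses seven tokens and eight whitespaces, one more whitespace than tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model for illustration only; not part of the library.
class ToySentenceModel {

    // Returns the indices of tokens that this toy rule treats as sentence-final.
    static int[] boundaryIndices(String[] tokens, String[] whitespaces) {
        // Documented contract: one more whitespace than tokens.
        if (whitespaces.length != tokens.length + 1) {
            throw new IllegalArgumentException(
                "Expected " + (tokens.length + 1)
                + " whitespaces, found " + whitespaces.length);
        }
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            // Assumed rule: punctuation alone decides; whitespaces are ignored.
            if (token.equals(".") || token.equals("!") || token.equals("?")) {
                found.add(i);
            }
        }
        // Copy the collected Integer values into a primitive array.
        int[] result = new int[found.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = found.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        String[] whitespaces = {"", " ", "", " ", " ", " ", "", ""};
        // Prints [2, 6]
        System.out.println(Arrays.toString(boundaryIndices(tokens, whitespaces)));
    }
}
```

A real model would typically inspect whitespaces[i + 1], the whitespace following token i, before accepting a boundary, which is why results can differ for the same tokens.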
The second method is boundaryIndices(String[],String[],int,int,Collection), which adds
the boundary indices as Integers to the specified
collection for the slice of tokens running from the start index (inclusive)
to the end index (exclusive).
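The slice variant can be sketched in the same spirit. ToySliceModel is hypothetical, uses the same assumed punctuation-only rule, and omits argument validation for brevity. It also assumes that the indices written to the collection are positions in the full token array rather than offsets from start; the text above does not state this explicitly.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical toy model for illustration only; not part of the library.
class ToySliceModel {

    // Adds sentence-final indices for tokens start..end-1 to the collection.
    static void boundaryIndices(String[] tokens, String[] whitespaces,
                                int start, int end,
                                Collection<Integer> indices) {
        for (int i = start; i < end; i++) {
            String token = tokens[i];
            // Assumed rule: punctuation alone decides; whitespaces are ignored.
            if (token.equals(".") || token.equals("!") || token.equals("?")) {
                // Assumption: report the index within the full array.
                indices.add(i);
            }
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        String[] whitespaces = {"", " ", "", " ", " ", " ", "", ""};

        List<Integer> secondSentence = new ArrayList<>();
        boundaryIndices(tokens, whitespaces, 3, 7, secondSentence);
        System.out.println(secondSentence); // prints [6]

        List<Integer> firstSentence = new ArrayList<>();
        boundaryIndices(tokens, whitespaces, 0, 3, firstSentence);
        System.out.println(firstSentence); // prints [2]
    }
}
```

Passing the collection in lets a caller accumulate boundaries from several slices into one list or set without allocating an intermediate array per call.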
| Method Summary | |
|---|---|
| int[] | boundaryIndices(String[] tokens, String[] whitespaces) Returns an array of indices of sentence-final tokens. |
| void | boundaryIndices(String[] tokens, String[] whitespaces, int start, int end, Collection<Integer> indices) Adds the sentence-final token indices as Integer instances to the specified collection, considering only tokens between index start and end-1 inclusive. |
| Method Detail |
|---|
int[] boundaryIndices(String[] tokens,
String[] whitespaces)
Returns an array of indices of sentence-final tokens.
Parameters:
tokens - Array of tokens to annotate.
whitespaces - Array of whitespaces to annotate.
Throws:
IllegalArgumentException - If the array of whitespaces is
not one longer than the array of tokens.
void boundaryIndices(String[] tokens,
String[] whitespaces,
int start,
int end,
Collection<Integer> indices)
Adds the sentence-final token indices as Integer
instances to the specified collection, considering only tokens
between index start and end-1
inclusive.
Parameters:
tokens - Array of tokens to annotate.
whitespaces - Array of whitespaces to annotate.
start - Index of first token to annotate.
end - Index one beyond the last token to annotate.
indices - Collection into which to write the boundary
indices.
Throws:
IllegalArgumentException - If the array of tokens is
not at least as long as start+end, or the
array of whitespaces is not at least as long as start+end+1.