com.aliasi.sentences
Interface SentenceModel

All Known Implementing Classes:
AbstractSentenceModel, HeuristicSentenceModel, IndoEuropeanSentenceModel, MedlineSentenceModel

public interface SentenceModel

The SentenceModel interface specifies how to segment text into sentences, given aligned arrays of tokens and whitespaces.

The sentence model operates over aligned arrays of tokens and whitespaces, as derived from a Tokenizer. There are two methods in the interface. The standard external interface is boundaryIndices(String[],String[]), which returns an array of the indices of sentence-final tokens. For instance, with tokens {"John", "ran", ".", "He", "also", "jumped", "!"} and whitespaces {"", " ", "", "  ", " ", " ", "", ""}, the result from the Indo-European model would be {2,6}, because the token at index 2 is a period (.) and the token at index 6 is an exclamation point (!). The result will often depend on the whitespaces as well as the tokens.
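To make the array alignment concrete, here is a minimal sketch of a toy model with the same two-argument shape. ToyBoundaryDemo is a hypothetical stand-in written for this illustration, not LingPipe's IndoEuropeanSentenceModel: it simply treats ".", "!" and "?" as sentence-final and ignores the whitespaces apart from checking their count, whereas real models may consult them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model for illustration only; not part of LingPipe.
class ToyBoundaryDemo {

    // Returns the indices of tokens that are ".", "!" or "?".
    static int[] boundaryIndices(String[] tokens, String[] whitespaces) {
        // The whitespaces array must be exactly one longer than the tokens array.
        if (whitespaces.length != tokens.length + 1) {
            throw new IllegalArgumentException(
                "Expected " + (tokens.length + 1)
                + " whitespaces, found " + whitespaces.length);
        }
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (token.equals(".") || token.equals("!") || token.equals("?")) {
                found.add(i);
            }
        }
        // Copy the collected indices into a primitive array.
        int[] result = new int[found.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = found.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        // Eight whitespaces for seven tokens: one before each token, plus one at the end.
        String[] whitespaces = {"", " ", "", "  ", " ", " ", "", ""};
        System.out.println(Arrays.toString(boundaryIndices(tokens, whitespaces)));
        // prints [2, 6]
    }
}
```

Because the toy rule looks only at the token text, it would also mark the period in an abbreviation such as "Dr." as sentence-final; that is the kind of case where a real model's use of surrounding tokens and whitespaces matters.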

The second method is boundaryIndices(String[],String[],int,int,Collection), which adds the boundary indices as Integers to the specified collection, considering only the slice of tokens from the start index up to, but not including, the end index.
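The slice-based variant can be sketched the same way. ToySliceDemo below is again a hypothetical toy, not LingPipe code. It assumes that end is exclusive, as the parameter description states, and that the indices added to the collection are positions in the full tokens array rather than offsets from start. Its bounds check is the sketch's own and is not taken from LingPipe.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical toy model for illustration only; not part of LingPipe.
class ToySliceDemo {

    // Adds the index of each ".", "!" or "?" token in [start, end) to the collection.
    static void boundaryIndices(String[] tokens, String[] whitespaces,
                                int start, int end,
                                Collection<Integer> indices) {
        // Assumption: end is exclusive, so the slice needs end tokens and end+1 whitespaces.
        if (start < 0 || end < start
                || tokens.length < end || whitespaces.length < end + 1) {
            throw new IllegalArgumentException(
                "Slice [" + start + "," + end + ") does not fit the arrays.");
        }
        for (int i = start; i < end; i++) {
            String token = tokens[i];
            if (token.equals(".") || token.equals("!") || token.equals("?")) {
                indices.add(i); // autoboxed to Integer
            }
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        String[] whitespaces = {"", " ", "", "  ", " ", " ", "", ""};
        List<Integer> indices = new ArrayList<>();
        // Consider only the second sentence: tokens 3 through 6.
        boundaryIndices(tokens, whitespaces, 3, 7, indices);
        System.out.println(indices);
        // prints [6]
    }
}
```

Writing into a caller-supplied collection lets the caller accumulate boundaries from several slices into one list or set without allocating an intermediate array for each call.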

Since:
LingPipe 1.0
Version:
3.0
Author:
Bob Carpenter

Method Summary
 int[] boundaryIndices(String[] tokens, String[] whitespaces)
          Returns an array of indices of sentence-final tokens.
 void boundaryIndices(String[] tokens, String[] whitespaces, int start, int end, Collection<Integer> indices)
          Adds the sentence-final token indices as Integer instances to the specified collection, considering only tokens from index start to index end-1, inclusive.
 

Method Detail

boundaryIndices

int[] boundaryIndices(String[] tokens,
                      String[] whitespaces)
Returns an array of indices of sentence-final tokens.

Parameters:
tokens - Array of tokens to annotate.
whitespaces - Array of whitespaces to annotate.
Returns:
Array of the indices of tokens that are sentence-final.
Throws:
IllegalArgumentException - If the array of whitespaces is not one longer than the array of tokens.
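
A common next step is to turn the returned indices back into sentence text. The following is a minimal sketch under stated assumptions: SentenceTextDemo and its sentences method are hypothetical helpers, not part of LingPipe, and the code assumes that whitespaces[i] is the whitespace immediately preceding tokens[i], with the final element following the last token. The boundary array {2,6} is hard-coded here rather than computed by a model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of LingPipe.
class SentenceTextDemo {

    // Rebuilds one string per sentence from tokens, whitespaces and boundary indices.
    // Tokens after the last boundary index are ignored by this sketch.
    static List<String> sentences(String[] tokens, String[] whitespaces,
                                  int[] boundaries) {
        if (whitespaces.length != tokens.length + 1) {
            throw new IllegalArgumentException(
                "Whitespaces must be one longer than tokens.");
        }
        List<String> result = new ArrayList<>();
        int first = 0; // index of the first token of the current sentence
        for (int boundary : boundaries) {
            StringBuilder sb = new StringBuilder();
            for (int i = first; i <= boundary; i++) {
                if (i > first) {
                    // Assumption: whitespaces[i] sits between tokens[i-1] and tokens[i].
                    sb.append(whitespaces[i]);
                }
                sb.append(tokens[i]);
            }
            result.add(sb.toString());
            first = boundary + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", ".", "He", "also", "jumped", "!"};
        String[] whitespaces = {"", " ", "", "  ", " ", " ", "", ""};
        int[] boundaries = {2, 6}; // as in the example result described above
        System.out.println(sentences(tokens, whitespaces, boundaries));
        // prints [John ran., He also jumped!]
    }
}
```

The whitespace between two sentences (here the two spaces before "He") is dropped, since it belongs to neither sentence.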

boundaryIndices

void boundaryIndices(String[] tokens,
                     String[] whitespaces,
                     int start,
                     int end,
                     Collection<Integer> indices)
Adds the sentence-final token indices as Integer instances to the specified collection, considering only tokens from index start to index end-1, inclusive.

Parameters:
tokens - Array of tokens to annotate.
whitespaces - Array of whitespaces to annotate.
start - Index of first token to annotate.
end - Index one beyond the last token to annotate.
indices - Collection into which to write the boundary indices.
Throws:
IllegalArgumentException - If the array of tokens is not at least as long as start+end, or the array of whitespaces is not at least as long as start+end+1.