com.aliasi.sentences
Class SentenceAnnotateFilter

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by com.aliasi.xml.SimpleElementHandler
          extended by com.aliasi.xml.SAXFilterHandler
              extended by com.aliasi.xml.ElementStackFilter
                  extended by com.aliasi.xml.TextContentFilter
                      extended by com.aliasi.sentences.SentenceAnnotateFilter
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler

Deprecated. LingPipe no longer supports direct XML annotation; see the generic demos for examples.

@Deprecated
public class SentenceAnnotateFilter
extends TextContentFilter

A SentenceAnnotateFilter applies sentence-boundary annotation to the text content of the specified elements. An instance is constructed with a sentence model and a tokenizer factory. Optionally, an array of elements to annotate may be provided; if no array is specified, all text content is annotated.

Each sentence is wrapped in a sent element. If the text content of a filtered element consists only of whitespace, it is not annotated. The text of a sentence element starts with the first character of the sentence's first token and ends with the last character of its last token, so it never begins or ends with whitespace. All inter-sentence whitespace is retained as text content between the sentence elements in the filtered element's content. For instance, the input <p> A b.  C d. </p> will yield <p> <sent>A b.</sent>  <sent>C d.</sent> </p>. In this case, there is a single space before the first sentence, two spaces between the sentences, and a single space after the second sentence.
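The whitespace handling described above can be illustrated with a minimal, self-contained sketch. It does not use LingPipe: the class SentWrapSketch and its annotate method are hypothetical, and a naive rule (a sentence ends at '.', '!' or '?') stands in for a real sentence model and tokenizer factory. It only shows where the sent tags fall relative to whitespace and does no XML escaping.

```java
class SentWrapSketch {

    // Hypothetical helper, not part of LingPipe. Assumes a sentence ends at
    // the first '.', '!' or '?' following its first non-whitespace character.
    static String annotate(String text) {
        StringBuilder out = new StringBuilder();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Copy leading or inter-sentence whitespace through unchanged.
            int wsStart = i;
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            out.append(text, wsStart, i);
            if (i == n) {
                break; // only whitespace remained, so nothing to wrap
            }
            // Scan to the end of the sentence (naive punctuation rule).
            int start = i;
            while (i < n) {
                char c = text.charAt(i++);
                if (c == '.' || c == '!' || c == '?') {
                    break;
                }
            }
            // Exclude trailing whitespace from an unterminated final sentence.
            int end = i;
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) {
                end--;
            }
            out.append("<sent>").append(text, start, end).append("</sent>");
            out.append(text, end, i);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <p> <sent>A b.</sent>  <sent>C d.</sent> </p>
        System.out.println("<p>" + annotate(" A b.  C d. ") + "</p>");
    }
}
```

Whitespace-only input passes through this sketch with no sent element added, matching the rule stated above.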

Since:
LingPipe1.0
Version:
3.9.1
Author:
Bob Carpenter

Field Summary
static String SENTENCE_ELEMENT
          Deprecated. Element used to group sentences in sentence annotation, namely "sent".
 
Fields inherited from class com.aliasi.xml.SAXFilterHandler
mHandler
 
Fields inherited from class com.aliasi.xml.SimpleElementHandler
CDATA_ATTS_TYPE, EMPTY_ATTS, NO_OP_DEFAULT_HANDLER
 
Constructor Summary
SentenceAnnotateFilter(SentenceModel sentenceModel, TokenizerFactory tokenizerFactory)
          Deprecated. Constructs a sentence annotation filter with the specified sentence model and tokenizer factory.
SentenceAnnotateFilter(SentenceModel sentenceModel, TokenizerFactory tokenizerFactory, String[] elements)
          Deprecated. Constructs a sentence annotation filter with the specified sentence model, tokenizer factory, and elements whose text content should be annotated.
 
Method Summary
 void characters(char[] cs, int start, int length)
          Deprecated. Annotates the characters if all text content is being annotated or if they occur in an annotated element; otherwise passes the characters directly to the contained handler.
 void filteredCharacters(char[] cs, int start, int length)
          Deprecated. Performs sentence-boundary annotation of the specified characters.
 
Methods inherited from class com.aliasi.xml.TextContentFilter
filterElement
 
Methods inherited from class com.aliasi.xml.ElementStackFilter
currentAttributes, currentElement, endElement, getAttributesStack, getElementStack, noElement, startDocument, startElement
 
Methods inherited from class com.aliasi.xml.SAXFilterHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, setHandler, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class com.aliasi.xml.SimpleElementHandler
addSimpleAttribute, characters, characters, characters, characters, createAttributes, createAttributes, createAttributes, createAttributes, createAttributes, createAttributes, endSimpleElement, endSimpleElement, startEndSimpleElement, startEndSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SENTENCE_ELEMENT

public static final String SENTENCE_ELEMENT
Deprecated. 
Element used to group sentences in sentence annotation, namely "sent".

See Also:
Constant Field Values

Constructor Detail

SentenceAnnotateFilter

public SentenceAnnotateFilter(SentenceModel sentenceModel,
                              TokenizerFactory tokenizerFactory)
Deprecated. 
Constructs a sentence annotation filter with the specified sentence model and tokenizer factory.

Parameters:
sentenceModel - Sentence model to use for boundary detection.
tokenizerFactory - Factory to produce tokenizers for text.

SentenceAnnotateFilter

public SentenceAnnotateFilter(SentenceModel sentenceModel,
                              TokenizerFactory tokenizerFactory,
                              String[] elements)
Deprecated. 
Constructs a sentence annotation filter with the specified sentence model, tokenizer factory, and elements whose text content should be annotated.

Parameters:
sentenceModel - Sentence model to use for boundary detection.
tokenizerFactory - Factory to produce tokenizers for text.
elements - Array of names of elements whose text content is to be annotated.

Method Detail

characters

public void characters(char[] cs,
                       int start,
                       int length)
                throws SAXException
Deprecated. 
Annotates the characters if all text content is being annotated or if they occur in an annotated element; otherwise passes the characters directly to the contained handler. All boundary events will be passed to the contained handler.

Specified by:
characters in interface ContentHandler
Overrides:
characters in class TextContentFilter
Parameters:
cs - Character array to filter.
start - First character to filter.
length - Number of characters to filter.
Throws:
SAXException - If there is an exception thrown by the contained handler.
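
The dispatch rule described for characters can be sketched with the JDK's SAX classes alone. This is not LingPipe's implementation: the class CharactersDispatchSketch is hypothetical, upper-casing stands in for sentence annotation, output goes to a StringBuilder rather than a contained handler, and the sketch assumes that "in an annotated element" means the innermost open element is one of the named elements.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CharactersDispatchSketch extends DefaultHandler {

    // null means "filter all text content" (assumption mirroring the
    // two-argument constructor described in this page).
    private final Set<String> targets;
    private final Deque<String> stack = new ArrayDeque<>();
    final StringBuilder out = new StringBuilder();

    CharactersDispatchSketch(String[] elements) {
        targets = (elements == null) ? null : new HashSet<>(Arrays.asList(elements));
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        stack.push(qName);
        out.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        stack.pop();
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] cs, int start, int length) {
        String text = new String(cs, start, length);
        boolean filter = targets == null
            || (!stack.isEmpty() && targets.contains(stack.peek()));
        // Upper-casing is a stand-in for the real filtering step.
        out.append(filter ? text.toUpperCase() : text);
    }

    // Parses the XML string and returns the re-serialized, filtered result.
    static String run(String xml, String[] elements) {
        CharactersDispatchSketch handler = new CharactersDispatchSketch(elements);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("<doc><p>ab</p><q>cd</q></doc>", new String[] {"p"}));
        System.out.println(run("<doc><p>ab</p><q>cd</q></doc>", null));
    }
}
```

With the element array {"p"}, only the text inside p is transformed; with null, all text content is.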

filteredCharacters

public void filteredCharacters(char[] cs,
                               int start,
                               int length)
                        throws SAXException
Deprecated. 
Performs sentence-boundary annotation of the specified characters. Markup and text SAX events are delegated to the contained handler.

Specified by:
filteredCharacters in class TextContentFilter
Parameters:
cs - Character array to annotate.
start - First character to annotate.
length - Number of characters to annotate.
Throws:
SAXException - If there is an exception thrown by the contained handler.