com.aliasi.xml
Class TextContentFilter

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by com.aliasi.xml.SimpleElementHandler
          extended by com.aliasi.xml.SAXFilterHandler
              extended by com.aliasi.xml.ElementStackFilter
                  extended by com.aliasi.xml.TextContentFilter
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
Direct Known Subclasses:
SentenceAnnotateFilter

public abstract class TextContentFilter
extends ElementStackFilter

A filter that applies an operation to text content in specified elements. Elements whose text should be filtered are specified using filterElement(String). The operation applied to the character content of these elements is defined by a subclass's implementation of filteredCharacters(char[],int,int). Unfiltered characters and all other SAX events are delegated to the contained handler. To process text in contiguous chunks, set an instance of TextContentFilter as the contained handler of a GroupCharactersFilter.

The elements that are filtered are specified by qualified element name. If there are no namespace qualifications, the name will be unqualified. For instance, the following document contains two bar elements with the same URI:

   <foo>
     <a:bar xmlns:a="http://one">xyz</a:bar>
     <b:bar xmlns:b="http://one">uvw</b:bar>
   </foo>
 
To filter the content of every bar element in this document, both qualified names must be specified: a:bar and b:bar. Other equivalent prefixed forms will not be recognized. Conversely, there is no way to properly handle a document such as the following, where the two qualified element names are the same but the elements belong to different namespaces:

   <foo>
     <a:bar xmlns:a="http://one">xyz</a:bar>
     <a:bar xmlns:a="http://two">uvw</a:bar>
   </foo>
 
For this document, specifying a:bar will pick up the bar element from both the http://one and http://two namespaces.

Because this filter requires qualified names, the XML parser must set the following SAX2 feature to true:

http://xml.org/sax/features/namespace-prefixes
See the SAX2 Feature Specification for information on this and other features.
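
The following is a minimal sketch of enabling this feature using only the JDK's JAXP and SAX classes; it does not use LingPipe. The class name QNameFeatureDemo and the method describeElements are hypothetical names chosen for illustration. The handler records each element's qualified name together with its namespace URI, which makes the ambiguity described above visible: matching on a:bar alone cannot separate the two namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of LingPipe.
final class QNameFeatureDemo {

    static final String NAMESPACE_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Returns one entry per start tag, formatted as qName{namespaceURI}.
    static List<String> describeElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Ask the parser to report qualified (prefixed) names.
        factory.setFeature(NAMESPACE_PREFIXES, true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        List<String> seen = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(qName + "{" + uri + "}");
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

Calling describeElements on the second sample document above yields two a:bar entries that differ only in the URI part, which is information a filter keyed on qualified names never consults.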

Since:
LingPipe 1.0
Version:
3.8
Author:
Bob Carpenter

Field Summary
 
Fields inherited from class com.aliasi.xml.SAXFilterHandler
mHandler
 
Fields inherited from class com.aliasi.xml.SimpleElementHandler
CDATA_ATTS_TYPE, EMPTY_ATTS, NO_OP_DEFAULT_HANDLER
 
Constructor Summary
TextContentFilter()
          Construct a text content filter without a specified contained handler.
TextContentFilter(DefaultHandler handler)
          Construct a text content filter which passes events to the specified handler.
 
Method Summary
 void characters(char[] cs, int start, int length)
          Handle character content, delegating unfiltered characters to the contained handler, and delegating filtered characters to filteredCharacters(char[],int,int).
abstract  void filteredCharacters(char[] cs, int start, int length)
          Handle filtered character content.
 void filterElement(String qName)
          Filter the text content of elements with the specified qualified name.
 
Methods inherited from class com.aliasi.xml.ElementStackFilter
currentAttributes, currentElement, endElement, getAttributesStack, getElementStack, noElement, startDocument, startElement
 
Methods inherited from class com.aliasi.xml.SAXFilterHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, setHandler, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class com.aliasi.xml.SimpleElementHandler
addSimpleAttribute, characters, characters, characters, characters, createAttributes, createAttributes, createAttributes, createAttributes, createAttributes, createAttributes, endSimpleElement, endSimpleElement, startEndSimpleElement, startEndSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement, startSimpleElement
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextContentFilter

public TextContentFilter(DefaultHandler handler)
Construct a text content filter which passes events to the specified handler.

Parameters:
handler - Contained handler to which events are passed.

TextContentFilter

public TextContentFilter()
Construct a text content filter without a specified contained handler. Set the contained handler using SAXFilterHandler.setHandler(DefaultHandler).

Method Detail

filterElement

public void filterElement(String qName)
Filter the text content of elements with the specified qualified name.

Parameters:
qName - Qualified name of elements to filter.

characters

public void characters(char[] cs,
                       int start,
                       int length)
                throws SAXException
Handle character content, delegating unfiltered characters to the contained handler, and delegating filtered characters to filteredCharacters(char[],int,int).

Specified by:
characters in interface ContentHandler
Overrides:
characters in class SAXFilterHandler
Parameters:
cs - Array of characters to filter.
start - Index of the first character to filter.
length - Number of characters to filter.
Throws:
SAXException - If there is an exception from the contained handler or from the filtered characters method.

filteredCharacters

public abstract void filteredCharacters(char[] cs,
                                        int start,
                                        int length)
                                 throws SAXException
Handle filtered character content. This method will be called on all text content of elements specified using filterElement(String). It is important that this method not invoke characters(char[],int,int), either directly or through a super call from a subclass; instead, access the embedded handler SAXFilterHandler.mHandler directly.

Parameters:
cs - Array of characters to filter.
start - Index of the first character to filter.
length - Number of characters to filter.
Throws:
SAXException - If there is an exception handling the characters.
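
The routing between characters and filteredCharacters can be illustrated with a self-contained stand-in built on the JDK's DefaultHandler. This is a sketch only and is not LingPipe's implementation: the classes QNameTextRouter and UpperCaseDemo, and the field contained (playing the role of mHandler), are hypothetical. It also assumes that only the innermost open element decides whether text is filtered, which the documentation above does not state.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the routing pattern; not LingPipe code.
abstract class QNameTextRouter extends DefaultHandler {
    protected final DefaultHandler contained;          // role of mHandler
    private final Set<String> filtered = new HashSet<>();
    private final Deque<String> open = new ArrayDeque<>();

    QNameTextRouter(DefaultHandler contained) { this.contained = contained; }

    void filterElement(String qName) { filtered.add(qName); }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) throws SAXException {
        open.push(qName);
        contained.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName)
            throws SAXException {
        open.pop();
        contained.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] cs, int start, int length)
            throws SAXException {
        // Assumption: only the innermost open element is consulted.
        if (!open.isEmpty() && filtered.contains(open.peek())) {
            filteredCharacters(cs, start, length);
        } else {
            contained.characters(cs, start, length);
        }
    }

    abstract void filteredCharacters(char[] cs, int start, int length)
            throws SAXException;
}

final class UpperCaseDemo {
    // Upper-cases text inside a:bar elements; other text passes through.
    static String run(String xml) throws Exception {
        StringBuilder out = new StringBuilder();
        DefaultHandler sink = new DefaultHandler() {
            @Override
            public void characters(char[] cs, int start, int length) {
                out.append(cs, start, length);
            }
        };
        QNameTextRouter router = new QNameTextRouter(sink) {
            @Override
            void filteredCharacters(char[] cs, int start, int length)
                    throws SAXException {
                char[] upper = new String(cs, start, length)
                    .toUpperCase(Locale.ROOT).toCharArray();
                // Write to the contained handler, not back to characters().
                contained.characters(upper, 0, upper.length);
            }
        };
        router.filterElement("a:bar");

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(
            "http://xml.org/sax/features/namespace-prefixes", true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(router);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }
}
```

In this stand-in, a call from filteredCharacters back to characters would be routed straight into filteredCharacters again and never terminate, which is one way to see why the filtered output is sent to the contained handler directly.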