com.aliasi.corpus.parsers
Class GeniaSentenceParser

java.lang.Object
  extended by com.aliasi.corpus.Parser<H>
      extended by com.aliasi.corpus.InputSourceParser<H>
          extended by com.aliasi.corpus.XMLParser<ObjectHandler<Chunking>>
              extended by com.aliasi.corpus.parsers.GeniaSentenceParser

Deprecated. This class will move to the demos in 4.0.

@Deprecated
public class GeniaSentenceParser
extends XMLParser<ObjectHandler<Chunking>>

A GeniaSentenceParser provides a chunk parser for the XML version of the GENIA corpus. The type assigned to sentence chunks is the constant SentenceChunker.SENTENCE_CHUNK_TYPE. The parser returns only the sentences from citation abstracts, not the sentences in citation titles.

The following example is drawn from the initial part of the merged 3.02 version of the GENIA corpus (with some content elided and replaced by ellipses (...), but all spaces and line breaks left as is):

 <set>
 <article>
 <articleinfo>
 <bibliomisc>MEDLINE:95369245</bibliomisc>
 </articleinfo>
 <title>
 <sentence>...</sentence>
 </title>
 <abstract>
 <sentence><w c="NN">Activation</w> <w c="IN">of</w> <w c="DT">the</w> <cons lex="CD28_surface_receptor" sem="G#protein_family_or_group"><cons lex="CD28" sem="G#protein_molecule"><w c="NN">CD28</w></cons> <w c="NN">surface</w> <w c="NN">receptor</w></cons> <w c="VBZ">provides</w> <w c="DT">a</w> <w c="JJ">major</w> <w c="JJ">costimulatory</w> <w c="NN">signal</w> <w c="IN">for</w> <cons lex="T_cell_activation" sem="G#other_name"><w c="NN">T</w> <w c="NN">cell</w> <w c="NN">activation</w></cons> <w c="VBG">resulting</w> <w c="IN">in</w> <w c="VBN">enhanced</w> <w c="NN">production</w> <w c="IN">of</w> <cons lex="interleukin-2" sem="G#protein_molecule"><w c="NN">interleukin-2</w></cons> <w c="(">(</w><cons lex="IL-2" sem="G#protein_molecule"><w c="NN">IL-2</w></cons><w c=")">)</w> <w c="CC">and</w> <cons lex="cell_proliferation" sem="G#other_name"><w c="NN">cell</w> <w c="NN">proliferation</w></cons><w c=".">.</w></sentence>
 <sentence>...</sentence>
 ...
 
All that is required is to extract the text content, including the informative spaces between tokens, from each sentence element.
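
The extraction described above can be sketched with the JDK's SAX API alone. This is a minimal illustration, not the actual implementation of this class: the class name GeniaSentenceTextSketch and the method abstractSentences are hypothetical, the input is assumed to be a small well-formed XML string with no DTD reference, and the result is a plain list of strings rather than Chunking objects passed to a handler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of LingPipe: collects the text of each
// <sentence> found inside an <abstract>, skipping sentences in <title>.
final class GeniaSentenceTextSketch {

    static List<String> abstractSentences(String xml) {
        final List<String> sentences = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inAbstract = false;
            // Non-null only while inside a sentence within an abstract.
            private StringBuilder current = null;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("abstract".equals(qName)) {
                    inAbstract = true;
                } else if (inAbstract && "sentence".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Keeps token text and the spaces between <w> elements.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("sentence".equals(qName) && current != null) {
                    sentences.add(current.toString());
                    current = null;
                } else if ("abstract".equals(qName)) {
                    inAbstract = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return sentences;
    }
}
```

Markup such as the w and cons elements contributes no characters of its own, so only token text and the whitespace between tokens survive in each returned string.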

The GENIA corpus is available free of charge from:

Since:
LingPipe2.1.1
Version:
3.9.1
Author:
Bob Carpenter

Field Summary
static String GENIA_ABSTRACT_ELT
          Deprecated. The tag used for abstract elements in GENIA, namely abstract.
static String GENIA_SENTENCE_ELT
          Deprecated. The tag used for sentence elements in GENIA, namely sentence.
 
Constructor Summary
GeniaSentenceParser()
          Deprecated. Construct a GENIA sentence chunk parser with no designated chunk handler.
GeniaSentenceParser(ObjectHandler<Chunking> handler)
          Deprecated. Construct a GENIA sentence chunk parser with the specified chunk handler.
 
Method Summary
 ObjectHandler<Chunking> getChunkHandler()
          Deprecated. Use generic Parser.getHandler() instead.
protected  DefaultHandler getXMLHandler()
          Deprecated. Returns the embedded XML handler.
 
Methods inherited from class com.aliasi.corpus.XMLParser
parse
 
Methods inherited from class com.aliasi.corpus.InputSourceParser
parseString
 
Methods inherited from class com.aliasi.corpus.Parser
getHandler, parse, parse, parseString, setHandler
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GENIA_SENTENCE_ELT

public static final String GENIA_SENTENCE_ELT
Deprecated. 
The tag used for sentence elements in GENIA, namely sentence.

See Also:
Constant Field Values

GENIA_ABSTRACT_ELT

public static final String GENIA_ABSTRACT_ELT
Deprecated. 
The tag used for abstract elements in GENIA, namely abstract.

See Also:
Constant Field Values
Constructor Detail

GeniaSentenceParser

public GeniaSentenceParser()
                    throws SAXException
Deprecated. 
Construct a GENIA sentence chunk parser with no designated chunk handler. A chunk handler may be set later using the method Parser.setHandler(Handler).

Throws:
SAXException - If there is an error configuring the SAX XML reader required for parsing.

GeniaSentenceParser

public GeniaSentenceParser(ObjectHandler<Chunking> handler)
                    throws SAXException
Deprecated. 
Construct a GENIA sentence chunk parser with the specified chunk handler.

Parameters:
handler - The chunk handler used to process sentences found by this parser.
Throws:
SAXException - If there is an error configuring the SAX XML reader required for parsing.
Method Detail

getXMLHandler

protected DefaultHandler getXMLHandler()
Deprecated. 
Returns the embedded XML handler. This method implements the required method for the abstract superclass XMLParser.

Specified by:
getXMLHandler in class XMLParser<ObjectHandler<Chunking>>
Returns:
The XML handler for this class.

getChunkHandler

@Deprecated
public ObjectHandler<Chunking> getChunkHandler()
Deprecated. Use generic Parser.getHandler() instead.

Returns the chunk handler for this sentence parser. The result is the same as that of calling the superclass method Parser.getHandler(), with the declared return type ObjectHandler<Chunking>.

Returns:
The chunk handler for this sentence parser.