java.lang.Object
  com.aliasi.corpus.Parser<H>
    com.aliasi.corpus.InputSourceParser<H>
      com.aliasi.corpus.XMLParser
        com.aliasi.corpus.parsers.GeniaSentenceParser
public class GeniaSentenceParser
extends XMLParser
A GeniaSentenceParser provides a chunk parser for the XML version of the GENIA corpus. The type assigned to sentence chunks is the constant SentenceChunker.SENTENCE_CHUNK_TYPE. Only sentences from citation abstracts are returned; sentences in citation titles are skipped.
The following example is drawn from the initial part of the merged 3.02 version of the GENIA corpus. Some content has been elided and replaced by ellipses (...), but all spaces and line breaks are left as is:
```xml
<set>
<article>
<articleinfo>
<bibliomisc>MEDLINE:95369245</bibliomisc>
</articleinfo>
<title>
<sentence>...</sentence>
</title>
<abstract>
<sentence><w c="NN">Activation</w> <w c="IN">of</w> <w c="DT">the</w> <cons lex="CD28_surface_receptor" sem="G#protein_family_or_group"><cons lex="CD28" sem="G#protein_molecule"><w c="NN">CD28</w></cons> <w c="NN">surface</w> <w c="NN">receptor</w></cons> <w c="VBZ">provides</w> <w c="DT">a</w> <w c="JJ">major</w> <w c="JJ">costimulatory</w> <w c="NN">signal</w> <w c="IN">for</w> <cons lex="T_cell_activation" sem="G#other_name"><w c="NN">T</w> <w c="NN">cell</w> <w c="NN">activation</w></cons> <w c="VBG">resulting</w> <w c="IN">in</w> <w c="VBN">enhanced</w> <w c="NN">production</w> <w c="IN">of</w> <cons lex="interleukin-2" sem="G#protein_molecule"><w c="NN">interleukin-2</w></cons> <w c="(">(</w><cons lex="IL-2" sem="G#protein_molecule"><w c="NN">IL-2</w></cons><w c=")">)</w> <w c="CC">and</w> <cons lex="cell_proliferation" sem="G#other_name"><w c="NN">cell</w> <w c="NN">proliferation</w></cons><w c=".">.</w></sentence>
<sentence>...</sentence>
...
```

All that is required is to pull all of the text content (including informative spaces) from the sentence elements.
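As a rough illustration of what pulling the text content out of abstract sentences involves, here is a minimal sketch that uses only the JDK's built-in SAX API (javax.xml.parsers and org.xml.sax). It is not GeniaSentenceParser's actual implementation: the names GeniaAbstractSentences, SentenceCollector and abstractSentences are hypothetical, the sketch returns plain strings rather than chunks of type SentenceChunker.SENTENCE_CHUNK_TYPE, and it assumes the element names are exactly abstract and sentence (the values of GENIA_ABSTRACT_ELT and GENIA_SENTENCE_ELT).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class GeniaAbstractSentences {

    // Hypothetical SAX handler: collects the full text of every <sentence>
    // nested inside an <abstract>; sentences inside <title> are skipped.
    static class SentenceCollector extends DefaultHandler {
        final List<String> sentences = new ArrayList<>();
        private final StringBuilder buf = new StringBuilder();
        private boolean inAbstract = false;
        private boolean inSentence = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("abstract")) {
                inAbstract = true;
            } else if (qName.equals("sentence") && inAbstract) {
                inSentence = true;
                buf.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Keep all character data, including the spaces between <w> elements;
            // SAX may split one text run across several calls, so append.
            if (inSentence) {
                buf.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("sentence") && inSentence) {
                sentences.add(buf.toString());
                inSentence = false;
            } else if (qName.equals("abstract")) {
                inAbstract = false;
            }
        }
    }

    // Parses a GENIA-style XML string and returns the abstract sentences' text.
    static List<String> abstractSentences(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SentenceCollector collector = new SentenceCollector();
            parser.parse(new InputSource(new StringReader(xml)), collector);
            return collector.sentences;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse GENIA XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<set><article><title><sentence><w c=\"NN\">Title</w></sentence></title>"
            + "<abstract><sentence><w c=\"NN\">Activation</w> <w c=\"IN\">of</w> "
            + "<cons lex=\"CD28\"><w c=\"NN\">CD28</w></cons><w c=\".\">.</w></sentence>"
            + "</abstract></article></set>";
        System.out.println(abstractSentences(xml)); // prints [Activation of CD28.]
    }
}
```

The title sentence is ignored because it is not nested inside an abstract element, and the spaces between the w elements survive because every character event inside a sentence is appended, not just the word text.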
The GENIA corpus is available free of charge from:

| Field Summary | |
|---|---|
| static String | GENIA_ABSTRACT_ELT: The tag used for abstract elements in GENIA, namely abstract. |
| static String | GENIA_SENTENCE_ELT: The tag used for sentence elements in GENIA, namely sentence. |

| Constructor Summary | |
|---|---|
| GeniaSentenceParser() | Construct a GENIA sentence chunk parser with no designated chunk handler. |
| GeniaSentenceParser(ChunkHandler handler) | Construct a GENIA sentence chunk parser with the specified chunk handler. |

| Method Summary | |
|---|---|
| ChunkHandler | getChunkHandler(): Returns the chunk handler for this sentence parser. |
| protected DefaultHandler | getXMLHandler(): Returns the embedded XML handler. |
| void | setHandler(Handler handler): Sets the handler to the specified chunk handler. |

| Methods inherited from class com.aliasi.corpus.XMLParser |
|---|
| parse |

| Methods inherited from class com.aliasi.corpus.InputSourceParser |
|---|
| parseString |

| Methods inherited from class com.aliasi.corpus.Parser |
|---|
| getHandler, parse, parse, parseString |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
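
| Field Detail |
|---|

public static final String GENIA_SENTENCE_ELT
The tag used for sentence elements in GENIA, namely sentence.

public static final String GENIA_ABSTRACT_ELT
The tag used for abstract elements in GENIA, namely abstract.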
| Constructor Detail |
|---|
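public GeniaSentenceParser()
throws SAXException
Construct a GENIA sentence chunk parser with no designated chunk handler. A chunk handler may be set later using setHandler(Handler).
Throws:
SAXException - If there is an error configuring the SAX XML reader required for parsing.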

public GeniaSentenceParser(ChunkHandler handler)
throws SAXException
Construct a GENIA sentence chunk parser with the specified chunk handler.
Parameters:
handler - The chunk handler used to process sentences found by this parser.
Throws:
SAXException - If there is an error configuring the SAX XML reader required for parsing.

| Method Detail |
|---|
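protected DefaultHandler getXMLHandler()
Returns the embedded XML handler.
Specified by:
getXMLHandler in class XMLParser

public void setHandler(Handler handler)
Sets the handler to the specified chunk handler.
Overrides:
setHandler in class Parser
Parameters:
handler - New chunk handler.
Throws:
IllegalArgumentException - If the handler is not a chunk handler.

public ChunkHandler getChunkHandler()
Returns the chunk handler for this sentence parser. This is the same handler returned by Parser.getHandler(), but the result in this case is cast to type ChunkHandler.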
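The contract of setHandler and getChunkHandler (accept any Handler, reject one that is not a ChunkHandler with IllegalArgumentException, and hand back the same object cast to ChunkHandler) can be sketched as follows. Handler, ChunkHandler and its handle(String) method are hypothetical stand-ins defined locally so the sketch compiles on its own; they are not LingPipe's interfaces. Treating a null handler as acceptable, to mirror the no-argument constructor's "no designated chunk handler", is also an assumption.

```java
public class HandlerCheckSketch {

    // Hypothetical stand-in for the general handler type (not LingPipe's Handler).
    public interface Handler { }

    // Hypothetical stand-in for a chunk handler; handle(String) is illustrative only.
    public interface ChunkHandler extends Handler {
        void handle(String sentence);
    }

    public static class SentenceParserSketch {
        // Null until a handler is supplied (assumed to mirror the no-arg constructor).
        private Handler handler;

        // Accepts any Handler at compile time, but rejects non-chunk handlers at run time.
        public void setHandler(Handler handler) {
            if (handler != null && !(handler instanceof ChunkHandler)) {
                throw new IllegalArgumentException(
                    "Handler must be a ChunkHandler; found " + handler.getClass().getName());
            }
            this.handler = handler;
        }

        // General accessor, analogous to Parser.getHandler().
        public Handler getHandler() {
            return handler;
        }

        // Same object as getHandler(), cast to ChunkHandler; setHandler ensures the cast succeeds.
        public ChunkHandler getChunkHandler() {
            return (ChunkHandler) handler;
        }
    }

    public static void main(String[] args) {
        SentenceParserSketch parser = new SentenceParserSketch();
        ChunkHandler printer = sentence -> System.out.println("sentence: " + sentence);
        parser.setHandler(printer);
        parser.getChunkHandler().handle("Activation of CD28.");

        try {
            parser.setHandler(new Handler() { }); // not a ChunkHandler, so rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The run-time instanceof check is what makes the cast in getChunkHandler safe: setHandler keeps the broader Handler parameter type of the method it overrides in Parser, so the narrowing to ChunkHandler has to be enforced when the handler is set rather than by the compiler.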