java.lang.Object
com.aliasi.corpus.Parser<H>
com.aliasi.corpus.StringParser
com.aliasi.corpus.parsers.GeniaPosParser
public class GeniaPosParser
The GeniaPosParser extracts the part-of-speech (POS)
tags from the GENIA text POS corpus and sends them to the specified
tag handler.
An example from the start of the GENIA POS corpus is:

    UI/LS -/: 95369245/CD
    ====================
    TI/LS -/: IL-2/NN gene/NN expression/NN and/CC NF-kappa/NN B/NN activation/NN through/IN CD28/NN requires/VBZ reactive/JJ oxygen/NN production/NN by/IN 5-lipoxygenase/NN ./.
    ====================
    AB/LS -/: Activation/NN of/IN the/DT CD28/NN surface/NN receptor/NN provides/VBZ a/DT major/JJ costimulatory/JJ signal/NN for/IN T/NN cell/NN activation/NN resulting/VBG in/IN enhanced/VBN production/NN of/IN interleukin-2/NN (/( IL-2/NN )/) and/CC cell/NN proliferation/NN ./.
    ====================
    In/IN primary/JJ T/NN lymphocytes/NNS ......snip.....

The parser handles entries by "sentence", where a sentence is the set of token/tag pairs between the double lines composed of equal signs (=). Some of these sentences
begin with a special token drawn from the following set:
UI: Begin Citation
TI: Citation Title
AB: Begin Abstract
Each of these special tokens is tagged LS and followed by a single
hyphen (-) tagged as part-of-speech colon
(:). Note that the begin-citation sentence also includes a
PubMed identifier drawn from the MEDLINE corpus (see the com.aliasi.medline package for more information on MEDLINE).
Further note that continuing sentences in the same abstract are not
tagged with any prefix.
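The sentence layout described above can be sketched in plain Java. This is an illustration only, not the LingPipe implementation: the class `GeniaSentenceSketch` and its methods `splitPair`, `parseSentences`, and `sectionMarker` are hypothetical names, and the sketch assumes that pairs are separated by whitespace, that a run of equal signs on its own separates sentences, and that the tag follows the last slash in each pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: not the LingPipe implementation of GeniaPosParser.
final class GeniaSentenceSketch {

    /** Splits one "token/tag" pair on its last slash, so "(/(" yields "(" and "(". */
    static String[] splitPair(String pair) {
        int slash = pair.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("No tag separator in: " + pair);
        }
        return new String[] { pair.substring(0, slash), pair.substring(slash + 1) };
    }

    /** Groups whitespace-separated pairs into sentences, breaking on runs of '='. */
    static List<List<String[]>> parseSentences(String corpus) {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String item : corpus.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // only happens for empty input
            }
            if (item.matches("=+")) {
                // Separator: close the current sentence if it has any pairs.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(splitPair(item));
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    /** Returns "UI", "TI" or "AB" if the sentence opens with that marker, else null. */
    static String sectionMarker(List<String[]> sentence) {
        if (sentence.size() < 2) {
            return null;
        }
        String[] first = sentence.get(0);
        String[] second = sentence.get(1);
        boolean known = Arrays.asList("UI", "TI", "AB").contains(first[0]);
        boolean shaped = "LS".equals(first[1])
            && "-".equals(second[0]) && ":".equals(second[1]);
        return (known && shaped) ? first[0] : null;
    }

    public static void main(String[] args) {
        // Shortened sample based on the corpus excerpt above.
        String sample = "UI/LS -/: 95369245/CD ==================== "
            + "TI/LS -/: IL-2/NN gene/NN expression/NN ./. ==================== "
            + "In/IN primary/JJ T/NN lymphocytes/NNS";
        for (List<String[]> sentence : parseSentences(sample)) {
            System.out.println(sectionMarker(sentence) + " -> " + sentence.size() + " pairs");
        }
    }
}
```

In a sentence marked `UI`, the third pair carries the PubMed identifier as its token, so under these assumptions it could be read as `sentence.get(2)[0]`.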
The GENIA corpus itself and extensive information about it are available from:
| Constructor Summary | |
|---|---|
| GeniaPosParser() | Construct a GENIA part-of-speech parser with no handler specified. |
| GeniaPosParser(TagHandler handler) | Construct a GENIA part-of-speech parser with the specified tag handler. |

| Method Summary | | |
|---|---|---|
| TagHandler | getTagHandler() | Returns the tag handler for this parser. |
| void | parseString(char[] cs, int start, int end) | Implementation of the parser for the GENIA corpus. |

| Methods inherited from class com.aliasi.corpus.StringParser |
|---|
parse |
| Methods inherited from class com.aliasi.corpus.Parser |
|---|
getHandler, parse, parse, parseString, setHandler |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
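| Constructor Detail |
|---|

public GeniaPosParser()

Construct a GENIA part-of-speech parser with no handler specified.

public GeniaPosParser(TagHandler handler)

Construct a GENIA part-of-speech parser with the specified tag handler.

Parameters:
handler - Tag handler for the parser.

| Method Detail |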
|---|

public TagHandler getTagHandler()

Returns the tag handler for this parser.

Throws:
ClassCastException - If a handler that does not implement TagHandler was set using Parser.setHandler(Handler).

public void parseString(char[] cs, int start, int end)

Implementation of the parser for the GENIA corpus.

Specified by:
parseString in class Parser

Parameters:
cs - Underlying characters.
start - Index of first character in slice.
end - Index of one past the last character in the slice.
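
The `start` and `end` parameters describe a half-open slice of `cs`: `start` is included and `end` is excluded, so the slice holds `end - start` characters. A minimal sketch of that convention in plain Java follows. The class `SliceSketch` and the helper `sliceToString` are hypothetical and not part of the library.

```java
// Illustrative sketch of the half-open [start, end) slice convention.
final class SliceSketch {

    /** Returns the characters of cs from start (inclusive) to end (exclusive). */
    static String sliceToString(char[] cs, int start, int end) {
        // String(char[], int offset, int count) takes a length, not an end index.
        return new String(cs, start, end - start);
    }

    public static void main(String[] args) {
        char[] cs = "xx IL-2/NN gene/NN xx".toCharArray();
        int start = 3;             // index of 'I'
        int end = cs.length - 3;   // one past the final 'N' of "gene/NN"
        System.out.println(sliceToString(cs, start, end));
    }
}
```

To hand a whole array to a method with this signature, the slice would be `0` to `cs.length`.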