java.lang.Object
  com.aliasi.corpus.Parser<H>
    com.aliasi.corpus.StringParser<TagHandler>
      com.aliasi.corpus.parsers.AbstractMedTagParser
public abstract class AbstractMedTagParser
The AbstractMedTagParser class provides an adapter for
NCBI's MedTag corpora, including GeneTag and MedPost. The MedTag
format is sentence based, consisting of a number of pairs of lines
of the following form:

    P00073344A0367
    tok_tag tok_tag ... tok_tag
    P00083846T0000
    tok_tag tok_tag ... tok_tag
    ...

The initial part of the first line, P00073344, provides the PubMed
identifier of the citation from which the text was extracted. The
second part of the first line, A0367, indicates that the sentence
comes from the abstract (A), beginning at character offset 367. The
text may be extracted from titles or abstracts; the third line above
indicates a sentence beginning with the first character (index 0000)
of the title (T) of the citation with PubMed ID 83846.
The second and fourth lines each consist of a sequence of token/tag pairs, with each token joined to its tag by an underscore. The tags are part-of-speech tags in the MedPost corpus and chunk entity tags in the GeneTag corpus. Note that with this format, whitespace information is lost.
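The line layout described above can be illustrated with a small standalone sketch. It does not use LingPipe and is not the parser's actual implementation: the class name `MedTagLineSketch`, the regular expression, and the sample tokens and tags are all hypothetical, and splitting each pair at its last underscore is an assumption made so that tokens containing underscores survive.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of LingPipe: illustrates the MedTag line layout only.
final class MedTagLineSketch {

    // Assumed shape of an identifier line: 'P', PubMed ID digits,
    // 'A' (abstract) or 'T' (title), then the character offset digits.
    private static final Pattern ID_LINE = Pattern.compile("P(\\d+)([AT])(\\d+)");

    /** Returns {pubMedId, section, offset} as strings, e.g. {"00073344", "A", "0367"}. */
    static String[] parseIdLine(String line) {
        Matcher m = ID_LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an identifier line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    /** Returns {tokens, tags}; each pair is split at its LAST underscore (an assumption). */
    static String[][] parseTokenTagLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[][] { new String[0], new String[0] };
        }
        String[] pairs = trimmed.split("\\s+");
        String[] tokens = new String[pairs.length];
        String[] tags = new String[pairs.length];
        for (int k = 0; k < pairs.length; ++k) {
            int i = pairs[k].lastIndexOf('_');
            if (i < 0) {
                throw new IllegalArgumentException("No underscore in pair: " + pairs[k]);
            }
            tokens[k] = pairs[k].substring(0, i);
            tags[k] = pairs[k].substring(i + 1);
        }
        return new String[][] { tokens, tags };
    }

    public static void main(String[] args) {
        String[] id = parseIdLine("P00073344A0367");
        System.out.println("PubMed ID: " + Integer.parseInt(id[0])
            + ", section: " + id[1]
            + ", offset: " + Integer.parseInt(id[2]));

        // Made-up sentence line; the tags are placeholders, not real corpus data.
        String[][] tt = parseTokenTagLine("IL-2_NN gene_NN expression_NN");
        System.out.println(Arrays.toString(tt[0]));
        System.out.println(Arrays.toString(tt[1]));
    }
}
```

Because the format stores only tokens and tags, nothing in such a sketch can recover the original spacing between tokens.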
Subclasses must implement the abstract
parseTokensTags(String[],String[],String[]) method, which handles
each sentence once its tokens and tags have been extracted.
For more information on the MedTag project, see:
| Constructor Summary | |
|---|---|
| AbstractMedTagParser() | Construct an abstract MedTag parser with no handler specified. |
| AbstractMedTagParser(TagHandler handler) | Construct an abstract MedTag parser with the specified tag handler. |
| Method Summary | |
|---|---|
| void parseString(char[] cs, int start, int end) | Parse the specified input source and send extracted taggings to the current handler. |
| protected abstract void parseTokensTags(String[] tokens, String[] whitespaces, String[] tags) | This method handles the raw tokens and tags pulled from a MedTag corpus. |
| TagHandler tagHandler() | Deprecated. Use generic Parser.getHandler() instead. |
| Methods inherited from class com.aliasi.corpus.StringParser |
|---|
| parse |

| Methods inherited from class com.aliasi.corpus.Parser |
|---|
| getHandler, parse, parse, parseString, setHandler |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public AbstractMedTagParser()
public AbstractMedTagParser(TagHandler handler)
Parameters:
    handler - Tag handler.

| Method Detail |
|---|
@Deprecated
public TagHandler tagHandler()
Deprecated. Use generic Parser.getHandler() instead.
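public void parseString(char[] cs, int start, int end)
Parse the specified input source and send extracted taggings to the current handler.
Specified by:
    parseString in class Parser<TagHandler>
Parameters:
    cs - Character array underlying string.
    start - Index of the first character of the string.
    end - Index of one past the last character in the string.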
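The (cs, start, end) parameters of parseString describe a half-open range: start is inclusive and end is one past the last character. A minimal standalone sketch of that convention follows; the class `ParseStringRangeSketch` and its `slice` method are hypothetical illustrations, not part of the library.

```java
// Hypothetical helper, not part of LingPipe: shows the half-open [start, end) convention.
final class ParseStringRangeSketch {

    /** Copies cs[start..end) into a String; end is one past the last character. */
    static String slice(char[] cs, int start, int end) {
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException(
                "start=" + start + ", end=" + end + ", length=" + cs.length);
        }
        // String(char[], offset, count) takes a length, so convert end to a count.
        return new String(cs, start, end - start);
    }

    public static void main(String[] args) {
        // The identifier occupies indices 2 through 15, so end is 16.
        char[] cs = "xxP00073344A0367yy".toCharArray();
        System.out.println(slice(cs, 2, 16));
    }
}
```

With this convention the number of characters parsed is always end - start, and an empty range is expressed as start == end.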
protected abstract void parseTokensTags(String[] tokens, String[] whitespaces, String[] tags)
This method handles the raw tokens and tags pulled from a MedTag corpus.
Parameters:
    tokens - Raw tokens to handle.
    whitespaces - Raw whitespaces to handle.
    tags - Raw tags to handle.