java.lang.Object
  com.aliasi.corpus.Parser<H>
    com.aliasi.corpus.StringParser<TagHandler>
      com.aliasi.corpus.parsers.BrownPosParser
public class BrownPosParser
The BrownPosParser class provides a parser for the
NLTK distribution of the Brown Corpus. The data is formatted in
pure ASCII, with sentences and tokens delimited by whitespace and tags
separated from tokens by a forward slash. An example from the
first file (brown/cp01) in the NLTK distribution is:
	The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd Friday/nr an/at investigation/nn of/in Atlanta's/np$ recent/jj primary/nn election/nn produced/vbd ``/`` no/at evidence/nn ''/'' that/cs any/dti irregularities/nns took/vbd place/nn ./.

	The/at jury/nn further/rbr said/vbd in/in term-end/nn presentments/nns that/cs the/at City/nn-tl Executive/jj-tl Committee/nn-tl ,/, which/wdt had/hvd over-all/jj charge/nn of/in the/at election/nn ,/, ``/`` deserves/vbz the/at praise/nn and/cc thanks/nns of/in the/at City/nn-tl of/in-tl Atlanta/np-tl ''/'' for/in the/at manner/nn in/in which/wdt the/at election/nn was/bedz conducted/vbn ./.

	...

Note that each sentence is on its own line, with a tab indentation. The sentences themselves are separated by multiple blank lines, which are simply ignored by this parser.
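The line format described above can be illustrated with a minimal sketch that splits one sentence line into token/tag pairs. This is not the code used by BrownPosParser; the class name `BrownLineSketch` and the method `splitLine` are hypothetical, and splitting on the last slash of each item is an assumption made so that tokens containing a slash stay intact.

```java
import java.util.ArrayList;
import java.util.List;

public class BrownLineSketch {

    /** Splits one sentence line into {token, tag} pairs (hypothetical helper). */
    public static List<String[]> splitLine(String line) {
        List<String[]> pairs = new ArrayList<>();
        // trim() drops the leading tab; items are separated by whitespace.
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank line: nothing to extract
            }
            // Assumption: the tag follows the LAST slash in the item.
            int slash = item.lastIndexOf('/');
            if (slash < 0) {
                continue; // not a token/tag pair
            }
            pairs.add(new String[] {
                item.substring(0, slash),   // token
                item.substring(slash + 1)   // raw tag, modifiers included
            });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "\tThe/at jury/nn said/vbd ``/`` no/at evidence/nn ''/'' ./.";
        for (String[] pair : splitLine(line)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Blank separator lines yield an empty list in this sketch, which mirrors the statement that blank lines are ignored.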
Each tag consists of a base tag and optional modifiers. This
parser removes all of the modifiers. The modifiers include
multiple tags separated by plus signs (e.g. EX+BEZ),
multiple tags concatenated in the case of negation
(e.g. BEZ*), the prefix modifier FW- for
foreign words (e.g. FW-JJ), the suffix modifier
-NC for citations (e.g. NN-NC), the
suffix -HL for words in headlines (e.g. NN-HL), and the suffix -TL for words in titles (e.g. NNS-TL).
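To make the modifier rules above concrete, here is a rough sketch of one way such stripping could be done. It is not the implementation behind `normalizeTag`; the class `TagNormalizationSketch`, the choice to keep the first component of a plus-joined tag, and the case-insensitive matching are all assumptions for illustration.

```java
public class TagNormalizationSketch {

    // Suffix modifiers named in the description above.
    private static final String[] SUFFIXES = { "-NC", "-HL", "-TL" };

    /** Strips modifiers from a raw tag (hypothetical helper, not the library method). */
    public static String normalize(String rawTag) {
        String tag = rawTag;
        // Compound tags such as EX+BEZ: keep the first component (an assumption).
        int plus = tag.indexOf('+');
        if (plus > 0) {
            tag = tag.substring(0, plus);
        }
        // Negated tags such as BEZ*: drop the trailing star, but keep the bare "*" tag.
        if (tag.length() > 1 && tag.endsWith("*")) {
            tag = tag.substring(0, tag.length() - 1);
        }
        // Foreign-word prefix FW-, matched case-insensitively.
        if (tag.length() > 3 && tag.regionMatches(true, 0, "FW-", 0, 3)) {
            tag = tag.substring(3);
        }
        // Suffix modifiers; loop in case more than one suffix is present.
        boolean stripped = true;
        while (stripped) {
            stripped = false;
            for (String suffix : SUFFIXES) {
                int n = suffix.length();
                if (tag.length() > n
                        && tag.regionMatches(true, tag.length() - n, suffix, 0, n)) {
                    tag = tag.substring(0, tag.length() - n);
                    stripped = true;
                }
            }
        }
        return tag;
    }

    public static void main(String[] args) {
        for (String raw : new String[] { "EX+BEZ", "BEZ*", "FW-JJ", "NN-NC", "nn-tl", "--", "*" }) {
            System.out.println(raw + " -> " + normalize(raw));
        }
    }
}
```

The length checks keep punctuation tags such as `--` and `*` from being mistaken for modifiers.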
The full set of base tags is given in the following table:
| Tag | Description | Examples |
|---|---|---|
| ' | apostrophe | |
| `` | double open quote | |
| '' | double close quote | |
| . | sentence closer | . ; ? ! |
| ( | left paren | |
| ) | right paren | |
| * | not, n't | |
| -- | dash | |
| , | comma | |
| : | colon | |
| ABL | pre-qualifier | quite, rather |
| ABN | pre-quantifier | half, all |
| ABX | pre-quantifier | both |
| AP | post-determiner | many, several, next |
| AP$ | possessive post-determiner | many, several, next |
| AT | article | a, the, no |
| BE | be | |
| BED | were | |
| BEDZ | was | |
| BEG | being | |
| BEM | am | |
| BEN | been | |
| BER | are, art | |
| BEZ | is | |
| CC | coordinating conjunction | and, or |
| CD | cardinal numeral | one, two, 2, etc. |
| CD$ | possessive cardinal numeral | one, two, 2, etc. |
| CS | subordinating conjunction | if, although |
| DO | do | |
| DOD | did | |
| DOZ | does | |
| DT | singular determiner | this, that |
| DT$ | possessive singular determiner | this, that |
| DTI | singular or plural determiner/quantifier | some, any |
| DTS | plural determiner | these, those |
| DTX | determiner/double conjunction | either |
| EX | existential there | |
| HV | have | |
| HVD | had (past tense) | |
| HVG | having | |
| HVN | had (past participle) | |
| HVZ | has | |
| IN | preposition | |
| JJ | adjective | |
| JJ$ | possessive adjective | |
| JJR | comparative adjective | |
| JJS | semantically superlative adjective | chief, top |
| JJT | morphologically superlative adjective | biggest |
| MD | modal auxiliary | can, should, will |
| NIL | no category assigned | |
| NN | singular or mass noun | |
| NN$ | possessive singular noun | |
| NNS | plural noun | |
| NNS$ | possessive plural noun | |
| NP | proper noun or part of name phrase | |
| NP$ | possessive proper noun | |
| NPS | plural proper noun | |
| NPS$ | possessive plural proper noun | |
| NR | adverbial noun | home, today, west |
| NR$ | possessive adverbial noun | |
| NRS | plural adverbial noun | |
| OD | ordinal numeral | first, 2nd |
| PN | nominal pronoun | everybody, nothing |
| PN$ | possessive nominal pronoun | |
| PP$ | possessive personal pronoun | my, our |
| PP$$ | second (nominal) possessive pronoun | mine, ours |
| PPL | singular reflexive/intensive personal pronoun | myself |
| PPLS | plural reflexive/intensive personal pronoun | ourselves |
| PPO | objective personal pronoun | me, him, it, them |
| PPS | 3rd. singular nominative pronoun | he, she, it, one |
| PPSS | other nominative personal pronoun | I, we, they, you |
| QL | qualifier | very, fairly |
| QLP | post-qualifier | enough, indeed |
| RB | adverb | |
| RB$ | possessive adverb | |
| RBR | comparative adverb | |
| RBT | superlative adverb | |
| RN | nominal adverb | here then, indoors |
| RP | adverb/particle | about, off, up |
| TO | infinitive marker | to |
| UH | interjection, exclamation | |
| VB | verb, base form | |
| VBD | verb, past tense | |
| VBG | verb, present participle/gerund | |
| VBN | verb, past participle | |
| VBZ | verb, 3rd. singular present | |
| WDT | wh- determiner | what, which |
| WP$ | possessive wh- pronoun | whose |
| WPO | objective wh- pronoun | whom, which, that |
| WPS | nominative wh- pronoun | who, which, that |
| WQL | wh- qualifier | how |
| WRB | wh- adverb | how, where, when |
For information on NLTK and the Brown corpus, see:
| Constructor Summary | |
|---|---|
| BrownPosParser() | Construct a Brown corpus part-of-speech tag parser with no handler specified. |
| BrownPosParser(TagHandler handler) | Construct a Brown corpus part-of-speech tag parser with the specified tag handler. |
| Method Summary | |
|---|---|
| String | normalizeTag(String rawTag): Return a normalized form of the tag stripping off all modifiers and conjunctions. |
| void | parseString(char[] cs, int start, int end): Parse the specified input source and send extracted taggings to the current handler. |
| TagHandler | tagHandler(): Returns the tag handler for this parser. |
| Methods inherited from class com.aliasi.corpus.StringParser |
|---|
| parse |
| Methods inherited from class com.aliasi.corpus.Parser |
|---|
| getHandler, parse, parse, parseString, setHandler |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public BrownPosParser()

public BrownPosParser(TagHandler handler)
Parameters:
handler - Tag handler.

| Method Detail |
|---|
public TagHandler tagHandler()

public void parseString(char[] cs, int start, int end)
parseString in class Parser<TagHandler>
Parameters:
cs - Character array underlying string.
start - First character of string.
end - Index of one past the last character in the string.

public String normalizeTag(String rawTag)
Parameters:
rawTag - Tag to normalize.
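
The `parseString` parameters describe a half-open range: `start` is the first character and `end` is one past the last. As a small illustration of that convention only (the class `CharRangeSketch` and the method `slice` are hypothetical and not part of the library), the text covered by such a range can be recovered like this:

```java
public class CharRangeSketch {

    /** Returns the characters of cs from start (inclusive) to end (exclusive). */
    public static String slice(char[] cs, int start, int end) {
        // The String constructor takes an offset and a count, so the
        // count is end - start for a half-open range.
        return new String(cs, start, end - start);
    }

    public static void main(String[] args) {
        char[] cs = "xx The/at jury/nn yy".toCharArray();
        // Characters 3 through 16 hold the tagged text; 17 is one past the end.
        System.out.println(slice(cs, 3, 17));
    }
}
```

With this convention, `end - start` is the length of the range, and passing `0` and `cs.length` covers the whole array.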