com.aliasi.corpus.parsers
Class BrownPosParser

java.lang.Object
  extended by com.aliasi.corpus.Parser<H>
      extended by com.aliasi.corpus.StringParser<TagHandler>
          extended by com.aliasi.corpus.parsers.BrownPosParser

Deprecated. This class will move to the demos in 4.0.

@Deprecated
public class BrownPosParser
extends StringParser<TagHandler>

The BrownPosParser class provides a parser for the NLTK distribution of the Brown Corpus. The data is plain ASCII, with one sentence per line, tokens separated by whitespace, and each tag separated from its token by a forward slash. An example from the first file (brown/ca01) in the NLTK distribution is:

      The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd Friday/nr an/at investigation/nn of/in Atlanta's/np$ recent/jj primary/nn election/nn produced/vbd ``/`` no/at evidence/nn ''/'' that/cs any/dti irregularities/nns took/vbd place/nn ./.  

      The/at jury/nn further/rbr said/vbd in/in term-end/nn presentments/nns that/cs the/at City/nn-tl Executive/jj-tl Committee/nn-tl ,/, which/wdt had/hvd over-all/jj charge/nn of/in the/at election/nn ,/, ``/`` deserves/vbz the/at praise/nn and/cc thanks/nns of/in the/at City/nn-tl of/in-tl Atlanta/np-tl ''/'' for/in the/at manner/nn in/in which/wdt the/at election/nn was/bedz conducted/vbn ./.  
 
 ...
 
Note that each sentence is on its own line, indented with a tab. Sentences are separated by multiple blank lines, which this parser ignores.
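
As an illustration of the line format described above, the following minimal sketch splits one sentence line into token/tag pairs. It is not the LingPipe implementation: the class name BrownLineSketch and the method splitLine are hypothetical, and splitting each item at its last forward slash is an assumption, made so that a token that itself contains a slash keeps its text intact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of LingPipe. */
final class BrownLineSketch {

    private BrownLineSketch() { }

    /**
     * Splits one sentence line into {token, tag} pairs.
     * Blank lines yield an empty list.
     */
    static List<String[]> splitLine(String line) {
        List<String[]> pairs = new ArrayList<>();
        String trimmed = line.trim();  // drops the leading tab and trailing spaces
        if (trimmed.isEmpty()) {
            return pairs;
        }
        for (String item : trimmed.split("\\s+")) {
            // Assumption: the tag follows the LAST slash in the item.
            int slash = item.lastIndexOf('/');
            if (slash <= 0 || slash == item.length() - 1) {
                continue;  // no token or no tag: skip the malformed item
            }
            pairs.add(new String[] {
                item.substring(0, slash),
                item.substring(slash + 1)
            });
        }
        return pairs;
    }
}
```

Calling BrownLineSketch.splitLine on the first example line would yield pairs such as {"The", "at"} and {"Atlanta's", "np$"}, with the raw, unnormalized tags.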

Each tag consists of a base tag and optional modifiers. This parser removes all of the modifiers. The modifiers include multiple tags separated by plus signs (e.g. EX+BEZ), tags concatenated with an asterisk in the case of negation (e.g. BEZ*), the prefix modifier FW- for foreign words (e.g. FW-JJ), the suffix modifier -NC for citations (e.g. NN-NC), the suffix -HL for words in headlines (e.g. NN-HL), and the suffix -TL for words in titles (e.g. NNS-TL).
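
The modifier stripping described above might look roughly like the following sketch. This is not the code behind normalizeTag: the class TagNormalizerSketch and its method stripModifiers are hypothetical, and three choices are assumptions rather than documented behavior. Tags are upper-cased to match the tag table, only the first tag of a plus-joined compound is kept, and a trailing negation asterisk is dropped.

```java
import java.util.Locale;

/** Hypothetical helper, not part of LingPipe. */
final class TagNormalizerSketch {

    private static final String[] SUFFIXES = { "-HL", "-TL", "-NC" };

    private TagNormalizerSketch() { }

    /** Reduces a raw Brown tag such as "fw-jj" or "nn-tl" to a base tag. */
    static String stripModifiers(String rawTag) {
        // Assumption: report tags in upper case, as in the tag table.
        String tag = rawTag.toUpperCase(Locale.ROOT);

        // Foreign-word prefix, e.g. FW-JJ -> JJ.
        if (tag.startsWith("FW-") && tag.length() > 3) {
            tag = tag.substring(3);
        }

        // Assumption: keep only the first tag of a compound, e.g. EX+BEZ -> EX.
        int plus = tag.indexOf('+');
        if (plus > 0) {
            tag = tag.substring(0, plus);
        }

        // Suffix modifiers may stack, so strip until none is left.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String suffix : SUFFIXES) {
                if (tag.length() > suffix.length() && tag.endsWith(suffix)) {
                    tag = tag.substring(0, tag.length() - suffix.length());
                    changed = true;
                }
            }
        }

        // Negation marker, e.g. BEZ* -> BEZ; the bare "*" tag is left alone.
        if (tag.length() > 1 && tag.endsWith("*")) {
            tag = tag.substring(0, tag.length() - 1);
        }
        return tag;
    }
}
```

Possessive markers such as the dollar sign in NP$ are part of the base tag in the table, so this sketch leaves them in place.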

The full set of base tags is given in the following table:

Tag   Description                                    Examples
'     apostrophe
``    double open quote
''    double close quote
.     sentence closer                                . ; ? !
(     left paren
)     right paren
*     not, n't
--    dash
,     comma
:     colon
ABL   pre-qualifier                                  quite, rather
ABN   pre-quantifier                                 half, all
ABX   pre-quantifier                                 both
AP    post-determiner                                many, several, next
AP$   possessive post-determiner                     many, several, next
AT    article                                        a, the, no
BE    be
BED   were
BEDZ  was
BEG   being
BEM   am
BEN   been
BER   are, art
BEZ   is
CC    coordinating conjunction                       and, or
CD    cardinal numeral                               one, two, 2, etc.
CD$   possessive cardinal numeral                    one, two, 2, etc.
CS    subordinating conjunction                      if, although
DO    do
DOD   did
DOZ   does
DT    singular determiner                            this, that
DT$   possessive singular determiner                 this, that
DTI   singular or plural determiner/quantifier       some, any
DTS   plural determiner                              these, those
DTX   determiner/double conjunction                  either
EX    existential there
HV    have
HVD   had (past tense)
HVG   having
HVN   had (past participle)
HVZ   has
IN    preposition
JJ    adjective
JJ$   possessive adjective
JJR   comparative adjective
JJS   semantically superlative adjective             chief, top
JJT   morphologically superlative adjective          biggest
MD    modal auxiliary                                can, should, will
NIL   no category assigned
NN    singular or mass noun
NN$   possessive singular noun
NNS   plural noun
NNS$  possessive plural noun
NP    proper noun or part of name phrase
NP$   possessive proper noun
NPS   plural proper noun
NPS$  possessive plural proper noun
NR    adverbial noun                                 home, today, west
NR$   possessive adverbial noun
NRS   plural adverbial noun
OD    ordinal numeral                                first, 2nd
PN    nominal pronoun                                everybody, nothing
PN$   possessive nominal pronoun
PP$   possessive personal pronoun                    my, our
PP$$  second (nominal) possessive pronoun            mine, ours
PPL   singular reflexive/intensive personal pronoun  myself
PPLS  plural reflexive/intensive personal pronoun    ourselves
PPO   objective personal pronoun                     me, him, it, them
PPS   3rd. singular nominative pronoun               he, she, it, one
PPSS  other nominative personal pronoun              I, we, they, you
QL    qualifier                                      very, fairly
QLP   post-qualifier                                 enough, indeed
RB    adverb
RB$   possessive adverb
RBR   comparative adverb
RBT   superlative adverb
RN    nominal adverb                                 here, then, indoors
RP    adverb/particle                                about, off, up
TO    infinitive marker to
UH    interjection, exclamation
VB    verb, base form
VBD   verb, past tense
VBG   verb, present participle/gerund
VBN   verb, past participle
VBZ   verb, 3rd. singular present
WDT   wh- determiner                                 what, which
WP$   possessive wh- pronoun                         whose
WPO   objective wh- pronoun                          whom, which, that
WPS   nominative wh- pronoun                         who, which, that
WQL   wh- qualifier                                  how
WRB   wh- adverb                                     how, where, when

For information on NLTK and the Brown corpus, see:

Since:
LingPipe 2.1
Version:
3.9.1
Author:
Bob Carpenter

Constructor Summary
BrownPosParser()
          Deprecated. Construct a Brown corpus part-of-speech tag parser with no handler specified.
BrownPosParser(TagHandler handler)
          Deprecated. Moving to demos in 4.0.
 
Method Summary
 String normalizeTag(String rawTag)
          Deprecated. Return a normalized form of the tag, stripping off all modifiers and conjunctions.
 void parseString(char[] cs, int start, int end)
          Deprecated. Parse the specified input source and send extracted taggings to the current handler.
 TagHandler tagHandler()
          Deprecated. Moving to demos in 4.0.
 
Methods inherited from class com.aliasi.corpus.StringParser
parse
 
Methods inherited from class com.aliasi.corpus.Parser
getHandler, parse, parse, parseString, setHandler
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BrownPosParser

public BrownPosParser()
Deprecated. 
Construct a Brown corpus part-of-speech tag parser with no handler specified.


BrownPosParser

@Deprecated
public BrownPosParser(TagHandler handler)
Deprecated. Moving to demos in 4.0.

Construct a Brown corpus part-of-speech tag parser with the specified tag handler.

Parameters:
handler - Tag handler.

Method Detail

tagHandler

@Deprecated
public TagHandler tagHandler()
Deprecated. Moving to demos in 4.0.

Returns the tag handler for this parser.

Returns:
The tag handler for this parser.

parseString

public void parseString(char[] cs,
                        int start,
                        int end)
Deprecated. 
Parse the specified input source and send extracted taggings to the current handler. This string should correspond to the contents of an input file.

Specified by:
parseString in class Parser<TagHandler>
Parameters:
cs - Character array underlying the string.
start - Index of the first character of the string.
end - Index one past the last character of the string.
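
The slice convention above (start inclusive, end exclusive) and the handler callback pattern can be sketched in a self-contained way as follows. Everything here is a hypothetical stand-in: SliceParseSketch, parseSlice, and the two-argument SentenceHandler interface are illustration names and are not the LingPipe Parser or TagHandler types. The sketch assumes one callback per non-blank sentence line and passes the raw tags through without normalization.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not part of LingPipe. */
final class SliceParseSketch {

    /** Hypothetical callback; NOT the LingPipe TagHandler interface. */
    interface SentenceHandler {
        void handle(String[] tokens, String[] tags);
    }

    private SliceParseSketch() { }

    /** Parses cs[start..end) and reports one sentence per non-blank line. */
    static void parseSlice(char[] cs, int start, int end, SentenceHandler handler) {
        // end is exclusive, so the slice length is end - start.
        String text = new String(cs, start, end - start);
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;  // blank separator lines are ignored
            }
            List<String> tokens = new ArrayList<>();
            List<String> tags = new ArrayList<>();
            for (String item : trimmed.split("\\s+")) {
                int slash = item.lastIndexOf('/');  // assumption: tag follows last slash
                if (slash <= 0 || slash == item.length() - 1) {
                    continue;  // skip items without both a token and a tag
                }
                tokens.add(item.substring(0, slash));
                tags.add(item.substring(slash + 1));
            }
            if (!tokens.isEmpty()) {
                handler.handle(tokens.toArray(new String[0]),
                               tags.toArray(new String[0]));
            }
        }
    }
}
```

To process a whole file's contents held in a char array cs, a caller would pass 0 and cs.length as start and end.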

normalizeTag

public String normalizeTag(String rawTag)
Deprecated. 
Return a normalized form of the tag, stripping off all modifiers and conjunctions.

Parameters:
rawTag - Tag to normalize.
Returns:
Normalized form of tag.