java.lang.Object
com.aliasi.corpus.Parser<H>
com.aliasi.corpus.StringParser<TextHandler>
com.aliasi.corpus.parsers.BrownTextParser
@Deprecated public class BrownTextParser
The BrownTextParser parses the Natural Language Toolkit
(NLTK) distribution of the Brown
Corpus. The results may be consumed by a text
handler.
NLTK distributes the corpus as a set of files in zip format.
The archive may be unzipped using the java.util.zip package and
each entry's input stream converted to an input source to
be provided to this class.
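
The unzipping step described above might be sketched as follows, using only JDK classes. The class BrownZipSketch and the callback interface EntryConsumer are hypothetical names introduced for illustration and are not part of LingPipe. Each entry is copied into memory first, on the assumption that the individual corpus files are small enough for that, so that a consumer can read or close its stream without disturbing the enclosing zip stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.xml.sax.InputSource;

final class BrownZipSketch {

    /** Hypothetical callback for illustration; not a LingPipe type. */
    interface EntryConsumer {
        void accept(String entryName, InputSource in) throws IOException;
    }

    /** Visits every non-directory entry of a zipped stream as an InputSource. */
    static void forEachEntry(InputStream zipped, EntryConsumer consumer)
            throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // read() returns -1 at the end of the current entry,
                // not at the end of the whole archive.
                ByteArrayOutputStream copy = new ByteArrayOutputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = zipIn.read(buffer)) != -1) {
                    copy.write(buffer, 0, n);
                }
                InputSource in =
                    new InputSource(new ByteArrayInputStream(copy.toByteArray()));
                in.setSystemId(entry.getName()); // label the source by entry name
                consumer.accept(entry.getName(), in);
            }
        }
    }
}
```

A consumer would then hand each InputSource to the parser, presumably through one of the parse methods inherited from Parser; which overload accepts an InputSource is an assumption here and should be checked against the Parser documentation.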
Each file consists of lines of text separated by zero or more empty lines. Most lines are sentences, but some are document titles, closings of personal letters, etc. The parser handles each line independently, separating consecutive lines by a pair of spaces as in the original Brown corpus. Line-initial tabs indicate paragraph breaks, and are retained as in the original corpus. Other inter-sentential whitespace is removed.
The text in each line consists of an optional initial tab followed by a sequence of token-tag pairs separated by single spaces. Each token-tag pair consists of a token followed by a single forward-slash character followed by the tag. Tokens are retained and a single whitespace is inserted between each token, except that the following tokens are never followed by spaces:
`` ` ( [ { $
and the following tokens are never preceded by spaces:
'' ' ] } , . ! ? : ; %
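
The spacing rules above can be illustrated with a small self-contained sketch. This is not the LingPipe implementation; the class BrownLineSketch and the method detokenize are hypothetical names. Two assumptions are made that the documentation does not state: the tag is taken to begin after the last forward slash in a pair, and a pair without any slash is treated as a bare token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class BrownLineSketch {

    // Tokens that are never followed by a space.
    private static final Set<String> NO_SPACE_AFTER = new HashSet<>(
        Arrays.asList("``", "`", "(", "[", "{", "$"));

    // Tokens that are never preceded by a space.
    private static final Set<String> NO_SPACE_BEFORE = new HashSet<>(
        Arrays.asList("''", "'", "]", "}", ",", ".", "!", "?", ":", ";", "%"));

    /** Converts one line of token/tag pairs into plain text. */
    static String detokenize(String line) {
        StringBuilder sb = new StringBuilder();
        String body = line;
        if (body.startsWith("\t")) {
            sb.append('\t'); // a line-initial tab marks a paragraph break
            body = body.substring(1);
        }
        String previous = null;
        for (String pair : body.trim().split(" +")) {
            if (pair.isEmpty()) {
                continue; // the line held no pairs
            }
            // Assumption: the tag starts after the LAST slash in the pair.
            int slash = pair.lastIndexOf('/');
            String token = slash < 0 ? pair : pair.substring(0, slash);
            boolean needsSpace = previous != null
                && !NO_SPACE_AFTER.contains(previous)
                && !NO_SPACE_BEFORE.contains(token);
            if (needsSpace) {
                sb.append(' ');
            }
            sb.append(token);
            previous = token;
        }
        return sb.toString();
    }
}
```

For example, the line "The/at jury/nn said/vbd ``/`` no/rb ''/'' ./." with a leading tab yields the text The jury said ``no''. with the tab retained.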
| Constructor Summary | |
|---|---|
| BrownTextParser() | Deprecated. Construct a Brown text parser with a null text handler. |
| BrownTextParser(TextHandler handler) | Deprecated. See class documentation. |
| Method Summary | |
|---|---|
| void | parseString(char[] cs, int start, int end) Deprecated. Parse the specified input stream representing the NLTK distribution of the Brown corpus, passing characters to the specified handler. |
| Methods inherited from class com.aliasi.corpus.StringParser |
|---|
parse |
| Methods inherited from class com.aliasi.corpus.Parser |
|---|
getHandler, parse, parse, parseString, setHandler |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public BrownTextParser()
@Deprecated public BrownTextParser(TextHandler handler)
handler - Handler to use for text found by this parser.

| Method Detail |
|---|
public void parseString(char[] cs,
int start,
int end)
throws IOException
parseString in class Parser<TextHandler>
cs - Underlying characters.
start - Index of first character.
end - Index of one past the last character.
IOException - If there is an exception reading from the specified input stream.
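
The start and end parameters describe a half-open range: start is inclusive and end is exclusive, so the slice holds end - start characters. A minimal sketch of that convention follows; SliceSketch and slice are hypothetical names used only for illustration and do not appear in LingPipe.

```java
final class SliceSketch {

    /** Returns the characters cs[start] through cs[end - 1] as a string. */
    static String slice(char[] cs, int start, int end) {
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException(
                "start=" + start + " end=" + end + " length=" + cs.length);
        }
        // The String constructor takes an offset and a count, not an end index.
        return new String(cs, start, end - start);
    }
}
```

With this convention, slice("ab The/at cd".toCharArray(), 3, 9) selects the six characters The/at.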