com.aliasi.corpus.parsers
Class BrownTextParser

java.lang.Object
  extended by com.aliasi.corpus.Parser<H>
      extended by com.aliasi.corpus.StringParser<TextHandler>
          extended by com.aliasi.corpus.parsers.BrownTextParser

Deprecated. This class will move to the demos in 4.0.

@Deprecated
public class BrownTextParser
extends StringParser<TextHandler>

The BrownTextParser parses the Natural Language Toolkit (NLTK) distribution of the Brown Corpus. The results may be consumed by a text handler.

NLTK distributes the corpus as a set of files in zip format. The archive may be unzipped using the java.util.zip package, and each entry's input stream converted to an input source to be provided to this class.
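
The unzipping step described above can be sketched as follows. This is a minimal illustration, not part of LingPipe: the class BrownZipReader and its methods readEntries and sampleZip are hypothetical names, and only java.util.zip and org.xml.sax.InputSource from the JDK are used. Each entry is buffered in memory so that the resulting input source does not depend on the zip stream staying open.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

import org.xml.sax.InputSource;

// Hypothetical helper, not part of LingPipe.
final class BrownZipReader {

    /** Reads every file entry of a zip stream into an InputSource keyed by entry name. */
    static Map<String, InputSource> readEntries(InputStream zipStream) {
        Map<String, InputSource> sources = new LinkedHashMap<>();
        try (ZipInputStream zipIn = new ZipInputStream(zipStream)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zipIn.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no text
                }
                // Copy the entry's bytes so the source outlives the zip stream.
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = zipIn.read(buffer)) != -1) {
                    bytes.write(buffer, 0, n);
                }
                InputSource source =
                    new InputSource(new ByteArrayInputStream(bytes.toByteArray()));
                source.setSystemId(entry.getName());
                sources.put(entry.getName(), source);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sources;
    }

    /** Builds a one-entry zip in memory so the sketch runs without the corpus. */
    static byte[] sampleZip(String name, String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(out)) {
            zipOut.putNextEntry(new ZipEntry(name));
            zipOut.write(text.getBytes(StandardCharsets.US_ASCII));
            zipOut.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Each InputSource in the returned map could then be handed to a parser instance, presumably through the parse method inherited from StringParser. Filtering out entries that are not corpus files is left to the caller.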

Each file consists of lines of text separated by zero or more empty lines. Most lines of text are sentences, but others are document titles, closings of personal letters, etc. The parser handles each line independently, separating consecutive lines by a pair of spaces as in the original Brown corpus. Line-initial tabs indicate paragraph breaks, and are retained as in the original corpus. Other inter-sentential whitespace is removed.

The text in each line consists of an optional initial tab followed by a sequence of token-tag pairs separated by single spaces. Each token-tag pair consists of a token followed by a single forward-slash character followed by the tag. Tokens are retained and a single space is inserted between adjacent tokens, except that the following tokens are never followed by spaces:

`` ` ( [ { $
and the following tokens are never preceded by spaces:
'' ' ] } , . ! ? : ; %
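
The spacing rules above can be sketched for a single line as follows. This is an illustration of the documented behavior, not the parser's actual source: the class BrownLineDetokenizer and its method detokenize are hypothetical names, and splitting each pair at its last slash is an assumption made so that tokens which themselves contain a slash are kept whole.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of LingPipe.
final class BrownLineDetokenizer {

    // Tokens never followed by a space, as listed in the class documentation.
    private static final Set<String> NO_SPACE_AFTER =
        new HashSet<>(Arrays.asList("``", "`", "(", "[", "{", "$"));

    // Tokens never preceded by a space, as listed in the class documentation.
    private static final Set<String> NO_SPACE_BEFORE =
        new HashSet<>(Arrays.asList("''", "'", "]", "}", ",", ".", "!", "?", ":", ";", "%"));

    /** Strips the tags from one corpus line and re-spaces the tokens. */
    static String detokenize(String line) {
        StringBuilder sb = new StringBuilder();
        String body = line;
        if (line.startsWith("\t")) {
            sb.append('\t'); // a line-initial tab marks a paragraph break and is kept
            body = line.substring(1);
        }
        String previous = null;
        for (String pair : body.split(" ")) {
            if (pair.isEmpty()) {
                continue;
            }
            // Assumption: the tag follows the LAST slash of the pair.
            int slash = pair.lastIndexOf('/');
            String token = slash > 0 ? pair.substring(0, slash) : pair;
            boolean needsSpace = previous != null
                && !NO_SPACE_AFTER.contains(previous)
                && !NO_SPACE_BEFORE.contains(token);
            if (needsSpace) {
                sb.append(' ');
            }
            sb.append(token);
            previous = token;
        }
        return sb.toString();
    }
}
```

For example, a tab followed by the pairs The/at jury/nn said/vbd ,/, ``/`` yes/rb ''/'' ./. yields a tab followed by: The jury said, ``yes''.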

Since:
LingPipe 2.0
Version:
3.9.1
Author:
Bob Carpenter

Constructor Summary
BrownTextParser()
          Deprecated. Construct a Brown text parser with a null text handler.
BrownTextParser(TextHandler handler)
          Deprecated. See class documentation.
 
Method Summary
 void parseString(char[] cs, int start, int end)
          Deprecated. Parse the specified character slice representing text from the NLTK distribution of the Brown corpus, passing the resulting text to this parser's handler.
 
Methods inherited from class com.aliasi.corpus.StringParser
parse
 
Methods inherited from class com.aliasi.corpus.Parser
getHandler, parse, parse, parseString, setHandler
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BrownTextParser

public BrownTextParser()
Deprecated. 
Construct a Brown text parser with a null text handler.


BrownTextParser

@Deprecated
public BrownTextParser(TextHandler handler)
Deprecated. See class documentation.

Construct a Brown text parser with the specified text handler.

Parameters:
handler - Handler to use for text found by this parser.

Method Detail

parseString

public void parseString(char[] cs,
                        int start,
                        int end)
                 throws IOException
Deprecated. 
Parse the specified character slice representing text from the NLTK distribution of the Brown corpus, passing the resulting text to this parser's handler.

Specified by:
parseString in class Parser<TextHandler>
Parameters:
cs - Underlying characters.
start - Index of first character.
end - Index of one past the last character.
Throws:
IOException - If there is an I/O exception while parsing the specified characters.