com.aliasi.corpus
Class DiskCorpus<H extends Handler>

java.lang.Object
  extended by com.aliasi.corpus.Corpus<H>
      extended by com.aliasi.corpus.DiskCorpus<H>
Type Parameters:
H - the type of handler to which this corpus sends events

public class DiskCorpus<H extends Handler>
extends Corpus<H>

A DiskCorpus reads data from specified training and test directories using a specified parser.

Since:
LingPipe 2.3
Version:
3.8.1
Author:
Bob Carpenter, Mike Ross

Field Summary
static String DEFAULT_TEST_DIR_NAME
          The name of the default testing directory, "test".
static String DEFAULT_TRAIN_DIR_NAME
          The name of the default training directory, "train".
 
Constructor Summary
DiskCorpus(Parser<H> parser, File dir)
          Construct a corpus from the specified parser and data directory.
DiskCorpus(Parser<H> parser, File trainDir, File testDir)
          Construct a corpus from the specified parser and training and test directories.
 
Method Summary
 String getCharEncoding()
          Returns the current character encoding, or null if none has been specified.
 String getSystemId()
          Returns the system identifier for this corpus, or null if none has been specified.
 Parser<H> parser()
          Returns the data parser for this corpus.
 void setCharEncoding(String encoding)
          Sets the character encoding for this corpus.
 void setSystemId(String systemId)
          Sets the system identifier for the corpus.
 void visitTest(H handler)
          Visit the testing data, sending extracted events to the specified handler.
 void visitTrain(H handler)
          Visit the training data, sending extracted events to the specified handler.
 
Methods inherited from class com.aliasi.corpus.Corpus
visitCorpus, visitCorpus
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_TRAIN_DIR_NAME

public static final String DEFAULT_TRAIN_DIR_NAME
The name of the default training directory, "train".

See Also:
Constant Field Values

DEFAULT_TEST_DIR_NAME

public static final String DEFAULT_TEST_DIR_NAME
The name of the default testing directory, "test".

See Also:
Constant Field Values

Constructor Detail

DiskCorpus

public DiskCorpus(Parser<H> parser,
                  File dir)
Construct a corpus from the specified parser and data directory. The training data will be read from the subdirectory of the specified directory named "train" (see DEFAULT_TRAIN_DIR_NAME). The testing data will be read from the subdirectory named "test" (see DEFAULT_TEST_DIR_NAME). See DiskCorpus(Parser,File,File) for more information.

Parameters:
parser - Parser for the data.
dir - Directory in which to find the data.
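
The directory convention described above can be sketched with the JDK alone. This is an illustration, not LingPipe's source: the class CorpusLayoutSketch and its methods are hypothetical, and only the two directory names are taken from the documented constants.

```java
import java.io.File;

// Illustrative sketch of the documented layout; not LingPipe's implementation.
class CorpusLayoutSketch {

    // Values documented for DEFAULT_TRAIN_DIR_NAME and DEFAULT_TEST_DIR_NAME.
    static final String TRAIN_DIR_NAME = "train";
    static final String TEST_DIR_NAME = "test";

    // Resolve the training subdirectory of a data directory.
    static File trainDir(File dataDir) {
        return new File(dataDir, TRAIN_DIR_NAME);
    }

    // Resolve the testing subdirectory of a data directory.
    static File testDir(File dataDir) {
        return new File(dataDir, TEST_DIR_NAME);
    }

    public static void main(String[] args) {
        // "data" is a placeholder directory name used when no argument is given.
        File dataDir = new File(args.length > 0 ? args[0] : "data");
        System.out.println("train: " + trainDir(dataDir).getPath());
        System.out.println("test:  " + testDir(dataDir).getPath());
    }
}
```

Under this reading, calling the one-argument constructor with dir would behave like calling the three-argument constructor with these two subdirectories.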

DiskCorpus

public DiskCorpus(Parser<H> parser,
                  File trainDir,
                  File testDir)
Construct a corpus from the specified parser and training and test directories. If either directory is null, the corresponding visit method will not produce any events.

Parameters:
parser - Parser for the data.
trainDir - Directory of training data.
testDir - Directory of testing data.

Method Detail

setCharEncoding

public void setCharEncoding(String encoding)
Sets the character encoding for this corpus. If no character encoding is set, the parser determines the default character encoding.

Parameters:
encoding - Character encoding.
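
To show why an explicit encoding matters, the following sketch decodes the same bytes with and without a named charset. The class EncodingSketch is hypothetical, and falling back to the platform default when no encoding is given is an assumption made for illustration; the documentation says only that the parser chooses the default.

```java
import java.nio.charset.Charset;

// Illustrative sketch; the fallback rule is an assumption, not LingPipe's behavior.
class EncodingSketch {

    // Decode bytes with the named encoding, or with the platform default if none is given.
    static String decode(byte[] bytes, String encoding) {
        Charset charset = (encoding == null)
            ? Charset.defaultCharset()      // assumed fallback for this sketch
            : Charset.forName(encoding);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 }; // "cafe" with e-acute in ISO-8859-1
        System.out.println(decode(latin1, "ISO-8859-1"));
        System.out.println(decode(latin1, "UTF-8")); // 0xE9 alone is malformed UTF-8
    }
}
```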

getCharEncoding

public String getCharEncoding()
Returns the current character encoding, or null if none has been specified.

Returns:
The current character encoding.

setSystemId

public void setSystemId(String systemId)
Sets the system identifier for the corpus. The identifier is passed to the input sources used for parsing, which use it to resolve relative URLs (e.g., DTD references in XML documents). If none is set, the system identifier defaults to the name of the file being processed or of the containing zip/gzip file.

Parameters:
systemId - System identifier.
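
The role of the system identifier can be sketched with the JDK's SAX input source. This assumes the input sources mentioned above are org.xml.sax.InputSource objects; the class InputSourceSketch, its methods, and the example.org URL are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import org.xml.sax.InputSource;

// Illustrative sketch; assumes SAX InputSource is the input source type in use.
class InputSourceSketch {

    // Wrap raw bytes in an input source carrying an optional system ID and encoding.
    static InputSource toInputSource(byte[] bytes, String systemId, String encoding) {
        InputSource in = new InputSource(new ByteArrayInputStream(bytes));
        if (systemId != null) in.setSystemId(systemId);
        if (encoding != null) in.setEncoding(encoding);
        return in;
    }

    // Resolve a relative reference (such as a DTD name) against a system identifier.
    static String resolveAgainst(String systemId, String relativeRef) {
        return URI.create(systemId).resolve(relativeRef).toString();
    }

    public static void main(String[] args) {
        InputSource in = toInputSource(new byte[0],
            "http://example.org/corpus/train/doc1.xml", "UTF-8");
        // A relative DTD reference is resolved next to the document itself.
        System.out.println(resolveAgainst(in.getSystemId(), "doc.dtd"));
    }
}
```

A parser given this input source would look for doc.dtd in the same location as doc1.xml, which is why a wrong or missing system identifier typically shows up as an unresolvable DTD.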

getSystemId

public String getSystemId()
Returns the system identifier for this corpus, or null if none has been specified. See setSystemId(String) for more information.

Returns:
The specified system identifier.

parser

public Parser<H> parser()
Returns the data parser for this corpus.

Returns:
The data parser for this corpus.

visitTrain

public void visitTrain(H handler)
                throws IOException
Visit the training data, sending extracted events to the specified handler. This method walks the entire training directory, including the contents of any gzip- or zip-compressed files.

Overrides:
visitTrain in class Corpus<H extends Handler>
Parameters:
handler - Handler to receive training events.
Throws:
IOException - If there is an underlying I/O error.
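
The kind of traversal described above might look like the following sketch. It is not LingPipe's implementation: the class DirectoryWalkSketch and the ContentVisitor callback (standing in for a parser plus handler) are hypothetical, and recursing into subdirectories and detecting compression by the .gz and .zip suffixes are assumptions. A null directory produces no events, matching the constructor documentation.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Illustrative sketch of a directory walk that looks inside gzip and zip files.
class DirectoryWalkSketch {

    // Hypothetical callback standing in for a parser that feeds a handler.
    interface ContentVisitor {
        void visit(String name, byte[] content) throws IOException;
    }

    static void walk(File file, ContentVisitor visitor) throws IOException {
        if (file == null || !file.exists()) return; // null directory: no events
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children == null) return;
            Arrays.sort(children); // deterministic visiting order
            for (File child : children) walk(child, visitor);
        } else if (file.getName().endsWith(".gz")) {
            try (InputStream in = new GZIPInputStream(new FileInputStream(file))) {
                visitor.visit(file.getPath(), readAll(in));
            }
        } else if (file.getName().endsWith(".zip")) {
            try (ZipInputStream zip = new ZipInputStream(new FileInputStream(file))) {
                ZipEntry entry;
                while ((entry = zip.getNextEntry()) != null) {
                    if (!entry.isDirectory()) {
                        // readAll stops at the end of the current zip entry.
                        visitor.visit(file.getPath() + "!" + entry.getName(), readAll(zip));
                    }
                }
            }
        } else {
            try (InputStream in = new FileInputStream(file)) {
                visitor.visit(file.getPath(), readAll(in));
            }
        }
    }

    // Read a stream to its end without closing it.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "train" is a placeholder directory name used when no argument is given.
        File root = new File(args.length > 0 ? args[0] : "train");
        walk(root, (name, content) ->
            System.out.println(name + " (" + content.length + " bytes)"));
    }
}
```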

visitTest

public void visitTest(H handler)
               throws IOException
Visit the testing data, sending extracted events to the specified handler. This method walks the entire test directory, including the contents of any gzip- or zip-compressed files.

Overrides:
visitTest in class Corpus<H extends Handler>
Parameters:
handler - Handler to receive testing events.
Throws:
IOException - If there is an underlying I/O error.