java.lang.Object
  com.aliasi.corpus.Corpus<H>
    com.aliasi.corpus.DiskCorpus<H>
Type Parameters:
H - the type of handler to which this corpus sends events
public class DiskCorpus<H extends Handler>
extends Corpus<H>
A DiskCorpus reads data from a specified training directory and
test directory using a specified parser.
The disk corpus parses the data on-the-fly from disk rather than reading it into memory.
The directories holding training and test data are visited recursively. Any GZIP files will be uncompressed, and any Zip archives will be visited recursively.
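The traversal rules above (recursive directories, GZIP decompression, recursive Zip archives) can be sketched with the JDK's `java.util.zip` classes. This is a hypothetical illustration, not LingPipe's implementation: the names `CorpusWalker` and `DataSink` are invented here, and dispatching on the `.gz` and `.zip` file-name suffixes is an assumption about how compressed files are recognized.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical callback receiving each data file's name and uncompressed bytes. */
interface DataSink {
    void accept(String name, byte[] data) throws IOException;
}

/**
 * Hypothetical sketch of the documented traversal (not LingPipe's code):
 * directories are walked recursively, ".gz" files are uncompressed, and
 * ".zip" archives (including nested ones) are visited entry by entry.
 */
class CorpusWalker {

    static void walk(File dir, DataSink sink) throws IOException {
        File[] children = dir.listFiles();
        if (children == null) return; // not a directory, or unreadable
        Arrays.sort(children);         // deterministic visiting order
        for (File f : children) {
            if (f.isDirectory()) {
                walk(f, sink);
            } else {
                try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
                    visitStream(f.getName(), in, sink);
                }
            }
        }
    }

    // Dispatch on the file-name suffix; the caller owns (and closes) the stream.
    static void visitStream(String name, InputStream in, DataSink sink) throws IOException {
        if (name.endsWith(".gz")) {
            // Strip ".gz" so that e.g. "a.zip.gz" is then treated as a Zip archive.
            visitStream(name.substring(0, name.length() - 3), new GZIPInputStream(in), sink);
        } else if (name.endsWith(".zip")) {
            ZipInputStream zip = new ZipInputStream(in);
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (!e.isDirectory()) {
                    visitStream(e.getName(), zip, sink); // reads only the current entry
                }
            }
        } else {
            sink.accept(name, in.readAllBytes());
        }
    }
}
```

Reading each Zip entry through the shared `ZipInputStream` works because its `read` methods stop at the end of the current entry, so nested archives and compressed entries need no temporary files.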
| Field Summary | |
|---|---|
| static String | DEFAULT_TEST_DIR_NAME: The name of the default testing directory, "test". |
| static String | DEFAULT_TRAIN_DIR_NAME: The name of the default training directory, "train". |
| Constructor Summary | |
|---|---|
| DiskCorpus(Parser<H> parser, File dir) | Construct a corpus from the specified parser and data directory. |
| DiskCorpus(Parser<H> parser, File trainDir, File testDir) | Construct a corpus from the specified parser and training and test directories. |
| Method Summary | |
|---|---|
| String | getCharEncoding(): Returns the current character encoding, or null if none has been specified. |
| String | getSystemId(): Returns the system identifier for this corpus, or null if none has been specified. |
| Parser<H> | parser(): Returns the data parser for this corpus. |
| void | setCharEncoding(String encoding): Sets the character encoding for this corpus. |
| void | setSystemId(String systemId): Sets the system identifier for the corpus. |
| void | visitTest(H handler): Visit the testing data, sending extracted events to the specified handler. |
| void | visitTrain(H handler): Visit the training data, sending extracted events to the specified handler. |
| Methods inherited from class com.aliasi.corpus.Corpus |
|---|
visitCorpus, visitCorpus |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String DEFAULT_TRAIN_DIR_NAME
The name of the default training directory, "train".
public static final String DEFAULT_TEST_DIR_NAME
The name of the default testing directory, "test".
| Constructor Detail |
|---|
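public DiskCorpus(Parser<H> parser,
File dir)
Construct a corpus from the specified parser and data directory. The training data is read from the subdirectory of dir named "train" (see DEFAULT_TRAIN_DIR_NAME), and the testing data is read from the subdirectory named "test" (see DEFAULT_TEST_DIR_NAME). See DiskCorpus(Parser,File,File) for more information.
Parameters:
parser - Parser for the data.
dir - Directory in which to find the data.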
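The one-directory constructor reads like a thin delegation to the two-directory form. A minimal sketch, assuming the training and test directories are simply the `train` and `test` children of `dir`; the class `CorpusDirs` is hypothetical and only mirrors the documented field values.

```java
import java.io.File;

/**
 * Hypothetical sketch (not LingPipe's source) of how the one-directory
 * constructor could resolve its training and test directories.
 */
class CorpusDirs {
    // Same values as the documented DEFAULT_TRAIN_DIR_NAME / DEFAULT_TEST_DIR_NAME.
    static final String DEFAULT_TRAIN_DIR_NAME = "train";
    static final String DEFAULT_TEST_DIR_NAME = "test";

    final File trainDir;
    final File testDir;

    CorpusDirs(File trainDir, File testDir) {
        this.trainDir = trainDir;
        this.testDir = testDir;
    }

    // Analogue of DiskCorpus(Parser, File): delegate to the two-directory form.
    CorpusDirs(File dir) {
        this(new File(dir, DEFAULT_TRAIN_DIR_NAME), new File(dir, DEFAULT_TEST_DIR_NAME));
    }
}
```

Under that assumption, `new DiskCorpus<H>(parser, dir)` would behave like `new DiskCorpus<H>(parser, new File(dir, "train"), new File(dir, "test"))`.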
public DiskCorpus(Parser<H> parser,
File trainDir,
File testDir)
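Construct a corpus from the specified parser and training and test directories. If either directory is null, the corresponding visit method will not produce any events.
Parameters:
parser - Parser for the data.
trainDir - Directory of training data.
testDir - Directory of testing data.
| Method Detail |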
|---|
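public void setCharEncoding(String encoding)
Sets the character encoding for this corpus.
Parameters:
encoding - Character encoding.
public String getCharEncoding()
Returns the current character encoding, or null if none has been specified.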
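public void setSystemId(String systemId)
Sets the system identifier for the corpus.
Parameters:
systemId - System identifier.
public String getSystemId()
Returns the system identifier for this corpus, or null if none has been specified. See setSystemId(String) for more information.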
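The character encoding and system identifier are corpus-level settings that would travel with each file handed to the parser. The sketch below assumes the parser consumes an `org.xml.sax.InputSource` (a JDK class whose `setEncoding` and `setSystemId` methods exist as used); that assumption, and the helper `SourceFactory`, are hypothetical and not taken from this page.

```java
import java.io.InputStream;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch: wrap one file's bytes for a parser, carrying the
 * corpus-level encoding and system identifier. Assumes the parser reads an
 * org.xml.sax.InputSource; not taken from DiskCorpus itself.
 */
class SourceFactory {
    static InputSource toInputSource(InputStream in, String charEncoding, String systemId) {
        InputSource source = new InputSource(in);
        // Unset (null) values are left alone so the parser can apply its own defaults.
        if (charEncoding != null) source.setEncoding(charEncoding);
        if (systemId != null) source.setSystemId(systemId);
        return source;
    }
}
```

Leaving null settings unset matches the documented getters, which report null when nothing has been specified.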
public Parser<H> parser()
Returns the data parser for this corpus.
public void visitTrain(H handler)
throws IOException
Visit the training data, sending extracted events to the specified handler.
visitTrain in class Corpus<H extends Handler>
Parameters:
handler - Handler to receive training events.
Throws:
IOException - If there is an underlying I/O error.
public void visitTest(H handler)
throws IOException
Visit the testing data, sending extracted events to the specified handler.
visitTest in class Corpus<H extends Handler>
Parameters:
handler - Handler to receive testing events.
Throws:
IOException - If there is an underlying I/O error.
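The visit methods follow a callback pattern: the corpus walks its directory and pushes events into the handler rather than returning data. A simplified, hypothetical stand-in is sketched below; `TextHandler` and `SimpleDiskCorpus` are invented names, the "parser" is reduced to one event per UTF-8 file, and compression handling is omitted. It also reflects the documented rule that a null directory produces no events.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

/** Hypothetical handler type; one call per extracted event. */
interface TextHandler {
    void handle(String text);
}

/**
 * Simplified, hypothetical stand-in for DiskCorpus (not its source):
 * each visit method walks its directory and sends every file's UTF-8 text
 * to the handler as a single event. Compression handling is omitted.
 */
class SimpleDiskCorpus {
    private final File trainDir;
    private final File testDir;

    SimpleDiskCorpus(File trainDir, File testDir) {
        this.trainDir = trainDir;
        this.testDir = testDir;
    }

    void visitTrain(TextHandler handler) throws IOException {
        visit(trainDir, handler);
    }

    void visitTest(TextHandler handler) throws IOException {
        visit(testDir, handler);
    }

    private static void visit(File dir, TextHandler handler) throws IOException {
        if (dir == null) return; // a null directory produces no events
        File[] children = dir.listFiles();
        if (children == null) return;
        Arrays.sort(children);
        for (File f : children) {
            if (f.isDirectory()) {
                visit(f, handler);
            } else {
                // Trivial "parser": the whole file becomes one event.
                String text = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
                handler.handle(text);
            }
        }
    }
}
```

Because events are pushed as files are read, a handler such as `events::add` sees the data incrementally, in keeping with the on-the-fly parsing described for DiskCorpus.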