com.aliasi.classify
Class XValidatingClassificationCorpus<E>

java.lang.Object
  extended by com.aliasi.corpus.Corpus<ClassificationHandler<E,Classification>>
      extended by com.aliasi.classify.XValidatingClassificationCorpus<E>
Type Parameters:
E - the type of objects being classified
All Implemented Interfaces:
ClassificationHandler<E,Classification>, Handler

Deprecated. Use XValidatingObjectCorpus with type com.aliasi.corpus.ObjectHandler<Classified<E>> instead.

@Deprecated
public class XValidatingClassificationCorpus<E>
extends Corpus<ClassificationHandler<E,Classification>>
implements ClassificationHandler<E,Classification>

An XValidatingClassificationCorpus holds a set of inputs and their classification results, to be used as a corpus with built-in cross-validation support. Instances may be added through the constructors or through this class's implementation of the classification handler interface.

Handler Implementation

When used as a handler, this class simply collects the examples and stores them in internal arrays. An instance of this class may therefore be used like any other classification handler; the result is a collection of instances paired with their classification results.

Cross Validation

Cross-validation divides a corpus into roughly equal-sized parts, called folds, assigning one of the parts as the test section and the remaining parts as training sections. A typical number of folds is 10, so that for each fold 90% of the data is used for training and 10% for testing.

Initially, the fold is set to 0, which takes the initial prefix of the data for testing and the rest for training. The fold may be changed to a specified value using setFold(int). A full cross-validation may thus be performed by iterating the fold from 0 to numFolds()-1.
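
The fold iteration described above can be illustrated without LingPipe on the classpath. The following is a minimal sketch, not this class's implementation: it assumes the test section for fold k is the contiguous index range from k*n/numFolds (inclusive) to (k+1)*n/numFolds (exclusive), and the class FoldSketch and its methods are hypothetical names. The exact boundaries used by this class may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the fold arithmetic; not LingPipe code.
class FoldSketch {

    // First index (inclusive) of the test section for the given fold.
    static int testStart(int numInstances, int numFolds, int fold) {
        return (int) ((long) numInstances * fold / numFolds);
    }

    // Last index (exclusive) of the test section for the given fold.
    static int testEnd(int numInstances, int numFolds, int fold) {
        return (int) ((long) numInstances * (fold + 1) / numFolds);
    }

    // Items inside the test section of the given fold.
    static <T> List<T> test(List<T> items, int numFolds, int fold) {
        int start = testStart(items.size(), numFolds, fold);
        int end = testEnd(items.size(), numFolds, fold);
        return new ArrayList<>(items.subList(start, end));
    }

    // Items outside the test section of the given fold.
    static <T> List<T> train(List<T> items, int numFolds, int fold) {
        int start = testStart(items.size(), numFolds, fold);
        int end = testEnd(items.size(), numFolds, fold);
        List<T> result = new ArrayList<>(items.subList(0, start));
        result.addAll(items.subList(end, items.size()));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; ++i) {
            data.add(i);
        }
        int numFolds = 3;
        // Iterate from fold 0 to numFolds-1, as in a full cross-validation.
        for (int fold = 0; fold < numFolds; ++fold) {
            System.out.println("fold " + fold
                + " test=" + test(data, numFolds, fold)
                + " train=" + train(data, numFolds, fold));
        }
    }
}
```

Under this assumption every instance appears in the test section of exactly one fold, and fold 0 tests on the initial prefix of the data.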

The randomization method permuteCorpus(Random) permutes the instances of this corpus using the specified randomizer. Permuting the corpus before cross-validating makes the contents of each fold random.
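
Because inputs and classifications are held in parallel, a permutation has to keep each input paired with its classification. The following is a minimal sketch of one way to do that with plain Java lists. PermuteSketch and its permute method are hypothetical, and nothing here is taken from the LingPipe source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of permuting two parallel lists together.
class PermuteSketch {

    // Applies one random permutation to both lists in place, so that
    // inputs.get(i) stays paired with labels.get(i) afterwards.
    static <E, C> void permute(List<E> inputs, List<C> labels, Random random) {
        if (inputs.size() != labels.size()) {
            throw new IllegalArgumentException("lists must have equal length");
        }
        // Shuffle the index positions once, then reorder both lists by them.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < inputs.size(); ++i) {
            order.add(i);
        }
        Collections.shuffle(order, random);
        List<E> oldInputs = new ArrayList<>(inputs);
        List<C> oldLabels = new ArrayList<>(labels);
        for (int i = 0; i < order.size(); ++i) {
            inputs.set(i, oldInputs.get(order.get(i)));
            labels.set(i, oldLabels.get(order.get(i)));
        }
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>(Arrays.asList("doc0", "doc1", "doc2", "doc3"));
        List<String> labels = new ArrayList<>(Arrays.asList("0", "1", "2", "3"));
        // A fixed seed makes the permutation reproducible across runs.
        permute(inputs, labels, new Random(42L));
        for (int i = 0; i < inputs.size(); ++i) {
            System.out.println(inputs.get(i) + " -> " + labels.get(i));
        }
    }
}
```

Passing a Random constructed with a fixed seed gives the same permutation on every run, which keeps cross-validation results comparable between experiments.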

Use Without Cross Validation

No matter how the folds are set, using Corpus.visitCorpus(Handler) will run the specified handler over all of the data collected in this corpus.

Concurrency

This class must be used with external read/write synchronization. The write operations include the constructors and the set-fold, permute-corpus, and handle methods. The read operations include the visit, number-of-instances, and fold-reporting methods.
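
One common way to supply external read/write synchronization is java.util.concurrent.locks.ReentrantReadWriteLock. The sketch below guards a plain list rather than a real corpus. GuardedCorpusSketch is a hypothetical class whose add method stands in for the write operations and whose size method stands in for the read operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper showing read/write locking around shared state.
class GuardedCorpusSketch<E> {

    private final List<E> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Write operation: stands in for handle, setFold, or permuteCorpus.
    void add(E item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read operation: stands in for numInstances, fold, or the visit methods.
    int size() {
        lock.readLock().lock();
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Adds perThread items from each of numThreads threads and
    // returns the final size once all threads have finished.
    static int addFromThreads(int numThreads, int perThread) {
        GuardedCorpusSketch<Integer> corpus = new GuardedCorpusSketch<>();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < numThreads; ++t) {
            Thread thread = new Thread(() -> {
                for (int i = 0; i < perThread; ++i) {
                    corpus.add(i);
                }
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return corpus.size();
    }

    public static void main(String[] args) {
        System.out.println(addFromThreads(4, 1000)); // prints 4000
    }
}
```

Multiple readers may hold the read lock at once, while a writer excludes both readers and other writers.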

Since:
LingPipe3.5
Version:
3.9.1
Author:
Bob Carpenter

Constructor Summary
XValidatingClassificationCorpus(int numFolds)
          Deprecated. Construct a cross-validating corpus with the specified number of folds that initially contains no examples.
XValidatingClassificationCorpus(List<E> inputList, List<Classification> classificationList, int numFolds)
          Deprecated. Construct a cross-validating corpus containing the instances specified by the parallel lists of inputs and classifications, and the specified number of folds.
XValidatingClassificationCorpus(Parser<ClassificationHandler<E,Classification>> parser, File[] dataFiles, int numFolds)
          Deprecated. See class documentation
XValidatingClassificationCorpus(XValidatingClassificationCorpus<E> corpus)
          Deprecated. Construct a deep copy of the specified corpus.
 
Method Summary
 String[] categories()
          Deprecated. Returns the categories found in the cases for this corpus sorted into ascending order.
 int fold()
          Deprecated. Returns the current fold.
 void handle(E e, Classification c)
          Deprecated. Adds the specified object and corresponding classification to the corpus.
 int numFolds()
          Deprecated. Returns the number of folds for this corpus.
 int numInstances()
          Deprecated. Returns the number of instances for this corpus.
 void permuteCorpus(Random random)
          Deprecated. Randomly permutes the corpus using the specified randomizer.
 void setFold(int fold)
          Deprecated. Set the current fold to the specified value.
 String toString()
          Deprecated. Returns a string representation of the size of this corpus.
 void visitTest(ClassificationHandler<E,Classification> handler)
          Deprecated. See class documentation
 void visitTrain(ClassificationHandler<E,Classification> handler)
          Deprecated. See class documentation
 
Methods inherited from class com.aliasi.corpus.Corpus
visitCorpus, visitCorpus
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

XValidatingClassificationCorpus

public XValidatingClassificationCorpus(XValidatingClassificationCorpus<E> corpus)
Deprecated. 
Construct a deep copy of the specified corpus. The deep copy may be permuted or even added to with the handle method independently of the corpus from which it was copied.

The main use for this method is for cross-validation, where several copies of the same corpus may be used in parallel. Typically, a single corpus is permuted once and then copied with the copies being set to handle different folds concurrently.

The cost of the copy is the pair of parallel lists to hold the inputs and classifications. The inputs and classifications are not themselves deep-copied.
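
The copy semantics described above (new lists, shared elements) match what a java.util.ArrayList copy constructor provides. The following sketch shows the consequence with plain lists. CopySketch and copyList are hypothetical names, and the example does not use the corpus class itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: the list is copied, its elements are shared.
class CopySketch {

    // Returns a new list containing the same element objects as the source.
    static <T> List<T> copyList(List<T> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        List<StringBuilder> original = new ArrayList<>();
        original.add(new StringBuilder("doc1"));

        List<StringBuilder> copy = copyList(original);
        copy.add(new StringBuilder("doc2"));

        // Adding to the copy leaves the original list unchanged.
        System.out.println(original.size()); // prints 1
        System.out.println(copy.size());     // prints 2

        // The element objects themselves are shared, not duplicated.
        System.out.println(copy.get(0) == original.get(0)); // prints true
    }
}
```

Sharing is harmless when the inputs and classifications are treated as immutable. If an element is mutated, the change is visible through every copy that contains it.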

Parameters:
corpus - Corpus to deep copy.

XValidatingClassificationCorpus

public XValidatingClassificationCorpus(List<E> inputList,
                                       List<Classification> classificationList,
                                       int numFolds)
Deprecated. 
Construct a cross-validating corpus containing the instances specified by the parallel lists of inputs and classifications, and the specified number of folds.

The lists are copied, and the corpus does not use the original lists after construction.

Parameters:
inputList - List of inputs to classify.
classificationList - List of classification results for inputs.
numFolds - Number of folds for cross-validation.
Throws:
IllegalArgumentException - If the number of folds is not greater than zero or if the parallel lists are not of the same length.

XValidatingClassificationCorpus

@Deprecated
public XValidatingClassificationCorpus(Parser<ClassificationHandler<E,Classification>> parser,
                                                  File[] dataFiles,
                                                  int numFolds)
                                throws IOException
Deprecated. See class documentation

Construct a cross-validating corpus containing the instances parsed out of the specified data files by the specified parser, with the specified number of folds.

Parameters:
parser - Classification parser for data files.
dataFiles - Array of data files to parse.
numFolds - Number of folds.
Throws:
IllegalArgumentException - If the number of folds is less than one.
IOException - If there is an underlying I/O error reading the file or parsing.

XValidatingClassificationCorpus

public XValidatingClassificationCorpus(int numFolds)
Deprecated. 
Construct a cross-validating corpus with the specified number of folds that initially contains no examples.

Parameters:
numFolds - Number of folds for cross-validation.
Throws:
IllegalArgumentException - If the number of folds is less than one.
Method Detail

categories

public String[] categories()
Deprecated. 
Returns the categories found in the cases for this corpus sorted into ascending order.

Returns:
The categories for this corpus.
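
As a rough illustration of the result shape, the sketch below derives a sorted category array from a list of label strings. It assumes that each classification contributes a single category string, that duplicates are collapsed, and that "ascending order" means natural String ordering. CategoriesSketch is a hypothetical name, and none of this is taken from the LingPipe source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of collecting distinct categories in sorted order.
class CategoriesSketch {

    // Collects the distinct labels and returns them in ascending String order.
    static String[] categories(List<String> labels) {
        return new TreeSet<>(labels).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("spam", "ham", "spam", "news");
        System.out.println(Arrays.toString(categories(labels))); // prints [ham, news, spam]
    }
}
```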

handle

public void handle(E e,
                   Classification c)
Deprecated. 
Adds the specified object and corresponding classification to the corpus.

Specified by:
handle in interface ClassificationHandler<E,Classification>
Parameters:
e - Object that is classified.
c - Classification for the object.

permuteCorpus

public void permuteCorpus(Random random)
Deprecated. 
Randomly permutes the corpus using the specified randomizer.

Parameters:
random - Randomizer to use for permutation.

numInstances

public int numInstances()
Deprecated. 
Returns the number of instances for this corpus.

Returns:
The number of instances for this corpus.

numFolds

public int numFolds()
Deprecated. 
Returns the number of folds for this corpus.

Returns:
The number of folds for this corpus.

fold

public int fold()
Deprecated. 
Returns the current fold.

Returns:
The current fold.

setFold

public void setFold(int fold)
Deprecated. 
Set the current fold to the specified value.

Parameters:
fold - New fold value.
Throws:
IllegalArgumentException - If the fold is less than zero or greater than or equal to the number of folds.
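
The argument checks documented for the constructors and for setFold can be sketched as a small stand-alone class. FoldStateSketch is hypothetical and only mirrors the documented contract: the fold starts at 0, the number of folds must be at least one, and a fold must lie between 0 (inclusive) and the number of folds (exclusive). The exception messages are made up for the example.

```java
// Hypothetical stand-in mirroring the documented fold-validation contract.
class FoldStateSketch {

    private final int numFolds;
    private int fold = 0; // the class documentation says the fold starts at 0

    FoldStateSketch(int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException(
                "Number of folds must be positive. Found numFolds=" + numFolds);
        }
        this.numFolds = numFolds;
    }

    int numFolds() {
        return numFolds;
    }

    int fold() {
        return fold;
    }

    // Rejects folds below zero or at or above the number of folds.
    void setFold(int fold) {
        if (fold < 0 || fold >= numFolds) {
            throw new IllegalArgumentException(
                "Fold must be >= 0 and < " + numFolds + ". Found fold=" + fold);
        }
        this.fold = fold;
    }

    public static void main(String[] args) {
        FoldStateSketch state = new FoldStateSketch(10);
        state.setFold(9); // last valid fold
        System.out.println(state.fold()); // prints 9
        try {
            state.setFold(10); // out of range: equal to numFolds
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```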

visitTest

@Deprecated
public void visitTest(ClassificationHandler<E,Classification> handler)
Deprecated. See class documentation

Sends all of the test cases in this corpus for the current fold to the specified handler.

Overrides:
visitTest in class Corpus<ClassificationHandler<E,Classification>>
Parameters:
handler - Handler to receive the test cases.

visitTrain

@Deprecated
public void visitTrain(ClassificationHandler<E,Classification> handler)
Deprecated. See class documentation

Sends all of the training cases in this corpus for the current fold to the specified handler.

Overrides:
visitTrain in class Corpus<ClassificationHandler<E,Classification>>
Parameters:
handler - Handler to receive the training cases.

toString

public String toString()
Deprecated. 
Returns a string representation of the size of this corpus.

Overrides:
toString in class Object
Returns:
A string representation of the size of this corpus.