com.aliasi.classify
Class ConfusionMatrix

java.lang.Object
  extended by com.aliasi.classify.ConfusionMatrix

public class ConfusionMatrix
extends Object

An instance of ConfusionMatrix represents a quantitative comparison between two classifiers over a fixed set of categories on a number of test cases. For convenience, one classifier is termed the "reference" and the other the "response".

Typically the reference will be determined by a human or other so-called "gold standard", whereas the response will be the result of an automatic classification. This is how confusion matrices are created from test cases in ClassifierEvaluator. With this confusion matrix implementation, two human classifiers or two automatic classifications may also be compared. For instance, human classifiers that label corpora for training sets are often evaluated for inter-annotator agreement; the usual form of reporting for this is the kappa statistic, which is available in three varieties from the confusion matrix. A set of systems may also be compared pairwise, such as those arising from a competitive evaluation.

Confusion matrices may be initialized on construction; with no matrix argument, all cells are initialized to zero. Values may then be incremented by category name with increment(String,String) or by category index with increment(int,int). There is also an incrementByN(int,int,int) method, which adds an arbitrary count to a single cell.
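For instance, the following sketch shows typical usage, using the wine categories from the example below:

 String[] categories = new String[]
     { "Cabernet", "Syrah", "Pinot" };
 ConfusionMatrix matrix = new ConfusionMatrix(categories);

 // one test case: reference category Syrah, response category Cabernet
 matrix.increment("Syrah","Cabernet");

 // four test cases at once, by index: reference Pinot (2), response Pinot (2)
 matrix.incrementByN(2,2,4);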

Consider the following confusion matrix, which reports on the classification of 27 wines by grape variety. The reference in this case is the true variety and the response arises from the blind evaluation of a human judge.

Many-way Confusion Matrix

                        Response
                        Cabernet   Syrah   Pinot
 Reference   Cabernet          9       3       0
             Syrah             3       5       1
             Pinot             1       1       4
Each row represents the results of classifying objects belonging to the category designated by that row. For instance, the first row is the result of 12 cabernet classifications. Reading across, 9 of those cabernets were correctly classified as cabernets, 3 were misclassified as syrahs, and none were misclassified as pinots. The next row holds the results for 9 syrahs, 3 of which were misclassified as cabernets and 1 of which was misclassified as a pinot. Similarly, the six pinots being classified are represented in the third row. In total, the classifier categorized 13 wines as cabernets, 9 wines as syrahs, and 5 wines as pinots. The sum of all cells in the matrix is equal to the number of trials, in this case 27. Further note that the correct answers are the ones on the diagonal of the matrix. The individual entries are recoverable using the method count(int,int). The positive and negative counts per category may be recovered from the result of oneVsAll(int).
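Given a ConfusionMatrix matrix holding the counts above, entries and derived evaluations can be read back as follows; a minimal sketch:

 // cell for reference Cabernet (row 0), response Syrah (column 1): 3
 int cabernetAsSyrah = matrix.count(0,1);

 // indices may be recovered from category names
 int pinotIndex = matrix.getIndex("Pinot");   // 2

 // per-category positive/negative counts via the one-versus-all evaluation
 PrecisionRecallEvaluation pinotEval = matrix.oneVsAll(pinotIndex);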

Collective results are either averaged per category (macro average) or averaged per test case (micro average). The results reported here are for a single operating point. Very often in the research literature, results are reported for the best possible post-hoc system settings, established either globally or per category.

A multiple-outcome classification problem can be decomposed into a number of one-versus-all classification problems. For each category, there is a binary classifier that categorizes objects as either belonging to that category or not. From an n-way classifier, a one-versus-all classifier can be constructed automatically by treating an object as belonging to the category if and only if that category is the result of classifying it. For the above three-way confusion matrix, the following three one-versus-all matrices are returned as instances of PrecisionRecallEvaluation through the method oneVsAll(int):

Cab-vs-All

                     Response
                     Cab   Other
 Reference   Cab       9       3
             Other     4      11

Syrah-vs-All

                      Response
                      Syrah   Other
 Reference   Syrah        5       4
             Other        4      14

Pinot-vs-All

                      Response
                      Pinot   Other
 Reference   Pinot        4       2
             Other        1      20
Note that each derived matrix has the same true-positive count as the corresponding diagonal cell of the original confusion matrix. Further note that the sum of the cells in each derived matrix is the same as in the original matrix. Finally, note that if the original classification problem is binary, the derived matrix contains the same counts as the original matrix. The results of the various precision-recall evaluation methods for these matrices are shown in the class documentation for PrecisionRecallEvaluation.

Macro-averaged results are just the average of the per-category results. These include precision, recall and F measure. Yule's Q and Y statistics, along with the per-category chi-squared results, are also computed based on the one-versus-all matrices.

Micro-averaged results are reported based on another derived matrix: the sum of the scores in the one-versus-all matrices. For the above case, the result given as a PrecisionRecallEvaluation by the method microAverage() is:

Sum of One-vs-All Matrices

                      Response
                      True   False
 Reference   True       18       9
             False       9      45
Note that the true-positive cell is the sum of the true-positive cells of the original matrix (9+5+4=18 in the running example). A little algebra shows that the false-positive cell is equal to the sum of the off-diagonal elements of the original confusion matrix (3+0+3+1+1+1=9); symmetry then shows that the false-negative value is the same. Finally, the true-negative cell brings the total up to the number of categories times the sum of the entries in the original matrix (here 27*3-18-9-9=45); in this three-category case it is also equal to two times the number of true positives plus the number of false negatives (2*18+9=45). Because the false-positive and false-negative counts are equal, the micro-averaged precision, recall and F measure for one-versus-all matrices derived from many-way confusion matrices will all be the same.
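The algebra above is easy to verify directly from the matrix counts; a minimal sketch, assuming matrix holds the many-way counts above:

 int numCats = matrix.numCategories();
 int truePositive = 0;
 for (int i = 0; i < numCats; ++i)
     truePositive += matrix.count(i,i);                   // 9 + 5 + 4 = 18

 int falsePositive = matrix.totalCount() - truePositive;  // off-diagonal sum: 27 - 18 = 9
 int falseNegative = falsePositive;                       // by the symmetry argument above
 int trueNegative = numCats * matrix.totalCount()
     - truePositive - falsePositive - falseNegative;      // 81 - 18 - 9 - 9 = 45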

For the above confusion matrix and derived matrices, the no-argument and category-indexed methods will return the values in the following tables. The hot-linked method documentation defines each statistic in detail.

Method                         Result
categories()                   { "Cabernet", "Syrah", "Pinot" }
totalCount()                   27
totalCorrect()                 18
totalAccuracy()                0.6667
confidence95()                 0.1778
confidence99()                 0.2341
macroAvgPrecision()            0.6826
macroAvgRecall()               0.6574
macroAvgFMeasure()             0.6676
randomAccuracy()               0.3663
randomAccuracyUnbiased()       0.3669
kappa()                        0.4740
kappaUnbiased()                0.4735
kappaNoPrevalence()            0.3333
referenceEntropy()             1.5305
responseEntropy()              1.4865
crossEntropy()                 1.5376
jointEntropy()                 2.6197
conditionalEntropy()           1.0892
mutualInformation()            0.3973
klDivergence()                 0.007129
chiSquaredDegreesOfFreedom()   4
chiSquared()                   15.5256
phiSquared()                   0.5750
cramersV()                     0.5362
lambdaA()                      0.4000
lambdaB()                      0.3571

Method                         0 (Cabernet)   1 (Syrah)   2 (Pinot)
conditionalEntropy(int)        0.8113         1.3516      1.2516
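All of the values above come directly from accessor calls on the matrix instance; for example:

 System.out.println("accuracy = " + matrix.totalAccuracy()
                    + " +/- " + matrix.confidence95());
 System.out.println("kappa    = " + matrix.kappa());
 System.out.println("macro F  = " + matrix.macroAvgFMeasure());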

Since:
LingPipe 2.0
Version:
3.8
Author:
Bob Carpenter

Constructor Summary
ConfusionMatrix(String[] categories)
          Construct a confusion matrix with all zero values from the specified array of categories.
ConfusionMatrix(String[] categories, int[][] matrix)
          Construct a confusion matrix with the specified set of categories and values.
 
Method Summary
 String[] categories()
          Return the array of categories for this confusion matrix.
 double chiSquared()
          Returns Pearson's χ² independence test statistic for this matrix.
 int chiSquaredDegreesOfFreedom()
          Return the number of degrees of freedom of this confusion matrix for the χ² statistic.
 double conditionalEntropy()
          Returns the conditional entropy of the response distribution against the reference distribution.
 double conditionalEntropy(int refCategoryIndex)
          Returns the entropy of the distribution of categories in the response given that the reference category was as specified.
 double confidence(double z)
          Returns the normal approximation of half of the binomial confidence interval for this confusion matrix for the specified z-score.
 double confidence95()
          Returns half the width of the 95% confidence interval for this confusion matrix.
 double confidence99()
          Returns half the width of the 99% confidence interval for this confusion matrix.
 int count(int referenceCategoryIndex, int responseCategoryIndex)
          Returns the value of the cell in the matrix for the specified reference and response category indices.
 double cramersV()
          Returns the value of Cramér's V statistic for this matrix.
 double crossEntropy()
          The cross-entropy of the response distribution against the reference distribution.
 int getIndex(String category)
          Return the index of the specified category in the list of categories, or -1 if it is not a category for this confusion matrix.
 void increment(int referenceCategoryIndex, int responseCategoryIndex)
          Add one to the cell in the matrix for the specified reference and response category indices.
 void increment(String referenceCategory, String responseCategory)
          Add one to the cell in the matrix for the specified reference and response categories.
 void incrementByN(int referenceCategoryIndex, int responseCategoryIndex, int num)
          Add n to the cell in the matrix for the specified reference and response category indices.
 double jointEntropy()
          Returns the entropy of the joint reference and response distribution as defined by the underlying matrix.
 double kappa()
          Returns the value of the kappa statistic with chance agreement determined by the reference distribution.
 double kappaNoPrevalence()
          Returns the value of the kappa statistic adjusted for prevalence.
 double kappaUnbiased()
          Returns the value of the kappa statistic adjusted for bias.
 double klDivergence()
          Returns the Kullback-Leibler (KL) divergence between the reference and response distributions.
 double lambdaA()
          Returns Goodman and Kruskal's λA index of predictive association.
 double lambdaB()
          Returns Goodman and Kruskal's λB index of predictive association.
 double macroAvgFMeasure()
          Returns the average F measure per category.
 double macroAvgPrecision()
          Returns the average precision per category.
 double macroAvgRecall()
          Returns the average recall per category.
 int[][] matrix()
          Return the matrix values.
 PrecisionRecallEvaluation microAverage()
          Returns the micro-averaged precision-recall evaluation.
 double mutualInformation()
          Returns the mutual information between the reference and response distributions.
 int numCategories()
          Returns the number of categories for this confusion matrix.
 PrecisionRecallEvaluation oneVsAll(int categoryIndex)
          Returns the one-versus-all precision-recall evaluation for the specified category index.
 double phiSquared()
          Returns the value of Pearson's φ² index of mean square contingency for this matrix.
 double randomAccuracy()
          The expected accuracy from a strategy of randomly guessing categories according to reference and response distributions.
 double randomAccuracyUnbiased()
          The expected accuracy from a strategy of randomly guessing categories according to the average of the reference and response distributions.
 double referenceEntropy()
          The entropy of the decision problem itself as defined by the counts for the reference.
 double responseEntropy()
          The entropy of the response distribution.
 String toString()
          Return a string-based representation of this confusion matrix.
 double totalAccuracy()
          Returns the proportion of responses that are correct.
 int totalCorrect()
          Returns the total number of responses that matched the reference.
 int totalCount()
          Returns the total number of classifications.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ConfusionMatrix

public ConfusionMatrix(String[] categories)
Construct a confusion matrix with all zero values from the specified array of categories.

Parameters:
categories - Array of categories for classification.

ConfusionMatrix

public ConfusionMatrix(String[] categories,
                       int[][] matrix)
Construct a confusion matrix with the specified set of categories and values. The values are arranged in reference-category dominant ordering; that is, rows are indexed by reference category and columns by response category.

For example, the many-way confusion matrix shown in the class documentation above would be initialized as:

 String[] categories = new String[]
     { "Cabernet", "Syrah", "Pinot" };
 // rows are reference categories; columns are response categories
 int[][] wineTastingScores = new int[][]
     { { 9, 3, 0 },    // reference Cabernet
       { 3, 5, 1 },    // reference Syrah
       { 1, 1, 4 } };  // reference Pinot
 ConfusionMatrix matrix
   = new ConfusionMatrix(categories,wineTastingScores);
 

Parameters:
categories - Array of categories for classification.
matrix - Matrix of initial values.
Throws:
IllegalArgumentException - If the categories and matrix do not agree in dimension or the matrix contains a negative value.
Method Detail

categories

public String[] categories()
Return the array of categories for this confusion matrix. The order of categories here is the same as that in the matrix and consistent with that returned by getIndex(). For a category c in the set of categories:
categories()[getIndex(c)].equals(c)
and for an index i in range:
getIndex(categories()[i]) == i

Returns:
The array of categories for this matrix.
See Also:
getIndex(String)

numCategories

public int numCategories()
Returns the number of categories for this confusion matrix. The underlying two-dimensional matrix of counts for this confusion matrix has dimensions equal to the number of categories. Note that numCategories() is guaranteed to be the same as categories().length and thus may be used to compute iteration bounds.

Returns:
The number of categories for this confusion matrix.

getIndex

public int getIndex(String category)
Return the index of the specified category in the list of categories, or -1 if it is not a category for this confusion matrix. The index is the index in the array returned by categories().

Parameters:
category - Category whose index is returned.
Returns:
The index of the specified category in the list of categories.
See Also:
categories()

matrix

public int[][] matrix()
Return the matrix values. All values will be non-negative.

Returns:
The matrix values.

increment

public void increment(int referenceCategoryIndex,
                      int responseCategoryIndex)
Add one to the cell in the matrix for the specified reference and response category indices.

Parameters:
referenceCategoryIndex - Index of reference category.
responseCategoryIndex - Index of response category.
Throws:
IllegalArgumentException - If either index is out of range.

incrementByN

public void incrementByN(int referenceCategoryIndex,
                         int responseCategoryIndex,
                         int num)
Add n to the cell in the matrix for the specified reference and response category indices.

Parameters:
referenceCategoryIndex - Index of reference category.
responseCategoryIndex - Index of response category.
num - Number of instances to increment by.
Throws:
IllegalArgumentException - If either index is out of range.

increment

public void increment(String referenceCategory,
                      String responseCategory)
Add one to the cell in the matrix for the specified reference and response categories.

Parameters:
referenceCategory - Name of reference category.
responseCategory - Name of response category.
Throws:
IllegalArgumentException - If either category is not a category for this confusion matrix.

count

public int count(int referenceCategoryIndex,
                 int responseCategoryIndex)
Returns the value of the cell in the matrix for the specified reference and response category indices.

Parameters:
referenceCategoryIndex - Index of reference category.
responseCategoryIndex - Index of response category.
Returns:
Value of specified cell in the matrix.
Throws:
IllegalArgumentException - If either index is out of range.

totalCount

public int totalCount()
Returns the total number of classifications. This is just the sum of every cell in the matrix:
totalCount() = Σi Σj count(i,j)

Returns:
The sum of the counts of the entries in the matrix.

totalCorrect

public int totalCorrect()
Returns the total number of responses that matched the reference. This is the sum of the counts on the diagonal of the matrix:
totalCorrect() = Σi count(i,i)
The value is the same as that of microAverage().correctResponse().

Returns:
The sum of the correct results.

totalAccuracy

public double totalAccuracy()
Returns the proportion of responses that are correct. That is:
totalAccuracy() = totalCorrect() / totalCount()
Note that the classification error rate is just one minus the accuracy, because each response is either correct or incorrect.
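For the running example above, totalAccuracy() = 18/27 ≈ 0.6667.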

Returns:
The proportion of responses that match the reference.

confidence95

public double confidence95()
Returns half the width of the 95% confidence interval for this confusion matrix. Thus the confidence is 95% that the accuracy is the total accuracy plus or minus the return value of this method.

Confidence is determined as described in confidence(double) with parameter z=1.96.

Returns:
Half of the width of the 95% confidence interval.

confidence99

public double confidence99()
Returns half the width of the 99% confidence interval for this confusion matrix. Thus the confidence is 99% that the accuracy is the total accuracy plus or minus the return value of this method.

Confidence is determined as described in confidence(double) with parameter z=2.58.

Returns:
Half of the width of the 99% confidence interval.

confidence

public double confidence(double z)
Returns the normal approximation of half of the binomial confidence interval for this confusion matrix for the specified z-score.

A z score represents the number of standard deviations from the mean, with the following correspondence of z score and percentage confidence intervals:

 z      Confidence
 1.65   90%
 1.96   95%
 2.58   99%
 3.30   99.9%
Thus the z-score for a 95% confidence interval is 1.96 standard deviations. The confidence interval is just the accuracy plus or minus the z score times the standard deviation. To compute the normal approximation to the deviation of the binomial distribution, assume p=totalAccuracy() and n=totalCount(). Then the confidence interval is defined in terms of the deviation of binomial(p,n), which is defined by first taking the variance of the Bernoulli (one trial) distribution with success rate p:
 variance(bernoulli(p)) = p * (1-p)
 
and then dividing by the number n of trials in the binomial distribution to get the variance of the binomial distribution:
 variance(binomial(p,n)) = p * (1-p) / n
 
and then taking the square root to get the deviation:
 dev(binomial(p,n)) = sqrt(p * (1-p) / n)
 
For instance, with p=totalAccuracy()=.90, and n=totalCount()=10000:
dev(binomial(.9,10000)) = sqrt(0.9 * (1.0 - 0.9) / 10000) = 0.003
Thus to determine the 95% confidence interval, we take z = 1.96 for a half-interval width of 1.96 * 0.003 = 0.00588. The resulting interval is just 0.90 +/- 0.00588 or roughly (.894,.906).
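The same arithmetic is easy to reproduce by hand; the following sketch illustrates the formula above (an illustration of the definition, not necessarily the method's internal implementation):

 double p = 0.90;    // totalAccuracy()
 int n = 10000;      // totalCount()
 double z = 1.96;    // z-score for a 95% interval

 // normal approximation to the binomial deviation: sqrt(p * (1-p) / n) = 0.003
 double deviation = Math.sqrt(p * (1.0 - p) / n);
 double halfWidth = z * deviation;   // 0.00588, the value returned by confidence(z)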

Parameters:
z - The z score, or number of standard deviations.
Returns:
Half the width of the confidence interval for the specified number of deviations.

referenceEntropy

public double referenceEntropy()
The entropy of the decision problem itself as defined by the counts for the reference. The entropy of a distribution is the average negative log probability of outcomes. For the reference distribution, this is:
referenceEntropy()
    = - Σi referenceLikelihood(i) * log2 referenceLikelihood(i)

referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()

Returns:
The entropy of the reference distribution.

responseEntropy

public double responseEntropy()
The entropy of the response distribution. The entropy of a distribution is the average negative log probability of outcomes. For the response distribution, this is:
responseEntropy()
    = - Σi responseLikelihood(i) * log2 responseLikelihood(i)

responseLikelihood(i) = oneVsAll(i).responseLikelihood()

Returns:
The entropy of the response distribution.

crossEntropy

public double crossEntropy()
The cross-entropy of the response distribution against the reference distribution. The cross-entropy is defined by the negative log probabilities of the response distribution weighted by the reference distribution:
crossEntropy()
    = - Σi referenceLikelihood(i) * log2 responseLikelihood(i)

responseLikelihood(i) = oneVsAll(i).responseLikelihood()
referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
Note that crossEntropy() >= referenceEntropy(). The entropy of a distribution is simply the cross-entropy of the distribution with itself.

Low cross-entropy does not entail good classification, though good classification entails low cross-entropy.
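Note also that crossEntropy() = referenceEntropy() + klDivergence(); in the running example, 1.5305 + 0.0071 ≈ 1.5376.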

Returns:
The cross-entropy of the response distribution against the reference distribution.

jointEntropy

public double jointEntropy()
Returns the entropy of the joint reference and response distribution as defined by the underlying matrix. Joint entropy is defined by:
jointEntropy()
= - Σi Σj P'(i,j) * log2 P'(i,j)
P'(i,j) = count(i,j) / totalCount()
and where by convention:
0 * log2 0 = 0

Returns:
Joint entropy of this confusion matrix.

conditionalEntropy

public double conditionalEntropy(int refCategoryIndex)
Returns the entropy of the distribution of categories in the response given that the reference category was as specified. The conditional entropy is defined by:
conditionalEntropy(i)
= - Σj P'(j|i) * log2 P'(j|i)

P'(j|i) = count(i,j) / referenceCount(i)
where referenceCount(i) = Σj count(i,j) is the total number of test cases with reference category i.

Parameters:
refCategoryIndex - Index of the reference category.
Returns:
Conditional entropy of the category with the specified index.

conditionalEntropy

public double conditionalEntropy()
Returns the conditional entropy of the response distribution against the reference distribution. The conditional entropy is defined to be the sum of conditional entropies per category weighted by the reference likelihood of the category.
conditionalEntropy()
= Σi referenceLikelihood(i) * conditionalEntropy(i)

referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()

Note that this statistic is not symmetric in that if the roles of reference and response are reversed, the answer may be different.

Returns:
The conditional entropy of the response distribution against the reference distribution

kappa

public double kappa()
Returns the value of the kappa statistic with chance agreement determined by the reference distribution. Kappa is defined in terms of total accuracy and random accuracy:
kappa() = (totalAccuracy() - randomAccuracy()) / (1 - randomAccuracy())
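For the running example, kappa() = (0.6667 - 0.3663) / (1 - 0.3663) ≈ 0.4740.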
The kappa statistic was introduced in:
Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational And Psychological Measurement 20:37-46.

Returns:
Kappa statistic for this confusion matrix.

kappaUnbiased

public double kappaUnbiased()
Returns the value of the kappa statistic adjusted for bias. The unbiased kappa value is defined in terms of total accuracy and a slightly different computation of expected likelihood that averages the reference and response probabilities. The exact definition is:
kappaUnbiased() = (totalAccuracy() - randomAccuracyUnbiased()) / (1 - randomAccuracyUnbiased())
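For the running example, kappaUnbiased() = (0.6667 - 0.3669) / (1 - 0.3669) ≈ 0.4735.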
The unbiased version of Kappa was introduced in:
Siegel, Sidney and N. John Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw Hill.

Returns:
The unbiased version of the kappa statistic.

kappaNoPrevalence

public double kappaNoPrevalence()
Returns the value of the kappa statistic adjusted for prevalence. The definition is:
kappaNoPrevalence() = 2 * totalAccuracy() - 1
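For the running example, kappaNoPrevalence() = 2 * 0.6667 - 1 ≈ 0.3333.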
The no prevalence version of kappa was introduced in:
Byrt, Ted, Janet Bishop and John B. Carlin. 1993. Bias, prevalence, and kappa. Journal of Clinical Epidemiology 46(5):423-429.
These authors suggest reporting the three kappa statistics defined in this class: kappa, kappa adjusted for prevalence, and kappa adjusted for bias.

Returns:
The value of kappa adjusted for prevalence.

randomAccuracy

public double randomAccuracy()
The expected accuracy from a strategy of randomly guessing categories according to the reference and response distributions. This is defined by:
randomAccuracy() = Σi referenceLikelihood(i) * responseLikelihood(i)

referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
responseLikelihood(i) = oneVsAll(i).responseLikelihood()
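For the running example, randomAccuracy() = (12/27)*(13/27) + (9/27)*(9/27) + (6/27)*(5/27) ≈ 0.3663.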

Returns:
The random accuracy for this matrix.

randomAccuracyUnbiased

public double randomAccuracyUnbiased()
The expected accuracy from a strategy of randomly guessing categories according to the average of the reference and response distributions. This is defined by:
randomAccuracyUnbiased() = Σi ((referenceLikelihood(i) + responseLikelihood(i)) / 2)²

referenceLikelihood(i) = oneVsAll(i).referenceLikelihood()
responseLikelihood(i) = oneVsAll(i).responseLikelihood()
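For the running example, randomAccuracyUnbiased() = (25/54)² + (18/54)² + (11/54)² ≈ 0.3669, which yields the unbiased kappa value of 0.4735 reported above.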

Returns:
The unbiased random accuracy for this matrix.

chiSquaredDegreesOfFreedom

public int chiSquaredDegreesOfFreedom()
Return the number of degrees of freedom of this confusion matrix for the χ² statistic. In general, for an n×m matrix, the number of degrees of freedom is equal to (n-1)*(m-1). Because a confusion matrix is square, with both dimensions equal to the number of categories, the result is:
chiSquaredDegreesOfFreedom() = (numCategories() - 1)²

Returns:
The number of degrees of freedom for this confusion matrix.

chiSquared

public double chiSquared()
Returns Pearson's χ² independence test statistic for this matrix. The value is asymptotically χ² distributed with the number of degrees of freedom specified by chiSquaredDegreesOfFreedom().

See Statistics.chiSquaredIndependence(double[][]) for definitions of the statistic over matrices.

Returns:
The χ² statistic for this matrix.

phiSquared

public double phiSquared()
Returns the value of Pearson's φ² index of mean square contingency for this matrix. The value of φ² may be defined in terms of χ² by:
phiSquared() = chiSquared() / totalCount()

As with the other statistics reported by this class, this is the sample value; the population value would be defined in terms of the true random variables underlying the reference and response.

Returns:
The φ² statistic for this matrix.

cramersV

public double cramersV()
Returns the value of Cramér's V statistic for this matrix. Cramér's V may be defined in terms of the φ² statistic by:
cramersV() = sqrt(phiSquared() / (numCategories() - 1))
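For the running example, phiSquared() = 15.5256 / 27 ≈ 0.5750 and cramersV() = sqrt(0.5750 / 2) ≈ 0.5362.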

Returns:
The value of Cramér's V statistic for this matrix.

oneVsAll

public PrecisionRecallEvaluation oneVsAll(int categoryIndex)
Returns the one-versus-all precision-recall evaluation for the specified category index. See the class definition above for examples.

Parameters:
categoryIndex - Index of category.
Returns:
The precision-recall evaluation for the category.

microAverage

public PrecisionRecallEvaluation microAverage()
Returns the micro-averaged precision-recall evaluation. This is just the sum of the precision-recall evaluations provided by oneVsAll(int) over all category indices. See the class definition above for an example.

Returns:
The micro-averaged precision-recall evaluation.

macroAvgPrecision

public double macroAvgPrecision()
Returns the average precision per category. This averaging treats each category as being equal in weight. Macro-averaged precision is defined by:
macroAvgPrecision()
= Σi precision(i) / numCategories()

precision(i) = oneVsAll(i).precision()

Returns:
The macro-averaged precision.

macroAvgRecall

public double macroAvgRecall()
Returns the average recall per category. This averaging treats each category as being equal in weight. Macro-averaged recall is defined by:
macroAvgRecall()
= Σi recall(i) / numCategories()

recall(i) = oneVsAll(i).recall()

Returns:
The macro-averaged recall.

macroAvgFMeasure

public double macroAvgFMeasure()
Returns the average F measure per category. This averaging treats each category as being equal in weight. Macro-averaged F measure is defined by:
macroAvgFMeasure()
= Σi fMeasure(i) / numCategories()

fMeasure(i) = oneVsAll(i).fMeasure()

Note that this is not necessarily the same value as would result from computing the F measure of the macro-averaged precision and macro-averaged recall.

Returns:
The macro-averaged F measure.

lambdaA

public double lambdaA()
Returns Goodman and Kruskal's λA index of predictive association. This is defined by:
lambdaA()
= (Σj maxReferenceCount(j) - maxReferenceCount())
  / (totalCount() - maxReferenceCount())
where maxReferenceCount(j) is the maximum count in column j of the matrix:
maxReferenceCount(j) = MAXi count(i,j)
and where maxReferenceCount() is the maximum reference count:
maxReferenceCount() = MAXi referenceCount(i)
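For the running example, the column maxima are 9, 5 and 4, summing to 18, and the largest reference count is 12, so lambdaA() = (18 - 12) / (27 - 12) = 0.4000.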

Note that like conditional probability and conditional entropy, the λA statistic is asymmetric; the measure λB simply reverses the roles of the rows and columns. The probabilistic interpretation of λA is like that of λB, only with the roles of the reference and response reversed.

Returns:
The λA statistic for this matrix.

lambdaB

public double lambdaB()
Returns Goodman and Kruskal's λB index of predictive association. This is defined by:
lambdaB()
= (Σi maxResponseCount(i) - maxResponseCount())
  / (totalCount() - maxResponseCount())
where maxResponseCount(i) is the maximum count in row i of the matrix:
maxResponseCount(i) = MAXj count(i,j)
and where maxResponseCount() is the maximum response count:
maxResponseCount() = MAXj responseCount(j)
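For the running example, the row maxima are 9, 5 and 4, summing to 18, and the largest response count is 13, so lambdaB() = (18 - 13) / (27 - 13) ≈ 0.3571.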

The probabilistic interpretation of λB is the reduction in error likelihood from knowing the reference category when predicting the response category. It thus takes on a value between 0.0 and 1.0, with higher values being better. Perfect association yields a value of 1.0 and perfect independence a value of 0.0.

Note that the λB statistic is asymmetric; the measure λA simply reverses the roles of the rows and columns.

Returns:
The λB statistic for this matrix.

mutualInformation

public double mutualInformation()
Returns the mutual information between the reference and response distributions. Mutual information is the Kullback-Leibler divergence between the joint distribution and the product of the reference and response marginal distributions. It is defined as:
mutualInformation()
= Σi Σj P(i,j) * log2 ( P(i,j) / (Preference(i) * Presponse(j)) )

P(i,j) = count(i,j) / totalCount()
Preference(i) = oneVsAll(i).referenceLikelihood()
Presponse(j) = oneVsAll(j).responseLikelihood()
A bit of algebra shows that mutual information is the reduction in entropy of the response distribution from knowing the reference distribution:
mutualInformation() = responseEntropy() - conditionalEntropy()
In this way it is similar to the λB measure. Unlike the λ measures, however, mutual information is symmetric: reversing the roles of reference and response, and subtracting the conditional entropy of the reference given the response from the reference entropy, yields the same value.
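In the running example, mutualInformation() = 1.4865 - 1.0892 = 0.3973; the reversed computation 1.5305 - (2.6197 - 1.4865) gives the same value.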

Returns:
The mutual information between the reference and the response distributions.

klDivergence

public double klDivergence()
Returns the Kullback-Leibler (KL) divergence between the reference and response distributions. KL divergence is also known as relative entropy. It is defined by:
klDivergence()
= Σi Preference(i) * log2 (Preference(i) / Presponse(i))

Preference(i) = oneVsAll(i).referenceLikelihood()
Presponse(i) = oneVsAll(i).responseLikelihood()
Note that KL divergence is not symmetric in the reference and response distributions.
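For the running example, klDivergence() = (12/27) * log2(12/13) + (9/27) * log2(9/9) + (6/27) * log2(6/5) ≈ 0.0071.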

Returns:
The Kullback-Leibler divergence between the reference and response distributions.

toString

public String toString()
Return a string-based representation of this confusion matrix.

Overrides:
toString in class Object
Returns:
A string-based representation of this confusion matrix.