com.aliasi.classify
Class PrecisionRecallEvaluation

java.lang.Object
  extended by com.aliasi.classify.PrecisionRecallEvaluation

public class PrecisionRecallEvaluation
extends Object

A PrecisionRecallEvaluation collects and reports a suite of descriptive statistics for binary classification tasks. The basis of a precision recall evaluation is a matrix of counts of reference and response classifications. Each cell in the matrix corresponds to a method returning a long integer count.

                     Response
                     true                   false                  Reference Totals
  Reference true     truePositive() (TP)    falseNegative() (FN)   positiveReference() (TP+FN)
            false    falsePositive() (FP)   trueNegative() (TN)    negativeReference() (FP+TN)
  Response Totals    positiveResponse() (TP+FP)   negativeResponse() (FN+TN)   total() (TP+FN+FP+TN)
The most basic statistic is accuracy, which is the number of correct responses divided by the total number of cases.
accuracy() = correct() / total()
This class derives its name from the following four statistics, which are illustrated in the four tables.
recall() = truePositive() / positiveReference()
precision() = truePositive() / positiveResponse()
rejectionRecall() = trueNegative() / negativeReference()
rejectionPrecision() = trueNegative() / negativeResponse()
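The four measures can be sketched in plain Java straight from the four contingency counts. This is a standalone illustration of the formulas above, not LingPipe's implementation; the class name PrMeasures is made up for the sketch.

```java
// Standalone sketch of the four measures, computed directly from the
// contingency counts (TP, FN, FP, TN); mirrors the formulas above.
public class PrMeasures {

    // Fraction of positive reference cases found by the classifier.
    public static double recall(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    // Fraction of positive responses that were correct.
    public static double precision(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    // Fraction of negative reference cases that drew negative responses.
    public static double rejectionRecall(long tn, long fp) {
        return (double) tn / (fp + tn);
    }

    // Fraction of negative responses that were correct.
    public static double rejectionPrecision(long tn, long fn) {
        return (double) tn / (fn + tn);
    }

    public static void main(String[] args) {
        long tp = 9, fn = 3, fp = 4, tn = 11; // Cab-vs-All counts from the examples below
        System.out.println("recall    = " + recall(tp, fn));    // 0.75
        System.out.println("precision = " + precision(tp, fp)); // ~0.6923
    }
}
```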
Each measure is defined to be the count in the cell marked "+" divided by the sum of the counts in the cells marked "+" and "-" in the corresponding table (blank cells do not enter into the measure):

Recall
                   Response
                   true   false
  Reference true    +      -
            false

Precision
                   Response
                   true   false
  Reference true    +
            false   -

Rejection Recall
                   Response
                   true   false
  Reference true
            false   -      +

Rejection Precision
                   Response
                   true   false
  Reference true           -
            false          +

These tables illustrate the relevant dualities. Precision is the dual of recall if the reference and response are switched (the matrix is transposed). Similarly, rejection recall is dual to recall with the true and false labels switched (reflection around each axis in turn); rejection precision is similarly dual to precision.

Precision and recall may be combined by weighted harmonic averaging using the f-measure statistic, with β between 0 and infinity being the relative weight of precision, and 1 being the neutral value.

fMeasure() = fMeasure(1)
fMeasure(β) = (1 + β²) * precision() * recall() / (recall() + β² * precision())
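The Fβ formula can be sketched as a one-liner; the class name FMeasureSketch is made up for the illustration, and the arithmetic follows the formula above rather than LingPipe's source.

```java
// Sketch of the f-measure formula: beta weights precision relative to recall.
public class FMeasureSketch {

    public static double fMeasure(double beta, double recall, double precision) {
        double b2 = beta * beta;
        return (1 + b2) * precision * recall / (recall + b2 * precision);
    }

    public static void main(String[] args) {
        // With beta = 1 this reduces to the harmonic mean of precision and recall.
        // Cab-vs-All example values: recall = 0.75, precision = 9/13.
        System.out.println(fMeasure(1.0, 0.75, 9.0 / 13.0)); // 0.72
    }
}
```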

There are four traditional measures of binary classification, which are as follows.

fowlkesMallows() = truePositive() / (precision() * recall())^(1/2)
jaccardCoefficient() = truePositive() / (total() - trueNegative())
yulesQ() = (truePositive() * trueNegative() - falsePositive() * falseNegative()) / (truePositive() * trueNegative() + falsePositive() * falseNegative())
yulesY() = ((truePositive() * trueNegative())^(1/2) - (falsePositive() * falseNegative())^(1/2))
/ ((truePositive() * trueNegative())^(1/2) + (falsePositive() * falseNegative())^(1/2))

Replacing precision and recall with their definitions, TP/(TP+FP) and TP/(TP+FN):

      F1
      = 2 * (TP/(TP+FP)) * (TP/(TP+FN))
        / (TP/(TP+FP) + TP/(TP+FN))
      = 2 * (TP*TP / ((TP+FP)*(TP+FN)))
        / (TP*(TP+FN)/((TP+FP)*(TP+FN)) + TP*(TP+FP)/((TP+FN)*(TP+FP)))
      = 2 * TP*TP
        / (TP*(TP+FN) + TP*(TP+FP))
      = 2 * TP
        / ((TP+FN) + (TP+FP))
      = 2*TP / (2*TP + FP + FN)
Thus the F1-measure is very closely related to the Jaccard coefficient, TP/(TP+FP+FN). Like the Jaccard coefficient, the F measure does not vary with varying true negative counts. Rejection precision and recall do vary with changes in true negative count.
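The identity derived above, and the relation to the Jaccard coefficient, can be checked numerically. This sketch (hypothetical class F1Identity) computes F1 both from precision and recall and directly from the counts:

```java
// Numerical check of the identity F1 == 2*TP / (2*TP + FP + FN),
// alongside the closely related Jaccard coefficient TP / (TP + FP + FN).
public class F1Identity {

    public static double f1FromPrecisionRecall(long tp, long fn, long fp) {
        double r = (double) tp / (tp + fn);
        double p = (double) tp / (tp + fp);
        return 2 * p * r / (p + r);
    }

    public static double f1FromCounts(long tp, long fn, long fp) {
        return 2.0 * tp / (2 * tp + fp + fn);
    }

    public static double jaccard(long tp, long fn, long fp) {
        return (double) tp / (tp + fp + fn); // true negatives never appear
    }

    public static void main(String[] args) {
        long tp = 9, fn = 3, fp = 4; // Cab-vs-All counts; TN is irrelevant here
        System.out.println(f1FromPrecisionRecall(tp, fn, fp)); // 0.72
        System.out.println(f1FromCounts(tp, fn, fp));          // 0.72
        System.out.println(jaccard(tp, fn, fp));               // 0.5625
    }
}
```

Note that neither formula mentions the true negative count, which is the invariance described above.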

Basic reference and response likelihoods are computed by frequency.

referenceLikelihood() = positiveReference() / total()
responseLikelihood() = positiveResponse() / total()
An algorithm that chose responses at random according to the response likelihood would have the following accuracy against test cases chosen at random according to the reference likelihood:
randomAccuracy() = referenceLikelihood() * responseLikelihood() + (1 - referenceLikelihood()) * (1 - responseLikelihood())
The two summands arise from the likelihood of a true positive and the likelihood of a true negative. From random accuracy, the κ-statistic is defined by dividing out the random accuracy from the accuracy, giving a measure of performance above the baseline expected by chance.
kappa() = kappa(accuracy(),randomAccuracy())
kappa(p,e) = (p - e) / (1 - e)
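The kappa definitions translate directly into a short sketch (hypothetical class KappaSketch, following the formulas above):

```java
// Sketch of kappa: chance-corrected accuracy, where the chance baseline is
// the random accuracy computed from the reference and response likelihoods.
public class KappaSketch {

    public static double kappa(double p, double e) {
        return (p - e) / (1 - e);
    }

    public static double randomAccuracy(double refLik, double respLik) {
        return refLik * respLik + (1 - refLik) * (1 - respLik);
    }

    public static void main(String[] args) {
        // Cab-vs-All example: 12/27 positive references, 13/27 positive responses.
        double e = randomAccuracy(12.0 / 27.0, 13.0 / 27.0);
        double p = 20.0 / 27.0; // accuracy
        System.out.println(e);           // ~0.5021
        System.out.println(kappa(p, e)); // ~0.479
    }
}
```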

There are two alternative forms of the κ-statistic, both of which attempt to correct for putative bias in the estimation of random accuracy. The first involves computing the random accuracy by taking the average of the reference and response likelihoods to be the baseline reference and response likelihood, and squaring the result to get the so-called unbiased random accuracy and the unbiased κ-statistic:

randomAccuracyUnbiased() = avgLikelihood()² + (1 - avgLikelihood())²
avgLikelihood() = (referenceLikelihood() + responseLikelihood()) / 2
kappaUnbiased() = kappa(accuracy(),randomAccuracyUnbiased())

Kappa can also be adjusted for the prevalence of positive reference cases, which leads to the following simple definition:

kappaNoPrevalence() = (2 * accuracy()) - 1

Pearson's χ² statistic is provided by the following method:

chiSquared() = total() * phiSquared()
phiSquared() = (truePositive()*trueNegative() - falsePositive()*falseNegative())²
/ ((truePositive()+falseNegative()) * (falsePositive()+trueNegative()) * (truePositive()+falsePositive()) * (falseNegative()+trueNegative()))
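The φ² and χ² formulas can be checked against the worked examples later in this document. This is a standalone sketch (hypothetical class ChiSquaredSketch), not the LingPipe implementation:

```java
// Sketch of phi-squared and chi-squared from the contingency counts.
public class ChiSquaredSketch {

    public static double phiSquared(long tp, long fn, long fp, long tn) {
        double num = (double) tp * tn - (double) fp * fn;
        // Product of the two row totals and the two column totals.
        return num * num
            / ((double) (tp + fn) * (fp + tn) * (tp + fp) * (fn + tn));
    }

    public static double chiSquared(long tp, long fn, long fp, long tn) {
        return (tp + fn + fp + tn) * phiSquared(tp, fn, fp, tn);
    }

    public static void main(String[] args) {
        // Cab-vs-All counts from the examples below.
        System.out.println(phiSquared(9, 3, 4, 11)); // ~0.2310
        System.out.println(chiSquared(9, 3, 4, 11)); // ~6.2382
    }
}
```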

The accuracy deviation is the deviation of the average number of positive cases in a binomial distribution with accuracy equal to the classification accuracy and number of trials equal to the total number of cases.

accuracyDeviation() = (accuracy() * (1 - accuracy()) / total())^(1/2)
This number can be used to provide error intervals around the accuracy results.
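For instance, a rough normal-approximation interval can be built from the deviation. This sketch (hypothetical class AccuracyInterval) assumes the total case count is large enough for the approximation to be reasonable:

```java
// Sketch of the accuracy deviation and a rough error interval around accuracy.
public class AccuracyInterval {

    public static double accuracyDeviation(double accuracy, long total) {
        // Standard deviation of the sample proportion in a binomial model.
        return Math.sqrt(accuracy * (1 - accuracy) / total);
    }

    public static void main(String[] args) {
        double acc = 20.0 / 27.0;                 // Cab-vs-All accuracy
        double dev = accuracyDeviation(acc, 27);  // ~0.0843
        // Approximate 95% interval: accuracy +/- 1.96 standard deviations.
        System.out.println((acc - 1.96 * dev) + " .. " + (acc + 1.96 * dev));
    }
}
```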

Using the following three tables as examples:

Cab-vs-All
                    Response
                    Cab    Other
  Reference Cab       9      3
            Other     4     11

Syrah-vs-All
                    Response
                    Syrah  Other
  Reference Syrah     5      4
            Other     4     14

Pinot-vs-All
                    Response
                    Pinot  Other
  Reference Pinot     4      2
            Other     1     20
The various statistics evaluate to the following values:

Method                      Cabernet    Syrah     Pinot
positiveReference()             12          9         6
negativeReference()             15         18        21
positiveResponse()              13          9         5
negativeResponse()              14         18        22
correctResponse()               20         19        24
total()                         27         27        27
accuracy()                   0.7407     0.7037    0.8889
recall()                     0.7500     0.5555    0.6666
precision()                  0.6923     0.5555    0.8000
rejectionRecall()            0.7333     0.7778    0.9524
rejectionPrecision()         0.7857     0.7778    0.9091
fMeasure()                   0.7200     0.5555    0.7272
fowlkesMallows()              12.49       9.00      5.48
jaccardCoefficient()         0.5625     0.3846    0.5714
yulesQ()                     0.7838     0.6279    0.9512
yulesY()                     0.4835     0.3531    0.7269
referenceLikelihood()        0.4444     0.3333    0.2222
responseLikelihood()         0.4815     0.3333    0.1852
randomAccuracy()             0.5021     0.5556    0.6749
kappa()                      0.4792     0.3333    0.6583
randomAccuracyUnbiased()     0.5027     0.5556    0.6756
kappaUnbiased()              0.4789     0.3333    0.6575
kappaNoPrevalence()          0.4814     0.4074    0.7778
chiSquared()                 6.2382     3.0000   11.8519
phiSquared()                 0.2310     0.1111    0.4390
accuracyDeviation()          0.0843     0.0879    0.0605
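An evaluation is typically accumulated one case at a time via addCase(boolean,boolean). A minimal standalone counter in the same style (hypothetical class CaseCounter, not the LingPipe class) rebuilds the Cab-vs-All table case by case:

```java
// A minimal counter in the style of addCase(boolean, boolean): the reference
// flag selects the row and the response flag selects the column.
public class CaseCounter {

    long tp, fn, fp, tn;

    void addCase(boolean reference, boolean response) {
        if (reference) { if (response) ++tp; else ++fn; }
        else           { if (response) ++fp; else ++tn; }
    }

    public static void main(String[] args) {
        CaseCounter eval = new CaseCounter();
        // Replay the Cab-vs-All table: 9 TP, 3 FN, 4 FP, 11 TN.
        for (int i = 0; i < 9;  ++i) eval.addCase(true,  true);
        for (int i = 0; i < 3;  ++i) eval.addCase(true,  false);
        for (int i = 0; i < 4;  ++i) eval.addCase(false, true);
        for (int i = 0; i < 11; ++i) eval.addCase(false, false);
        double accuracy =
            (double) (eval.tp + eval.tn) / (eval.tp + eval.fn + eval.fp + eval.tn);
        System.out.println(accuracy); // ~0.7407
    }
}
```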

Since:
LingPipe 2.1
Version:
2.1
Author:
Bob Carpenter

Constructor Summary
PrecisionRecallEvaluation()
          Construct a precision-recall evaluation with all counts set to zero.
PrecisionRecallEvaluation(long tp, long fn, long fp, long tn)
          Constructs a precision-recall evaluation initialized with the specified counts.
 
Method Summary
 double accuracy()
          Returns the sample accuracy of the responses.
 double accuracyDeviation()
          Returns the standard deviation of the accuracy.
 void addCase(boolean reference, boolean response)
          Adds a case with the specified reference and response classifications.
 double chiSquared()
          Returns the χ2 value.
 long correctResponse()
          Returns the number of cases where the response is correct.
 long falseNegative()
          Returns the number of false negative cases.
 long falsePositive()
          Returns the number of false positive cases.
 double fMeasure()
          Returns the F1 measure.
 double fMeasure(double beta)
          Returns the Fβ value for the specified β.
static double fMeasure(double beta, double recall, double precision)
          Returns the Fβ measure for the specified β, recall, and precision values.
 double fowlkesMallows()
          Return the Fowlkes-Mallows score.
 long incorrectResponse()
          Returns the number of cases where the response is incorrect.
 double jaccardCoefficient()
          Returns the Jaccard coefficient.
 double kappa()
          Returns the value of the kappa statistic.
 double kappaNoPrevalence()
          Returns the value of the kappa statistic adjusted for prevalence.
 double kappaUnbiased()
          Returns the value of the unbiased kappa statistic.
 long negativeReference()
          Returns the number of negative reference cases.
 long negativeResponse()
          Returns the number of negative response cases.
 double phiSquared()
          Returns the φ2 value.
 long positiveReference()
          Returns the number of positive reference cases.
 long positiveResponse()
          Returns the number of positive response cases.
 double precision()
          Returns the precision.
 double randomAccuracy()
          The probability that the reference and response are the same if they are generated randomly according to the reference and response likelihoods.
 double randomAccuracyUnbiased()
          The probability that the reference and the response are the same if the reference and response likelihoods are both the average of the sample reference and response likelihoods.
 double recall()
          Returns the recall.
 double referenceLikelihood()
          Returns the sample reference likelihood, which is the number of positive references divided by the total number of cases.
 double rejectionPrecision()
          Returns the rejection precision, or selectivity, value.
 double rejectionRecall()
          Returns the rejection recall, or specificity, value.
 double responseLikelihood()
          Returns the sample response likelihood, which is the number of positive responses divided by the total number of cases.
 String toString()
          Returns a string-based representation of this evaluation.
 long total()
          Returns the total number of cases.
 long trueNegative()
          Returns the number of true negative cases.
 long truePositive()
          Returns the number of true positive cases.
 double yulesQ()
          Return the value of Yule's Q statistic.
 double yulesY()
          Return the value of Yule's Y statistic.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

PrecisionRecallEvaluation

public PrecisionRecallEvaluation()
Construct a precision-recall evaluation with all counts set to zero.


PrecisionRecallEvaluation

public PrecisionRecallEvaluation(long tp,
                                 long fn,
                                 long fp,
                                 long tn)
Constructs a precision-recall evaluation initialized with the specified counts.

Parameters:
tp - True positive count.
fn - False negative count.
fp - False positive count.
tn - True negative count.
Throws:
IllegalArgumentException - If any of the counts are negative.
Method Detail

addCase

public void addCase(boolean reference,
                    boolean response)
Adds a case with the specified reference and response classifications.

Parameters:
reference - Reference classification.
response - Response classification.

truePositive

public long truePositive()
Returns the number of true positive cases. A true positive is where both the reference and response are true.

Returns:
The number of true positives.

falsePositive

public long falsePositive()
Returns the number of false positive cases. A false positive is where the reference is false and response is true.

Returns:
The number of false positives.

trueNegative

public long trueNegative()
Returns the number of true negative cases. A true negative is where both the reference and response are false.

Returns:
The number of true negatives.

falseNegative

public long falseNegative()
Returns the number of false negative cases. A false negative is where the reference is true and response is false.

Returns:
The number of false negatives.

positiveReference

public long positiveReference()
Returns the number of positive reference cases. A positive reference case is one where the reference is true.

Returns:
The number of positive references.

negativeReference

public long negativeReference()
Returns the number of negative reference cases. A negative reference case is one where the reference is false.

Returns:
The number of negative references.

referenceLikelihood

public double referenceLikelihood()
Returns the sample reference likelihood, which is the number of positive references divided by the total number of cases.

Returns:
The sample reference likelihood.

positiveResponse

public long positiveResponse()
Returns the number of positive response cases. A positive response case is one where the response is true.

Returns:
The number of positive responses.

negativeResponse

public long negativeResponse()
Returns the number of negative response cases. A negative response case is one where the response is false.

Returns:
The number of negative responses.

responseLikelihood

public double responseLikelihood()
Returns the sample response likelihood, which is the number of positive responses divided by the total number of cases.

Returns:
The sample response likelihood.

correctResponse

public long correctResponse()
Returns the number of cases where the response is correct. A correct response is one where the reference and response are the same.

Returns:
The number of correct responses.

incorrectResponse

public long incorrectResponse()
Returns the number of cases where the response is incorrect. An incorrect response is one where the reference and response are different.

Returns:
The number of incorrect responses.

total

public long total()
Returns the total number of cases.

Returns:
The total number of cases.

accuracy

public double accuracy()
Returns the sample accuracy of the responses. The accuracy is just the number of correct responses divided by the total number of responses.

Returns:
The sample accuracy.

recall

public double recall()
Returns the recall. The recall is the number of true positives divided by the number of positive references. This is the fraction of positive reference cases that were found by the classifier.

Returns:
The recall value.

precision

public double precision()
Returns the precision. The precision is the number of true positives divided by the number of positive responses. This is the fraction of positive responses returned by the classifier that were correct.

Returns:
The precision value.

rejectionRecall

public double rejectionRecall()
Returns the rejection recall, or specificity, value. The rejection recall is the percentage of negative references that had negative responses.

Returns:
The rejection recall value.

rejectionPrecision

public double rejectionPrecision()
Returns the rejection precision, or selectivity, value. The rejection precision is the percentage of negative responses that were negative references.

Returns:
The rejection precision value.

fMeasure

public double fMeasure()
Returns the F1 measure. This is the result of calling the method fMeasure(double) with β equal to 1.

Returns:
The F1 measure.

fMeasure

public double fMeasure(double beta)
Returns the Fβ value for the specified β.

Parameters:
beta - The β parameter.
Returns:
The Fβ value.

jaccardCoefficient

public double jaccardCoefficient()
Returns the Jaccard coefficient.

Returns:
The Jaccard coefficient.

chiSquared

public double chiSquared()
Returns the χ2 value.

Returns:
The χ2 value.

phiSquared

public double phiSquared()
Returns the φ2 value.

Returns:
The φ2 value.

yulesQ

public double yulesQ()
Return the value of Yule's Q statistic.

Returns:
The value of Yule's Q statistic.

yulesY

public double yulesY()
Return the value of Yule's Y statistic.

Returns:
The value of Yule's Y statistic.

fowlkesMallows

public double fowlkesMallows()
Return the Fowlkes-Mallows score.

Returns:
The Fowlkes-Mallows score.

accuracyDeviation

public double accuracyDeviation()
Returns the standard deviation of the accuracy. This is computed as the deviation of an equivalent accuracy generated by a binomial distribution, which is just a sequence of Bernoulli (binary) trials.

Returns:
The standard deviation of the accuracy.

randomAccuracy

public double randomAccuracy()
The probability that the reference and response are the same if they are generated randomly according to the reference and response likelihoods.

Returns:
The accuracy of a random classifier.

randomAccuracyUnbiased

public double randomAccuracyUnbiased()
The probability that the reference and the response are the same if the reference and response likelihoods are both the average of the sample reference and response likelihoods.

Returns:
The unbiased random accuracy.

kappa

public double kappa()
Returns the value of the kappa statistic.

Returns:
The value of the kappa statistic.

kappaUnbiased

public double kappaUnbiased()
Returns the value of the unbiased kappa statistic.

Returns:
The value of the unbiased kappa statistic.

kappaNoPrevalence

public double kappaNoPrevalence()
Returns the value of the kappa statistic adjusted for prevalence.

Returns:
The value of the kappa statistic adjusted for prevalence.

toString

public String toString()
Returns a string-based representation of this evaluation.

Overrides:
toString in class Object
Returns:
A string-based representation of this evaluation.

fMeasure

public static double fMeasure(double beta,
                              double recall,
                              double precision)
Returns the Fβ measure for the specified β, recall, and precision values.

Parameters:
beta - Relative weighting of precision.
recall - Recall value.
precision - Precision value.
Returns:
The Fβ measure.