com.aliasi.classify
Class ScoredPrecisionRecallEvaluation

java.lang.Object
  extended by com.aliasi.classify.ScoredPrecisionRecallEvaluation

public class ScoredPrecisionRecallEvaluation
extends Object

A ScoredPrecisionRecallEvaluation provides an evaluation based on the precision-recall operating points and sensitivity-specificity operating points. The unscored precision-recall evaluation class is PrecisionRecallEvaluation.

Construction and Population

There is a single no-arg constructor ScoredPrecisionRecallEvaluation().

The method addCase(boolean,double) is used to populate the evaluation, with the first argument representing whether the response was correct and the second the score that was assigned.

Missing Cases

If there are positive reference cases that are not added through addCase(), the total number of such cases should be added using the method addMisses(int). This method effectively increments the number of reference positive cases used to compute recall values.

If there are negative reference cases that are not dealt with through addCase(), the method addNegativeMisses(int) should be called with the total number of such cases as an argument. This method increments the number of reference engative cases used to compute specificity values.

Example

By way of example, consider the following table of cases, all of which involve positive responses. The cases are in rank order, but may be added in any order.

Rank Score Correct TP TN FP FN Rec Prec Spec F Meas
(-1)n/an/a 0 6 0 5 0.00 1.00 1.00 0.00
0-1.21no 0 5 1 5 0.00 0.00 0.83 0.00
1-1.27yes 1 5 1 4 0.20 0.50 0.83 0.29
2-1.39no 1 4 2 4 0.20 0.33 0.67 0.25
3-1.47yes 2 4 2 3 0.40 0.50 0.67 0.44
4-1.60yes 3 4 2 2 0.60 0.60 0.67 0.60
5-1.65no 3 3 3 2 0.60 0.50 0.50 0.55
6-1.79no 3 2 4 2 0.60 0.43 0.33 0.50
7-1.80no 3 1 5 2 0.60 0.38 0.17 0.47
8-2.01yes 4 1 5 1 0.80 0.44 0.17 0.53
9-3.70no 4 0 6 1 0.80 0.40 0.00 0.53
?n/ayes 5 0 6 0 1.00 0.00 0.00 0.00
The first line, which is separated, indicates the values before any results have been returned. There's no score corresponding to this operating point, and given that it doesn't correspond to a result, correctness is not applicable. It has zero recall, one specificity, and one precision (letting zero divided by zero be one here).

The next lines, listed as ranks 0 to 9, correspond to calls to addCase() with the specified score and correctness. For each of these lines, we list the corresponding number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These are followed by recall, precision and specificity (aka rejection recall). See the class documentation for PrecisionRecallEvaluation for definitions of these values in terms of the TP, TN, FP, and FN counts.

There are five positive reference cases (blue backgrounds) and six negative reference cases (clear backgrounds) in this diagram. The yellow precision values and orange specificity values are used for interpolated curves.

Precision-Recall Curves

The pairs of precision/recall values form the basis for the precision-recall curve returned by prCurve(boolean), with the argument indicating whether to perform precision interpolation. For the above graph, the uninterpolated precision-recall curve is:

 prCurve(false) = {
     { 0.00, 1.00 },
     { 0.20, 0.50 },
     { 0.20, 0.33 },
     { 0.40, 0.50 },
     { 0.60, 0.60 },
     { 0.60, 0.50 },
     { 0.60, 0.43 },
     { 0.60, 0.38 },
     { 0.60, 0.38 },
     { 0.80, 0.44 },
     { 0.80, 0.40 },
     { 1.00, 0.00 }
 }
Typically, a form of interpolation is performed that sets the precision for a given recall value to the maximum of the precision at the curent or greater recall value. This pushes the yellow precision values up the graph. At the same time, we only return values that correspond to jumps in recall, corresponding to ranks at which true positives were returned. For the example above, the result is
 prCurve(true) = {
     { 0.00, 1.00 },
     { 0.20, 0.60 },
     { 0.40, 0.60 },
     { 0.60, 0.60 },
     { 0.80, 0.44 },
     { 1.00, 0.00 }
 }
For convenience, the evaluation always adds the two limit points, one with precision 0 and recall 1, and one with precision 1 and recall 0. These operating points are always achievable, the first by returning every possible answer, and the second by returning no answers.

ROC Curves

Another popular graph for visualizing classification results is the receiver operating characteristic (ROC) curve, which plots sensitivity (a.k.a. recall) versus one minus specificity (a.k.a. one minus rejection recall). Because specificity is accuracy on negative cases and sensitivity accuracy on positive cases, these graphs are fairly easy to interpret. The precision-recall curve, on the othe hand, does not consider true negative (TN) counts.

The ROC curve is returned by the method rocCurve(boolean) with the boolean parameter again indicating whether to perform precision interpolation. For the above graph, the result is:

 rocCurve(false) = {
     { 1 - 1.00, 0.00 },
     { 1 - 0.83, 0.00 },
     { 1 - 0.83, 0.20 },
     { 1 - 0.67, 0.20 },
     { 1 - 0.67, 0.40 },
     { 1 - 0.67, 0.60 },
     { 1 - 0.50, 0.60 },
     { 1 - 0.33, 0.60 },
     { 1 - 0.17, 0.60 },
     { 1 - 0.17, 0.80 },
     { 1 - 0.00, 0.80 },
     { 1 - 0.00, 1.00 }
 }
Interpolation works exactly the same way as for the precision-recall curves, but based on specificity rather than precsion.
 rocCurve(true) = {
     { 1 - 1.00, 0.00 },
     { 1 - 0.83, 0.20 },
     { 1 - 0.67, 0.60 },
     { 1 - 0.50, 0.60 },
     { 1 - 0.33, 0.60 },
     { 1 - 0.17, 0.80 },
     { 1 - 0.00, 1.00 }
 }

Precision at N

In some information extraction or retrieval tasks, a system might only return a fixed number of examples to a user. To evaluate the result of such truncated result sets, it is common to report the precision after N returned results. The counting starts from one rather than zero for returned results, but we fill in a limiting value of 1.0 for precision at 0. In our running example, we have

      precisionAt(0) = 1.0
      precisionAt(1) = 0.0
      precisionAt(5) = 0.6
      precisionAt(10) = 0.4
      precisionAt(20) = 0.2
      precisionAt(100) = 0.04
The return value for a rank greater than the number of cases added will be calculated assuming all other results are errors.

Reciprocal Rank

For information extraction tasks, one result is often enough to satisfy an information need. A popular measure to characterize this situation is reciprocal rank. The reciprocal rank is defined to be 1/M, where M is the rank (counting from 1) of the first true positive return. In our running example, the first result is a false positive and the second a true positive, so reciprocal rank is
      reciprocalRank()() = 0.5
Note that this measure emphasizes differences in early ranks much more than later ones. For instance, the reciprocal rank for a system returning a correct result first is 1/1, but for one returning it second, it's 1/2, and for one returning the first true positive at rank 10, it's 1/10. The difference between rank 1 and 2 is greater than that between 2 and 10.

R Precision

The R precision is defined as the precision for the first R results, where R is the number of reference positive cases. If there are not enough results, the value returned is calculated by assuming all the non-added results are errors.

For the running example, R precision is

 rPrecision() = 0.6
R precision will always be at a point where precision equals recall. It is also known as the precision-recall break-even point (BEP), and for convenience, there is a method of that name,
 prBreakevenPoint() = rPrecision() = 0.6

Maximum F Measure

Another commonly reported statistic that may be calculated from the precisino-recall curve is the maximum F measure (see PrecisionRecallEvaluation.fMeasure(double,double,double) for a definition of F measure). The result is the maximum F measure value achieved at any position on the curve. For our example, this arises at
 maximumFMeasure() = 0.6

In general, the maximum F measure may occur at a point other than the precision-recall break-even point.

Averaged Results

If there is more than one classifier or information extractor being evaluated, it is common to report averages of several of the statistics reported by this class. LingPipe does not compute these values, but they are easy to calculate by accumulating results for individual ranked precision-recall evaluations.

The average across multiple evaluations of average precision is somewhat misleadingly called mean average precision (MAP) [it should be average average precision, because averages are over finite samples and means are properties of distributions].

The eleven-point precision-recall curves, reciprocal rank, and R precision are also popular targets for reporting averaged results.

References

For texts on ROC and PR evaluations, see the following.

Since:
LingPipe2.1
Version:
4.0.1
Author:
Bob Carpenter, Mike Ross, Breck Baldwin

Constructor Summary
ScoredPrecisionRecallEvaluation()
          Construct a scored precision-recall evaluation.
 
Method Summary
 void addCase(boolean correct, double score)
          Add a case with the specified correctness and response score.
 void addMisses(int count)
          Incrments the positive reference count without adding a case from the classifier.
 void addNegativeMisses(int count)
          Incrments the negative reference count without adding a case from the classifier.
 double areaUnderPrCurve(boolean interpolate)
          Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified.
 double areaUnderRocCurve(boolean interpolate)
          Returns the area under the receiver operating characteristic (ROC) curve.
 double averagePrecision()
          Returns the average of precisions at the true positive results.
 double[] elevenPtInterpPrecision()
          Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1.
 double maximumFMeasure()
          Returns the maximum F1-measure for an operating point on the PR curve.
 double maximumFMeasure(double beta)
          Returns the maximum Fβ-measure for an operating point on the precision-recall curve for a specified precision weight β > 0.
 int numCases()
          Returns the total number of positive and negative reference cases for this evaluation.
 int numNegativeRef()
          Return the number of negative reference cases.
 int numPositiveRef()
          Returns the number of positive reference cases.
 double prBreakevenPoint()
           
 double[][] prCurve(boolean interpolate)
          Returns the precision-recall curve, interpolating if the specified flag is true.
 double precisionAt(int rank)
          Returns the precision score achieved by returning the top scoring documents up to (but not including) the specified rank.
static void printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw)
          Prints a precision-recall curve with F-measures.
static void printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw)
          Prints a precision-recall curve with score.
 double[][] prScoreCurve(boolean interpolate)
          Returns the array of recall/precision/score operating points according to the scores of the cases.
 double reciprocalRank()
          Returns the reciprocal rank for this evaluation.
 double[][] rocCurve(boolean interpolate)
          Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true.
 double rPrecision()
          Return the R precision.
 String toString()
          Returns a string-based representation of this scored precision recall evaluation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ScoredPrecisionRecallEvaluation

public ScoredPrecisionRecallEvaluation()
Construct a scored precision-recall evaluation.

Method Detail

addCase

public void addCase(boolean correct,
                    double score)
Add a case with the specified correctness and response score. Only positive response cases are considered, and the correct flag is set to true if the reference was also positive. The score is just the response score.

Warning: The scores should be sensibly comparable across cases.

Parameters:
correct - true if this case was correct.
score - Score of response.

addMisses

public void addMisses(int count)
Incrments the positive reference count without adding a case from the classifier. This method is used for precision-recall evaluations where the set of returned items does not enumerate all positive references. These misses are used in calcuating statistics such as precision-recall curves.

Parameters:
count - Number of outright misses to add to this evaluation.
Throws:
IllegalArgumentException - if the count is not positive.

addNegativeMisses

public void addNegativeMisses(int count)
Incrments the negative reference count without adding a case from the classifier. This method is used for ROC evaluations where the set of returned items does not enumerate all negative references.

Parameters:
count - Number of outright misses to add to this evaluation.
Throws:
IllegalArgumentException - if the count is not positive.

numCases

public int numCases()
Returns the total number of positive and negative reference cases for this evaluation. The return value is the sum of numPositiveRef() and #numNegativeRef().

Returns:
The number of cases for this evaluation.

numPositiveRef

public int numPositiveRef()
Returns the number of positive reference cases. This count includes the number of cases added with flag true plus the number of misses added.

Returns:
Number of positive reference cases.

numNegativeRef

public int numNegativeRef()
Return the number of negative reference cases. The count includes the number of cases added with flag false plus the number of negative misses added.

Returns:
Number of negative reference cases.

rPrecision

public double rPrecision()
Return the R precision. See the class documentation above for a definition. The R-precision operating point has identical precision and recall by definition.

Returns:
The R precision.

elevenPtInterpPrecision

public double[] elevenPtInterpPrecision()
Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1. The recall points are { 0.0, 0.1, 0.2, ..., 1.0 }.

Returns:
Eleven-point interpolated precision.

averagePrecision

public double averagePrecision()
Returns the average of precisions at the true positive results. If an item is missed (i.e., it was added by #addMisses(int), the precision is considered to be zero. (See class documentation for more information.)

Returns:
Average precision at each true positive.

prCurve

public double[][] prCurve(boolean interpolate)
Returns the precision-recall curve, interpolating if the specified flag is true. See the class documentation above for a definition of the curve.

Warning: Despite the name, the values returned are in the arrays with recall at index 0 and precision at index 1.

Parameters:
interpolate - Set to true for precision interpolation.
Returns:
The precision-recall curve.

prScoreCurve

public double[][] prScoreCurve(boolean interpolate)
Returns the array of recall/precision/score operating points according to the scores of the cases. Other than adding scores, this method works just like prCurve(boolean). Index 0 is recall, 1 is precision and 2 is the score.

.

Parameters:
interpolate - Set to true if the precisions are interpolated through pruning dominated points.
Returns:
The precision-recall-score curve for the specified category.

rocCurve

public double[][] rocCurve(boolean interpolate)
Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true. See the class documentation above for a definition and example of the returned curve.

Parameters:
interpolate - Interpolate specificity values.
Returns:
The receiver operating characteristic curve.

maximumFMeasure

public double maximumFMeasure()
Returns the maximum F1-measure for an operating point on the PR curve. See the class documentation above for an example and further explanation.

Returns:
Maximum f-measure for the specified category.

maximumFMeasure

public double maximumFMeasure(double beta)
Returns the maximum Fβ-measure for an operating point on the precision-recall curve for a specified precision weight β > 0.

Returns:
Maximum f-measure for the specified category.

precisionAt

public double precisionAt(int rank)
Returns the precision score achieved by returning the top scoring documents up to (but not including) the specified rank. The precision-recall curve is not interpolated for this computation. For rank 0, the result Double.NaN is returned.

Returns:
The precision at the specified rank.

prBreakevenPoint

public double prBreakevenPoint()

reciprocalRank

public double reciprocalRank()
Returns the reciprocal rank for this evaluation. The reciprocal rank is defined as the reciprocal 1/N of the rank N at which the first true positive is found. This method counts ranks from 1 rather than 0. The return result will be between 1.0 for the first-best result being correct and 0.0, for none of the results being correct.

Returns:
The reciprocal rank.

areaUnderPrCurve

public double areaUnderPrCurve(boolean interpolate)
Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified. See the class documentation for more information.

Warning: This method uses the parallelogram method for interpolation rather than the usual interpolation method used to calculate AUC for precision-recall in information retrieval evaluations. The usual AUC calculation for PR curves

Parameters:
interpolate - Set to true to interpolate the precision values.
Returns:
The area under the specified precision-recall curve.

areaUnderRocCurve

public double areaUnderRocCurve(boolean interpolate)
Returns the area under the receiver operating characteristic (ROC) curve. See the class documentation for more information.

Parameters:
interpolate - Set to true to interpolate the rejection recall values.
Returns:
The area under the ROC curve.

toString

public String toString()
Returns a string-based representation of this scored precision recall evaluation.

Overrides:
toString in class Object

printPrecisionRecallCurve

public static void printPrecisionRecallCurve(double[][] prCurve,
                                             PrintWriter pw)
Prints a precision-recall curve with F-measures. The curve is formatted as in prCurve(boolean): an array of length-2 arrays of doubles. In each length-2 array, the recall value is at index 0, and the precision is at index 1. The printed curve prints 3 columns in the following order: precision, recall, F-measure.

Parameters:
prCurve - A precision-recall curve.
pw - The output PrintWriter.

printScorePrecisionRecallCurve

public static void printScorePrecisionRecallCurve(double[][] prScoreCurve,
                                                  PrintWriter pw)
Prints a precision-recall curve with score. The curve is formatted as in prScoreCurve(boolean): an array of length-3 arrays of doubles. In each length-3 array, the recall value is at index 0, and the precision is at index 1 and score at 2. The printed curve prints 3 columns in the following order: precision, recall, score.

Parameters:
prScoreCurve - A precision-recall score curve.
pw - The output PrintWriter.