com.aliasi.classify
Class ScoredPrecisionRecallEvaluation

java.lang.Object
  extended by com.aliasi.classify.ScoredPrecisionRecallEvaluation

public class ScoredPrecisionRecallEvaluation
extends Object

A ScoredPrecisionRecallEvaluation provides an evaluation of possible precision-recall operating points and other summary statistics. The single method addCase(boolean,double) is used to populate the evaluation, with the first argument representing whether the response was correct and the second the score that was assigned.

This evaluation does not consider negative reference cases. If a positive reference case is not scored by a classifier being evaluated, the method addMisses(int) should be used to increment the number of such positive reference cases not scored. See below for more information.

By way of example, consider the following table of cases, all of which involve positive responses. The cases are in rank order, with their scores and whether they were correct listed. For this example, we assume each reference positive result is scored by the classifier.

Rank  Score  Correct    Rec   Prec  Rej Rec
  0   -1.21  incorrect  0.00  NaN   0.83
  1   -1.27  correct    0.25  0.50  0.83
  2   -1.39  incorrect  0.25  0.33  0.67
  3   -1.47  correct    0.50  0.50  0.67
  4   -1.60  correct    0.75  0.60  0.67
  5   -1.65  incorrect  0.75  0.50  0.50
  6   -1.79  incorrect  0.75  0.43  0.33
  7   -1.80  incorrect  0.75  0.38  0.17
  8   -2.01  correct    1.00  0.44  0.17
  9   -3.70  incorrect  1.00  0.40  0.00
Note that there are four positive reference cases (the rows marked correct) and six negative reference cases (the rows marked incorrect) in this table. By setting an acceptance threshold at the various scores, the precision, recall, and rejection recall values listed in the fourth through sixth columns are derived. For instance, after the rank 0 response, which is wrong, recall is 0/4 = 0.00, because we have retrieved none of the four positive reference cases; similarly, rejection recall is 5/6 = 0.83 because 5/6 of the negative reference cases have been rejected. After the rank 4 response, recall is 3/4 = 0.75 and rejection recall is 4/6 = 0.67.
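The arithmetic behind this table can be reproduced directly from the ranked correctness flags. The following standalone sketch (the class name PrTable and the hard-coded data are illustrative, not part of the LingPipe API) computes recall, precision, and rejection recall after each rank; note that at rank 0 it returns a precision of 0.0 where the table reports NaN.

```java
public class PrTable {
    // correctness of the ten ranked responses from the example table
    static final boolean[] CORRECT = {
        false, true, false, true, true, false, false, false, true, false
    };

    // recall after accepting the top (rank + 1) responses: TP / total positives
    public static double recallAt(int rank) {
        int tp = 0, totalPos = 0;
        for (boolean c : CORRECT) if (c) totalPos++;
        for (int i = 0; i <= rank; i++) if (CORRECT[i]) tp++;
        return (double) tp / totalPos;
    }

    // precision after accepting the top (rank + 1) responses: TP / (TP + FP)
    public static double precisionAt(int rank) {
        int tp = 0;
        for (int i = 0; i <= rank; i++) if (CORRECT[i]) tp++;
        return (double) tp / (rank + 1);
    }

    // rejection recall: fraction of negatives below the threshold, TN / total negatives
    public static double rejectionRecallAt(int rank) {
        int totalNeg = 0, fp = 0;
        for (boolean c : CORRECT) if (!c) totalNeg++;
        for (int i = 0; i <= rank; i++) if (!CORRECT[i]) fp++;
        return (double) (totalNeg - fp) / totalNeg;
    }

    public static void main(String[] args) {
        for (int r = 0; r < CORRECT.length; r++)
            System.out.printf("%d  %.2f  %.2f  %.2f%n",
                r, recallAt(r), precisionAt(r), rejectionRecallAt(r));
    }
}
```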

The pairs of precision/recall values form the basis for the precision-recall curve returned by prCurve(boolean), with the argument indicating whether to perform precision interpolation. For the above table:

     prCurve(false) = { {0.25, 0.50},
                        {0.50, 0.50},
                        {0.75, 0.60},
                        {1.00, 0.44} }

The pairs of recall/rejection recall values form the basis for the receiver operating characteristic (ROC) curve returned by rocCurve(boolean), with the boolean parameter again indicating whether to interpolate by pruning dominated points. For the above table, the result is:

     rocCurve(false) = { { 0.25, 0.83 },
                         { 0.50, 0.67 },
                         { 0.75, 0.67 },
                         { 1.00, 0.17 } }
Note that for both curves, only the rows corresponding to the correct responses are considered.

Precision interpolation removes any operating point for which there is a dominant operating point in both dimensions. For the precision-recall curve, the points (0.25,0.50) and (0.50,0.50) are dominated in both dimensions by (0.75,0.60) and so are dropped; the resulting curve is:

     prCurve(true) =  { {0.75, 0.60},
                        {1.00, 0.44} }
This is meant to be read as having a constant precision of 0.6 for all recall values between 0 and 0.75 inclusive; thus the interpolation increases values. For the ROC curve, only three points are left:
     rocCurve(true) = { { 0.25, 0.83 },
                        { 0.75, 0.67 },
                        { 1.00, 0.17 } }
Note that the precision interpolated curves always provide strictly decreasing precisions.
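The pruning of dominated points described above can be sketched as follows, assuming the input points are sorted by strictly increasing recall (as prCurve(boolean) guarantees). Interpolate is an illustrative name, not a library class; a point is dropped when some later point has precision (or rejection recall) at least as high.

```java
import java.util.ArrayList;
import java.util.List;

public class Interpolate {
    // keep only points not dominated by a later point; input points are
    // {recall, precision} (or {recall, rejection recall}) pairs sorted by
    // strictly increasing recall
    public static double[][] prune(double[][] curve) {
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < curve.length; i++) {
            boolean dominated = false;
            for (int j = i + 1; j < curve.length; j++) {
                if (curve[j][1] >= curve[i][1]) { // later point is at least as high
                    dominated = true;
                    break;
                }
            }
            if (!dominated) kept.add(curve[i]);
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] pr = {{0.25, 0.50}, {0.50, 0.50}, {0.75, 0.60}, {1.00, 0.44}};
        for (double[] pt : prune(pr))
            System.out.printf("{%.2f, %.2f}%n", pt[0], pt[1]);
    }
}
```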

The area under the raw precision-recall and ROC curves, with or without interpolation, is computed by the following methods:

     areaUnderPrCurve(false) = (0.25 - 0.00) * 0.50
                             + (0.50 - 0.25) * 0.50
                             + (0.75 - 0.50) * 0.60
                             + (1.00 - 0.75) * 0.44
     = 0.51

     areaUnderPrCurve(true) = (0.75 - 0.00) * 0.60
                            + (1.00 - 0.75) * 0.44
     = 0.56
The ROC areas are computed similarly to yield:
     areaUnderRocCurve(false) = 0.58

     areaUnderRocCurve(true) = 0.58
Note that the interpolated areas are always equal to or greater than the uninterpolated areas.
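The step-function area computation shown above amounts to summing width times height over successive recall points. A minimal sketch (AreaUnder is an illustrative name, not part of the library):

```java
public class AreaUnder {
    // area under a step-function curve of {x, y} points sorted by
    // increasing x; each point contributes (x - previous x) * y
    public static double area(double[][] curve) {
        double area = 0.0, prevX = 0.0;
        for (double[] pt : curve) {
            area += (pt[0] - prevX) * pt[1];
            prevX = pt[0];
        }
        return area;
    }

    public static void main(String[] args) {
        double[][] pr = {{0.25, 0.50}, {0.50, 0.50}, {0.75, 0.60}, {1.00, 0.44}};
        System.out.printf("area = %.2f%n", area(pr));
    }
}
```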

For precision-recall curves, three additional summary statistics are available. The first is the average of the precision values at the operating points on the uninterpolated precision-recall curve.

      averagePrecision() = (0.50 + 0.50 + 0.60 + 0.44)/4.00
      = 0.51
The second merely returns the maximum Fβ measure for an actual operating point:
      maximumFMeasure() = maximumFMeasure(1) = 0.67
Note that this statistic provides a post-hoc optimal setting for F-measure. Further note that it is based on actual operating points, not interpolations between operating points. The final statistic is the so-called precision-recall breakeven point (BEP). This is computed in the standard way using the interpolated precision-recall curve. Because the two points of interest are (0.0,0.6) and (0.75,0.6), the point on the segment between them at which recall equals precision is (0.6,0.6), and thus:
      prBreakevenPoint() = 0.6
Note that this value will always be less than or equal to the maximum F1-measure.
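The post-hoc maximum Fβ search scans the actual operating points and keeps the best value of Fβ = (1 + β²) · precision · recall / (β² · precision + recall), which reduces to the F1 formula above when β = 1. A sketch under those definitions (FMax is an illustrative name, not a library class):

```java
public class FMax {
    // maximum F_beta over the {recall, precision} operating points of an
    // uninterpolated precision-recall curve
    public static double maximumFMeasure(double[][] prCurve, double beta) {
        double best = 0.0, b2 = beta * beta;
        for (double[] pt : prCurve) {
            double recall = pt[0], precision = pt[1];
            double f = (1.0 + b2) * precision * recall
                     / (b2 * precision + recall);
            if (f > best) best = f;
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] pr = {{0.25, 0.50}, {0.50, 0.50}, {0.75, 0.60}, {1.00, 0.44}};
        System.out.printf("max F1 = %.2f%n", maximumFMeasure(pr, 1.0));
    }
}
```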

Given the precision-recall curve, it's possible to compute the precision after any given number of results. The method precisionAt(int) will return the precision after the specified number of results. For instance, in the above graph:

      precisionAt(5) = 0.6
      precisionAt(10) = 0.4
Typically results are reported for 5, 10 and 100 when available (counting from 1, not 0). The other information-retrieval style result we return is the reciprocal rank (RR), which is defined to be 1/rank, again counting from 1, not 0. For instance, for the graph above, the first correct answer is the second, so RR is 1/2:
      reciprocalRank() = 0.5
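Both information-retrieval statistics can be sketched from the ranked correctness flags alone (IrStats is an illustrative name, not part of the library):

```java
public class IrStats {
    // reciprocal rank: 1 / (rank of first correct response, counting from 1),
    // or 0.0 if no response is correct
    public static double reciprocalRank(boolean[] correct) {
        for (int i = 0; i < correct.length; i++)
            if (correct[i]) return 1.0 / (i + 1);
        return 0.0;
    }

    // precision over the top n responses; NaN if fewer than n responses
    public static double precisionAt(boolean[] correct, int n) {
        if (n > correct.length) return Double.NaN;
        int tp = 0;
        for (int i = 0; i < n; i++) if (correct[i]) tp++;
        return (double) tp / n;
    }

    public static void main(String[] args) {
        boolean[] correct = {
            false, true, false, true, true, false, false, false, true, false
        };
        System.out.printf("RR = %.2f, P@5 = %.2f%n",
            reciprocalRank(correct), precisionAt(correct, 5));
    }
}
```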

Missing Cases

The method addMisses(int) provides a mechanism to add counts for items that were missed by the system being evaluated. Any reference true item that is not found by the system, and hence not added through addCase(boolean,double), should result in a call to addMisses(1). These calls may be aggregated, so three misses may be recorded with a single call to addMisses(3).

Missing cases will not arise from LingPipe's own classifiers, which always return scores for all results.

Since:
LingPipe2.1
Version:
3.9.2
Author:
Bob Carpenter, Mike Ross

Constructor Summary
ScoredPrecisionRecallEvaluation()
          Construct a scored precision-recall evaluation.
 
Method Summary
 void addCase(boolean correct, double score)
          Add a case with the specified correctness and response score.
 void addMisses(int count)
          Increments the positive reference count without adding a return case from the classifier.
 double areaUnderPrCurve(boolean interpolate)
          Returns the area under the recall-precision curve with interpolation as specified.
 double areaUnderRocCurve(boolean interpolate)
          Returns the area under the receiver operating characteristic (ROC) curve.
 double averagePrecision()
          Returns point-wise average precision of points on the uninterpolated precision-recall curve.
 double maximumFMeasure()
          Returns the maximum F1-measure for an actual operating point on the uninterpolated precision-recall curve.
 double maximumFMeasure(double beta)
          Returns the maximum Fβ-measure for an actual operating point on the uninterpolated precision-recall curve for a specified β.
 int numCases()
          Returns the number of cases that have been added to this evaluation.
 double prBreakevenPoint()
          Returns the breakeven point (BEP) for precision and recall based on the interpolated precision.
 double[][] prCurve(boolean interpolate)
          Returns the set of recall/precision operating points according to the scores of the cases.
 double precisionAt(int rank)
          Returns the precision score achieved by returning the top scoring documents up to the specified rank.
static void printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw)
          Prints a precision-recall curve with F-measures.
 double[][] prScoreCurve(boolean interpolate)
          Returns the set of recall/precision/score operating points according to the scores of the cases.
 double reciprocalRank()
          Returns the reciprocal rank (RR) for this evaluation.
 double[][] rocCurve(boolean interpolate)
          Returns the receiver operating characteristic (ROC) curve for the cases ordered by score.
 String toString()
          Returns a string-based representation of this scored precision recall evaluation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ScoredPrecisionRecallEvaluation

public ScoredPrecisionRecallEvaluation()
Construct a scored precision-recall evaluation.

Method Detail

addCase

public void addCase(boolean correct,
                    double score)
Add a case with the specified correctness and response score. Only positive response cases are considered, and the correct flag is set to true if the reference was also positive. The score is just the response score.

Warning: The scores should be sensibly comparable across cases.

Parameters:
correct - true if this case was correct.
score - Score of response.

addMisses

public void addMisses(int count)
Increments the positive reference count without adding a return case from the classifier. This method is used for precision-recall evaluations where the set of returned items does not enumerate all positive references. These misses are used in calculating statistics such as precision-recall curves.

Parameters:
count - Number of outright misses to add to this evaluation.

numCases

public int numCases()
Returns the number of cases that have been added to this evaluation.

Returns:
The number of cases for this evaluation.

prCurve

public double[][] prCurve(boolean interpolate)
Returns the set of recall/precision operating points according to the scores of the cases. The method rocCurve(boolean) returns the recall (sensitivity) versus rejection recall (specificity) operating points, which take the number of true negative classifications into account. Note that the recall values (the first component) are strictly increasing, resulting in a well-defined function from recall to precision.

The second operation derives so-called "interpolated precision" and is widely used for evaluating information retrieval systems. The interpolated precision of a given recall point is defined to be the maximum precision for that recall point and any higher recall point. This ensures that precision values are non-increasing with increased recall. For the example above, because 0.60 precision is found at 0.75 recall, the interpolated precision of all recall levels lower than 0.75 is 0.60. This method implements this interpolation by only returning points that are not dominated by other points with both better precision and recall; these are exactly the points retained in the interpolated curve shown in the class documentation above.

It is common to also see this graph completed with points (0,1) and (1,0), but this function does not include these limits. The one hundred percent precision implied by the first point is not necessarily achievable, whereas the second point will be no better than the last point in the return result.

Neither interpolated nor uninterpolated return values are guaranteed to be convex. Convex closure will skew results upward in an even more unrealistic direction, especially if the artificial completion point (0,1) is included.

Parameters:
interpolate - Set to true if the precisions are interpolated through pruning dominated points.
Returns:
The precision-recall curve.

prScoreCurve

public double[][] prScoreCurve(boolean interpolate)
Returns the set of recall/precision/score operating points according to the scores of the cases. Other than adding scores, this method works just like prCurve(boolean).

In the example in the class documentation above, the scores are provided in the second column.

Parameters:
interpolate - Set to true if the precisions are interpolated through pruning dominated points.
Returns:
The precision-recall-score curve.

rocCurve

public double[][] rocCurve(boolean interpolate)
Returns the receiver operating characteristic (ROC) curve for the cases ordered by score. The ROC curve returns an array of points making up a plot of sensitivity (recall) versus specificity (rejection recall). As usual, recall (sensitivity) is TP/(TP+FN) and rejection recall (specificity) is its negative dual TN/(TN+FP), where TP is the true positive count, FP the false positive, FN the false negative and TN the true negative counts. Through specificity, the ROC curve provides information about rejection performance.

The last column in the example in prCurve(boolean) provides the rejection recall rates at each threshold; the resulting ROC curves for that example are shown in the class documentation above. As with the recall-precision curve, the parameter determines whether or not to "interpolate" the rejection recall values. This is carried out as with the recall-precision curve by only returning values which would not be interpolated. In general, without interpolation, the same rows of the table are used as for the recall-precision curve, namely those at the end of a run of true positives. Interpolation may result in a different set of recall points in the pruned answer set, as in the example above.

Like the recall-precision curve method, this method does not insert artificial end points of (0,1) and (1,0) into the graph. As with the recall-precision curve, the final entry will have recall equal to one.

Neither interpolated nor uninterpolated return values are guaranteed to be convex. Convex closure will skew results upward in an even more unrealistic direction, especially if the artificial completion point (0,1) is included.

Parameters:
interpolate - If true, any point with both recall and rejection recall lower than another point is eliminated from the returned ROC curve.
Returns:
The receiver operating characteristic curve.

maximumFMeasure

public double maximumFMeasure()
Returns the maximum F1-measure for an actual operating point on the uninterpolated precision-recall curve. This maximization is based on a post-hoc optimal acceptance threshold. This is derived from the pair on the recall-precision curve yielding the highest value of the F measure, 2*recall*precision/(recall+precision).

For the example in prCurve(boolean):

maximumFMeasure() = 0.67
corresponding to recall=0.75 and precision=0.60.

Returns:
Maximum F1-measure.

maximumFMeasure

public double maximumFMeasure(double beta)
Returns the maximum Fβ-measure for an actual operating point on the uninterpolated precision-recall curve for a specified β. This maximization is based on a post-hoc optimal acceptance threshold. This is derived from the pair on the recall-precision curve yielding the highest value of the Fβ measure, (1+β²)*recall*precision/(β²*precision + recall).

For the example in prCurve(boolean):

maximumFMeasure(1) = 0.67
corresponding to recall=0.75 and precision=0.60.

Returns:
Maximum Fβ-measure for the specified β.

prBreakevenPoint

public double prBreakevenPoint()
Returns the breakeven point (BEP) for precision and recall based on the interpolated precision. This is the point where the interpolated precision-recall curve has recall equal to precision. Because the interpolation is a step function, the result is different than if two points were linearly interpolated.

For the example illustrated in prCurve(boolean), the breakeven point is 0.60. This is because the interpolated precision recall curve is flat from the implicit initial point (0.00,0.60) to (0.75,0.60) and thus the line between them has a breakeven point of x = y = 0.6.

As an interpolation (equal precision and recall) of a rounded up estimate (interpolated recall-precision curve), the breakeven point is not necessarily an achievable operating point. Note that the recall-precision breakeven point will always be smaller than the maximum F measure, which does correspond to an observed operating point, because the breakeven point always involves lowering the recall of the first point on the curve with recall greater than precision to match the precision.

This method will return 0.0 if the precision-recall curve never crosses the diagonal.

Returns:
The interpolated recall-precision breakeven point.
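The step-function reading of the interpolated curve described above suggests the following sketch: each point's precision defines a flat step up to that point's recall, and the breakeven is the first step height that falls within its own recall span. Breakeven is an illustrative name, and the sketch assumes the curve's precisions are strictly decreasing, as the interpolated curve guarantees.

```java
public class Breakeven {
    // breakeven of an interpolated {recall, precision} step curve:
    // each step spans recalls (prevRecall, recall] at height precision
    public static double breakeven(double[][] interpolatedCurve) {
        double prevRecall = 0.0;
        for (double[] pt : interpolatedCurve) {
            double recall = pt[0], precision = pt[1];
            // the step meets the diagonal if prevRecall < precision <= recall
            if (precision > prevRecall && precision <= recall)
                return precision;
            prevRecall = recall;
        }
        return 0.0; // curve never crosses the diagonal
    }

    public static void main(String[] args) {
        double[][] prInterp = {{0.75, 0.60}, {1.00, 0.44}};
        System.out.printf("BEP = %.2f%n", breakeven(prInterp));
    }
}
```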

averagePrecision

public double averagePrecision()
Returns point-wise average precision of points on the uninterpolated precision-recall curve. See prCurve(boolean) for a definition of the values on the curve.

This method implements the standard information retrieval definition, which only averages precision measurements from correct responses.

For the example provided in prCurve(boolean), the average precision is the average of the precision values for the correct responses: (0.50 + 0.50 + 0.60 + 0.44) / 4 = 0.51.

Although the reasoning is different, the average precision returned is the same as the area under the uninterpolated recall-precision graph.

Returns:
Pointwise average precision.

precisionAt

public double precisionAt(int rank)
Returns the precision score achieved by returning the top-scoring documents up to the specified rank. The precision-recall curve is not interpolated for this computation. If there are not enough documents, the result is Double.NaN.

Returns:
The precision at the specified rank.

reciprocalRank

public double reciprocalRank()
Returns the reciprocal rank (RR) for this evaluation. The reciprocal rank is defined to be the reciprocal of the rank at which the first correct result is retrieved (counting from 1). The return result will be between 1.0 for the first-best result being correct and 0.0, for none of the results being correct.

Typically, the mean of the reciprocal ranks for a number of evaluations is reported.

Returns:
The reciprocal rank.

areaUnderPrCurve

public double areaUnderPrCurve(boolean interpolate)
Returns the area under the recall-precision curve with interpolation as specified. The recall-precision curve is taken to be a step function for the purposes of this calculation, and thus whether precision is interpolated, that is, whether dominated entries are pruned, will affect the area.

For the example detailed in prCurve(boolean), the areas without and with interpolation are 0.51 and 0.56, respectively. Interpolation will always result in an equal or greater area.

Note that the uninterpolated area under the recall-precision curve is the same as the average precision value.

Parameters:
interpolate - Set to true to interpolate the precision values.
Returns:
The area under the specified precision-recall curve.

areaUnderRocCurve

public double areaUnderRocCurve(boolean interpolate)
Returns the area under the receiver operating characteristic (ROC) curve. The ROC curve is taken to be a step function for the purposes of this calculation, and thus whether rejection recall is interpolated, that is, whether dominated entries are pruned, will affect the area. Interpolation will always result in an equal or greater area.

Parameters:
interpolate - Set to true to interpolate the rejection recall values.
Returns:
The area under the ROC curve.

toString

public String toString()
Returns a string-based representation of this scored precision recall evaluation.

Overrides:
toString in class Object

printPrecisionRecallCurve

public static void printPrecisionRecallCurve(double[][] prCurve,
                                             PrintWriter pw)
Prints a precision-recall curve with F-measures. The curve is formatted as in prCurve(boolean): an array of length-2 arrays of doubles. In each length-2 array, the recall value is at index 0 and the precision at index 1. The output contains three columns in the following order: precision, recall, F-measure.

Parameters:
prCurve - A precision-recall curve.
pw - The output PrintWriter.