public class ScoredPrecisionRecallEvaluation extends Object

ScoredPrecisionRecallEvaluation provides an evaluation based on precision-recall operating points and sensitivity-specificity operating points. The unscored precision-recall evaluation class is PrecisionRecallEvaluation.

There is a single no-arg constructor, ScoredPrecisionRecallEvaluation().
The method addCase(boolean,double) is used to populate the evaluation, with the first argument representing whether the response was correct and the second the score that was assigned. If there are positive reference cases that are not added through addCase(), the total number of such cases should be added using the method addMisses(int). This method effectively increments the number of positive reference cases used to compute recall values.
If there are negative reference cases that are not dealt with through addCase(), the method addNegativeMisses(int) should be called with the total number of such cases as an argument. This method increments the number of negative reference cases used to compute specificity values.
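The bookkeeping these methods imply can be sketched as follows. This is a hypothetical simplification for illustration (the class name ScoredEvalSketch and its fields are invented), not LingPipe's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

class ScoredEvalSketch {
    // (correct, score) pairs in the order they were added
    final List<double[]> cases = new ArrayList<>();
    int positiveRef = 0;   // denominator for recall
    int negativeRef = 0;   // denominator for specificity

    void addCase(boolean correct, double score) {
        cases.add(new double[] { correct ? 1.0 : 0.0, score });
        if (correct) ++positiveRef; else ++negativeRef;
    }
    // positive reference cases never returned by the system
    void addMisses(int count) { positiveRef += count; }
    // negative reference cases never returned by the system
    void addNegativeMisses(int count) { negativeRef += count; }

    int numPositiveRef() { return positiveRef; }
    int numNegativeRef() { return negativeRef; }
    int numCases() { return positiveRef + negativeRef; }
}
```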
By way of example, consider the following table of cases, all of which involve positive responses. The cases are in rank order, but may be added in any order.
The first line, which is separated, indicates the values before any results have been returned. There's no score corresponding to this operating point, and given that it doesn't correspond to a result, correctness is not applicable. It has zero recall, one specificity, and one precision (letting zero divided by zero be one here).
Rank | Score | Correct | TP | TN | FP | FN | Rec | Prec | Spec | F Meas
---|---|---|---|---|---|---|---|---|---|---
(-1) | n/a | n/a | 0 | 6 | 0 | 5 | 0.00 | 1.00 | 1.00 | 0.00
0 | -1.21 | no | 0 | 5 | 1 | 5 | 0.00 | 0.00 | 0.83 | 0.00
1 | -1.27 | yes | 1 | 5 | 1 | 4 | 0.20 | 0.50 | 0.83 | 0.29
2 | -1.39 | no | 1 | 4 | 2 | 4 | 0.20 | 0.33 | 0.67 | 0.25
3 | -1.47 | yes | 2 | 4 | 2 | 3 | 0.40 | 0.50 | 0.67 | 0.44
4 | -1.60 | yes | 3 | 4 | 2 | 2 | 0.60 | 0.60 | 0.67 | 0.60
5 | -1.65 | no | 3 | 3 | 3 | 2 | 0.60 | 0.50 | 0.50 | 0.55
6 | -1.79 | no | 3 | 2 | 4 | 2 | 0.60 | 0.43 | 0.33 | 0.50
7 | -1.80 | no | 3 | 1 | 5 | 2 | 0.60 | 0.38 | 0.17 | 0.47
8 | -2.01 | yes | 4 | 1 | 5 | 1 | 0.80 | 0.44 | 0.17 | 0.53
9 | -3.70 | no | 4 | 0 | 6 | 1 | 0.80 | 0.40 | 0.00 | 0.53
? | n/a | yes | 5 | 0 | 6 | 0 | 1.00 | 0.00 | 0.00 | 0.00
The next lines, listed as ranks 0 to 9, correspond to calls to addCase() with the specified score and correctness. For each of these lines, we list the corresponding number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These are followed by recall, precision, and specificity (a.k.a. rejection recall). See the class documentation for PrecisionRecallEvaluation for definitions of these values in terms of the TP, TN, FP, and FN counts.
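The derived columns follow directly from the counts. As an illustrative sketch (the helper class PrFormulas is invented; the formulas are the standard definitions), the rank-4 row works out as rec = 3/5, prec = 3/5, spec = 4/6:

```java
class PrFormulas {
    static double recall(int tp, int fn)      { return div(tp, tp + fn); }
    static double precision(int tp, int fp)   { return div(tp, tp + fp); }
    static double specificity(int tn, int fp) { return div(tn, tn + fp); }
    // balanced F measure: harmonic mean of recall and precision
    static double fMeasure(double rec, double prec) {
        return (rec + prec == 0.0) ? 0.0 : 2.0 * rec * prec / (rec + prec);
    }
    // treat 0/0 as 1, as the class documentation does for the rank -1 row
    static double div(int num, int den) {
        return den == 0 ? 1.0 : (double) num / den;
    }
}
```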
There are five positive reference cases (blue backgrounds) and six negative reference cases (clear backgrounds) in this diagram. The yellow precision values and orange specificity values are used for interpolated curves.
The pairs of precision/recall values form the basis for the precision-recall curve returned by prCurve(boolean), with the argument indicating whether to perform precision interpolation.
For the above graph, the uninterpolated precision-recall curve is:

prCurve(false) = { { 0.00, 1.00 }, { 0.20, 0.50 }, { 0.20, 0.33 }, { 0.40, 0.50 }, { 0.60, 0.60 }, { 0.60, 0.50 }, { 0.60, 0.43 }, { 0.60, 0.38 }, { 0.60, 0.38 }, { 0.80, 0.44 }, { 0.80, 0.40 }, { 1.00, 0.00 } }

For convenience, the evaluation always adds the two limit points: one with precision 0 and recall 1, and one with precision 1 and recall 0. These operating points are always achievable, the first by returning every possible answer and the second by returning no answers.

Typically, a form of interpolation is performed that sets the precision for a given recall value to the maximum of the precision at the current or any greater recall value. This pushes the yellow precision values up the graph. At the same time, we only return values that correspond to jumps in recall, that is, ranks at which true positives were returned. For the example above, the result is:

prCurve(true) = { { 0.00, 1.00 }, { 0.20, 0.60 }, { 0.40, 0.60 }, { 0.60, 0.60 }, { 0.80, 0.44 }, { 1.00, 0.00 } }
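The interpolation rule above can be sketched directly: take a running maximum of precision from high recall down, then keep one point per distinct recall value. This is an illustrative reimplementation (the class name PrInterpolation is invented), not the library's code; on the prCurve(false) points above it reproduces the prCurve(true) points:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PrInterpolation {
    // input and output are {recall, precision} pairs as in prCurve(boolean)
    static double[][] interpolate(double[][] curve) {
        int n = curve.length;
        double[] maxPrec = new double[n];
        double running = Double.NEGATIVE_INFINITY;
        // scan right-to-left so each point sees the maximum precision
        // at its own or any greater recall value
        for (int i = n - 1; i >= 0; --i) {
            running = Math.max(running, curve[i][1]);
            maxPrec[i] = running;
        }
        // keep only the first point at each distinct recall value
        Map<Double, Double> byRecall = new LinkedHashMap<>();
        for (int i = 0; i < n; ++i)
            byRecall.putIfAbsent(curve[i][0], maxPrec[i]);
        List<double[]> out = new ArrayList<>();
        for (Map.Entry<Double, Double> e : byRecall.entrySet())
            out.add(new double[] { e.getKey(), e.getValue() });
        return out.toArray(new double[0][]);
    }
}
```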
The ROC curve is returned by the method rocCurve(boolean), with the boolean parameter again indicating whether to perform interpolation. For the above graph, the uninterpolated result is:

rocCurve(false) = { { 1 - 1.00, 0.00 }, { 1 - 0.83, 0.00 }, { 1 - 0.83, 0.20 }, { 1 - 0.67, 0.20 }, { 1 - 0.67, 0.40 }, { 1 - 0.67, 0.60 }, { 1 - 0.50, 0.60 }, { 1 - 0.33, 0.60 }, { 1 - 0.17, 0.60 }, { 1 - 0.17, 0.80 }, { 1 - 0.00, 0.80 }, { 1 - 0.00, 1.00 } }

Interpolation works exactly the same way as for the precision-recall curves, but is based on specificity rather than precision:

rocCurve(true) = { { 1 - 1.00, 0.00 }, { 1 - 0.83, 0.20 }, { 1 - 0.67, 0.60 }, { 1 - 0.50, 0.60 }, { 1 - 0.33, 0.60 }, { 1 - 0.17, 0.80 }, { 1 - 0.00, 1.00 } }
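Areas under these curves (see areaUnderPrCurve and areaUnderRocCurve below) can be computed with the parallelogram (trapezoidal) method mentioned in the method documentation. A generic sketch (the class name Auc is invented); applied to the interpolated ROC points above it gives a trapezoidal area of 0.55, which is this sketch's value rather than necessarily the library's exact result:

```java
class Auc {
    // points must be sorted by x (index 0); y is at index 1
    static double trapezoidArea(double[][] pts) {
        double area = 0.0;
        for (int i = 1; i < pts.length; ++i)
            area += (pts[i][0] - pts[i - 1][0])
                    * (pts[i][1] + pts[i - 1][1]) / 2.0;
        return area;
    }
}
```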
In some information extraction or retrieval tasks, a system might only return a fixed number of examples to a user. To evaluate the result of such truncated result sets, it is common to report the precision after N returned results. The counting starts from one rather than zero for returned results, but we fill in a limiting value of 1.0 for precision at 0. In our running example, we have:

precisionAt(0) = 1.0
precisionAt(1) = 0.0
precisionAt(5) = 0.6
precisionAt(10) = 0.4
precisionAt(20) = 0.2
precisionAt(100) = 0.04

The return value for a rank greater than the number of cases added will be calculated assuming all other results are errors.
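A sketch of this computation over a ranked list of correctness flags (the class name PrecisionAt is invented; ranks beyond the returned results are treated as errors, per the note above):

```java
class PrecisionAt {
    static double precisionAt(boolean[] rankedCorrect, int rank) {
        if (rank == 0) return 1.0;  // limiting value at rank 0
        int tp = 0;
        for (int i = 0; i < Math.min(rank, rankedCorrect.length); ++i)
            if (rankedCorrect[i]) ++tp;
        return (double) tp / rank;  // missing ranks count as errors
    }
}
```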
The reciprocal rank returned by reciprocalRank() is 1/M, where M is the rank (counting from 1) of the first true positive result. In our running example, the first result is a false positive and the second a true positive, so the reciprocal rank is:

reciprocalRank() = 0.5

Note that this measure emphasizes differences in early ranks much more than later ones. For instance, the reciprocal rank for a system returning a correct result first is 1/1, but for one returning it second, it's 1/2, and for one returning the first true positive at rank 10, it's 1/10. The difference between ranks 1 and 2 is thus greater than that between ranks 2 and 10.
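A minimal sketch of this measure over a ranked list of correctness flags (the class name ReciprocalRank is invented):

```java
class ReciprocalRank {
    // 1/M for the rank M (counting from 1) of the first true positive,
    // or 0.0 if there is none
    static double reciprocalRank(boolean[] rankedCorrect) {
        for (int i = 0; i < rankedCorrect.length; ++i)
            if (rankedCorrect[i]) return 1.0 / (i + 1);
        return 0.0;
    }
}
```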
For the running example, R precision is:

rPrecision() = 0.6

R precision will always be at a point where precision equals recall. It is also known as the precision-recall break-even point (BEP), and for convenience, there is a method of that name:

prBreakevenPoint() = rPrecision() = 0.6
The method maximumFMeasure() returns the maximum F measure (see PrecisionRecallEvaluation.fMeasure(double,double,double) for a definition of F measure). The result is the maximum F measure value achieved at any position on the curve. For our example, this arises at:

maximumFMeasure() = 0.6
In general, the maximum F measure may occur at a point other than the precision-recall break-even point.
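The maximization can be sketched as a scan over the uninterpolated operating points, using the balanced F1 = 2PR/(P+R) (the class name MaxF is invented):

```java
class MaxF {
    // points are {recall, precision} pairs as in prCurve(false)
    static double maximumFMeasure(double[][] recPrecPoints) {
        double max = 0.0;
        for (double[] p : recPrecPoints) {
            double rec = p[0], prec = p[1];
            double f = (rec + prec == 0.0)
                ? 0.0 : 2.0 * rec * prec / (rec + prec);
            max = Math.max(max, f);
        }
        return max;
    }
}
```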
The average across multiple evaluations of average precision is somewhat misleadingly called mean average precision (MAP) [it should be average average precision, because averages are over finite samples and means are properties of distributions].
The eleven-point precision-recall curves, reciprocal rank, and R precision are also popular targets for reporting averaged results.
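The eleven-point curve can be sketched as taking, for each recall level r in {0.0, 0.1, ..., 1.0}, the maximum precision at any operating point with recall at least r, mirroring the interpolation rule described earlier (the class name ElevenPoint is invented):

```java
class ElevenPoint {
    // points are {recall, precision} pairs as in prCurve(false)
    static double[] elevenPtInterpPrecision(double[][] recPrecPoints) {
        double[] out = new double[11];
        for (int k = 0; k <= 10; ++k) {
            double r = k / 10.0;
            double max = 0.0;
            for (double[] p : recPrecPoints)
                if (p[0] >= r - 1e-12)  // tolerate floating-point recalls
                    max = Math.max(max, p[1]);
            out[k] = max;
        }
        return out;
    }
}
```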
Modifier and Type | Field and Description
---|---
static double | FLOATING_POINT_EQUALS_EPSILON

Constructor and Description
---
ScoredPrecisionRecallEvaluation() - Construct a scored precision-recall evaluation.

Modifier and Type | Method and Description
---|---
void | addCase(boolean correct, double score) - Add a case with the specified correctness and response score.
void | addMisses(int count) - Increments the positive reference count without adding a case from the classifier.
void | addNegativeMisses(int count) - Increments the negative reference count without adding a case from the classifier.
double | areaUnderPrCurve(boolean interpolate) - Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified.
double | areaUnderRocCurve(boolean interpolate) - Returns the area under the receiver operating characteristic (ROC) curve.
double | averagePrecision() - Returns the average of precisions at the true positive results.
double[] | elevenPtInterpPrecision() - Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1.
double | maximumFMeasure() - Returns the maximum F1 measure for an operating point on the PR curve.
double | maximumFMeasure(double beta) - Returns the maximum Fβ measure for an operating point on the precision-recall curve for a specified precision weight β > 0.
int | numCases() - Returns the total number of positive and negative reference cases for this evaluation.
int | numNegativeRef() - Returns the number of negative reference cases.
int | numPositiveRef() - Returns the number of positive reference cases.
double | prBreakevenPoint() - Returns the precision-recall break-even point.
double[][] | prCurve(boolean interpolate) - Returns the precision-recall curve, interpolating if the specified flag is true.
double | precisionAt(int rank) - Returns the precision score achieved by returning the top-scoring documents up to (but not including) the specified rank.
static void | printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw) - Prints a precision-recall curve with F measures.
static void | printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw) - Prints a precision-recall curve with scores.
double[][] | prScoreCurve(boolean interpolate) - Returns the array of recall/precision/score operating points according to the scores of the cases.
double | reciprocalRank() - Returns the reciprocal rank for this evaluation.
double[][] | rocCurve(boolean interpolate) - Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true.
double | rPrecision() - Returns the R precision.
String | toString() - Returns a string-based representation of this scored precision-recall evaluation.
public static final double FLOATING_POINT_EQUALS_EPSILON
public ScoredPrecisionRecallEvaluation()
public void addCase(boolean correct, double score)

Add a case with the specified correctness and response score. A case is correct (true) if the reference was also positive. The score is just the response score.

Warning: The scores should be sensibly comparable across cases.

Parameters:
correct - true if this case was correct.
score - Score of the response.

public void addMisses(int count)

Increments the positive reference count without adding a case from the classifier.

Parameters:
count - Number of outright misses to add to this evaluation.

Throws:
IllegalArgumentException - if the count is not positive.

public void addNegativeMisses(int count)

Increments the negative reference count without adding a case from the classifier.

Parameters:
count - Number of outright misses to add to this evaluation.

Throws:
IllegalArgumentException - if the count is not positive.

public int numCases()
Returns the total number of positive and negative reference cases for this evaluation. This is the sum of numPositiveRef() and numNegativeRef().

public int numPositiveRef()

Returns the number of positive reference cases. This is the number of cases added with correctness true plus the number of misses added.

public int numNegativeRef()

Returns the number of negative reference cases. This is the number of cases added with correctness false plus the number of negative misses added.

public double rPrecision()

Returns the R precision.

public double[] elevenPtInterpPrecision()

Returns the interpolated precision at eleven recall points evenly spaced between 0 and 1.

public double averagePrecision()

Returns the average of precisions at the true positive results. For positive reference cases that are never returned as results, such as those counted by addMisses(int), the precision is considered to be zero. (See the class documentation for more information.)

public double[][] prCurve(boolean interpolate)
Returns the precision-recall curve, interpolating if the specified flag is true.

Warning: Despite the name, the values returned are in arrays with recall at index 0 and precision at index 1.

Parameters:
interpolate - Set to true for precision interpolation.

public double[][] prScoreCurve(boolean interpolate)

Returns the array of recall/precision/score operating points according to the scores of the cases, with recall and precision values as for prCurve(boolean). Index 0 is recall, index 1 is precision, and index 2 is the score.

Parameters:
interpolate - Set to true if the precisions are interpolated by pruning dominated points.

public double[][] rocCurve(boolean interpolate)
Returns the receiver operating characteristic (ROC) curve for the cases ordered by score, interpolating if the specified flag is true. See the class documentation above for a definition and example of the returned curve.

Parameters:
interpolate - Interpolate specificity values.

public double maximumFMeasure()

Returns the maximum F1 measure for an operating point on the PR curve.

public double maximumFMeasure(double beta)

Returns the maximum Fβ measure for an operating point on the precision-recall curve for a specified precision weight β > 0.

public double precisionAt(int rank)
Returns the precision score achieved by returning the top-scoring documents up to (but not including) the specified rank. If the precision at the rank is undefined, Double.NaN is returned.

public double prBreakevenPoint()

Returns the precision-recall break-even point, which is equal to rPrecision().

public double reciprocalRank()

Returns the reciprocal rank for this evaluation. The reciprocal rank is 1/N, the reciprocal of the rank N at which the first true positive is found. This method counts ranks from 1 rather than 0. The return result will be between 1.0, for the first-best result being correct, and 0.0, for none of the results being correct.

public double areaUnderPrCurve(boolean interpolate)

Returns the area under the curve (AUC) for the recall-precision curve with interpolation as specified.
Warning: This method uses the parallelogram method for interpolation rather than the usual interpolation method used to calculate AUC for precision-recall curves in information retrieval evaluations. The usual AUC calculation for PR curves uses the precision interpolation described in the class documentation above.

Parameters:
interpolate - Set to true to interpolate the precision values.

public double areaUnderRocCurve(boolean interpolate)

Returns the area under the receiver operating characteristic (ROC) curve.

Parameters:
interpolate - Set to true to interpolate the rejection recall values.

public String toString()

Returns a string-based representation of this scored precision-recall evaluation.
public static void printPrecisionRecallCurve(double[][] prCurve, PrintWriter pw)

Prints a precision-recall curve with F measures. The input should be in the form returned by prCurve(boolean): an array of length-2 arrays of doubles. In each length-2 array, the recall value is at index 0 and the precision at index 1. The printed curve has three columns in the following order: precision, recall, F measure.

Parameters:
prCurve - A precision-recall curve.
pw - The output PrintWriter.

public static void printScorePrecisionRecallCurve(double[][] prScoreCurve, PrintWriter pw)

Prints a precision-recall curve with scores. The input should be in the form returned by prScoreCurve(boolean): an array of length-3 arrays of doubles. In each length-3 array, the recall value is at index 0, the precision at index 1, and the score at index 2. The printed curve has three columns in the following order: precision, recall, score.

Parameters:
prScoreCurve - A precision-recall score curve.
pw - The output PrintWriter.