com.aliasi.spell
Class SpellEvaluator

java.lang.Object
  extended by com.aliasi.spell.SpellEvaluator

public class SpellEvaluator
extends Object

The SpellEvaluator provides an evaluation harness for spell checkers. As with the other evaluator classes, it is constructed with the spell checker that will be evaluated. Test cases are presented to the evaluator using the addCase(String,String) method. The getLastCaseReport() method returns a string-based representation of the performance of the most recently provided test case. The method toString() provides a general report of results.

The method normalize(String) may be used to normalize both input text and system outputs before comparing them. For example, this allows an evaluation that is insensitive to case, whitespace, or punctuation.
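As a rough illustration, the body of an overriding normalize(String) might resemble the sketch below. The class name NormalizeSketch and the specific rules (lowercasing, replacing punctuation with spaces, collapsing whitespace) are assumptions made for illustration and are not part of LingPipe; in practice the logic would be placed in a subclass of SpellEvaluator.

```java
import java.util.Locale;

// Hypothetical stand-alone helper; not part of LingPipe.
final class NormalizeSketch {

    // Lowercases, replaces punctuation with spaces, and collapses whitespace.
    static String normalize(String text) {
        String lower = text.toLowerCase(Locale.ENGLISH);
        String noPunct = lower.replaceAll("\\p{Punct}", " ");
        return noPunct.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Prints "hello world".
        System.out.println(normalize("  Hello,   World! "));
    }
}
```

With rules like these, "Hello, World!" and "hello world" would compare as equal during evaluation.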

The basic output of the spell checker evaluation classifies each test case into one of five categories:

User Input   System Suggestion    Status   Method
Correct      No Suggestion        TN       userCorrectSystemNoSuggestion()
Correct      Wrong Suggestion     FP       userCorrectSystemWrongSuggestion()
Error        Correct Suggestion   TP       userErrorSystemCorrect()
Error        No Suggestion        FN       userErrorSystemNoSuggestion()
Error        Wrong Suggestion     FN,FP    userErrorSystemWrongSuggestion()

The status indicates whether the case counts as a true positive (TP), false positive (FP), true negative (TN), or false negative (FN). Note that if the user's input contains an error and the system provides the wrong suggestion, the result counts as both a false negative (failure to correct) and a false positive (erroneous correction). Because each such case is counted twice, the total of the confusion matrix counts may exceed the number of test cases. A confusion matrix populated with the above counts may be retrieved through the method confusionMatrix().

The methods for extracting the cases are listed in the final column for each of the five result types.
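The five-way classification and its effect on the counts can be sketched as follows. This is a stand-alone illustration, not LingPipe code: the class SpellCaseSketch, its enum, and its methods are hypothetical, and treating a null suggestion or a suggestion identical to the input as "no suggestion" is an assumption about how a checker signals that it has nothing to propose.

```java
import java.util.Arrays;

// Hypothetical sketch of the five-way classification; not LingPipe code.
final class SpellCaseSketch {

    // Each category carries its contribution to the TP, FP, TN, FN counts.
    enum Category {
        USER_CORRECT_SYSTEM_NO_SUGGESTION(0, 0, 1, 0),    // TN
        USER_CORRECT_SYSTEM_WRONG_SUGGESTION(0, 1, 0, 0), // FP
        USER_ERROR_SYSTEM_CORRECT(1, 0, 0, 0),            // TP
        USER_ERROR_SYSTEM_NO_SUGGESTION(0, 0, 0, 1),      // FN
        USER_ERROR_SYSTEM_WRONG_SUGGESTION(0, 1, 0, 1);   // FN and FP

        final int tp;
        final int fp;
        final int tn;
        final int fn;

        Category(int tp, int fp, int tn, int fn) {
            this.tp = tp;
            this.fp = fp;
            this.tn = tn;
            this.fn = fn;
        }
    }

    // Classifies one case; inputs are assumed to be normalized already.
    static Category classify(String text, String correct, String suggestion) {
        boolean userCorrect = text.equals(correct);
        // Assumption: null or an unchanged string means "no suggestion".
        boolean noSuggestion = suggestion == null || suggestion.equals(text);
        if (userCorrect) {
            return noSuggestion
                ? Category.USER_CORRECT_SYSTEM_NO_SUGGESTION
                : Category.USER_CORRECT_SYSTEM_WRONG_SUGGESTION;
        }
        if (noSuggestion) {
            return Category.USER_ERROR_SYSTEM_NO_SUGGESTION;
        }
        return suggestion.equals(correct)
            ? Category.USER_ERROR_SYSTEM_CORRECT
            : Category.USER_ERROR_SYSTEM_WRONG_SUGGESTION;
    }

    // Sums {TP, FP, TN, FN} over cases of the form {text, correct, suggestion}.
    static int[] counts(String[][] cases) {
        int[] totals = new int[4];
        for (String[] c : cases) {
            Category cat = classify(c[0], c[1], c[2]);
            totals[0] += cat.tp;
            totals[1] += cat.fp;
            totals[2] += cat.tn;
            totals[3] += cat.fn;
        }
        return totals;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"the", "the", null},   // user correct, no suggestion
            {"the", "the", "then"}, // user correct, wrong suggestion
            {"teh", "the", "the"},  // user error, correct suggestion
            {"teh", "the", null},   // user error, no suggestion
            {"teh", "the", "ten"},  // user error, wrong suggestion
        };
        // Prints "[1, 2, 1, 2]".
        System.out.println(Arrays.toString(counts(cases)));
    }
}
```

In this sketch the five cases produce counts that sum to six, because the last case contributes both a false negative and a false positive.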

Since:
LingPipe 2.4.1
Version:
3.8
Author:
Breck Baldwin, Bob Carpenter

Constructor Summary
SpellEvaluator(SpellChecker checker)
          Construct a spelling evaluator for the specified spell checker.
SpellEvaluator(SpellChecker checker, ObjectToCounterMap<String> tokenCounter)
          Construct a spelling evaluator for the specified spell checker with the specified token counts.
 
Method Summary
 void addCase(String text, String correctText)
          Adds a test case to the spelling evaluator in the form of input text and its corrected form.
 ConfusionMatrix confusionMatrix()
          Returns the confusion matrix for the current state of this evaluation.
 String getLastCaseReport()
          Returns a string-based representation of the last test case.
 String normalize(String text)
          Return the normalized form of a query or system output.
 String toString()
          Return a string-based representation of the current status of this evaluation.
 String[][] userCorrectSystemNoSuggestion()
          Returns an array of cases for which the user was correct and the system made no suggestions.
 String[][] userCorrectSystemWrongSuggestion()
          Returns an array of cases for which the user was correct and the system made an erroneous suggestion.
 String[][] userErrorSystemCorrect()
          Returns an array of cases for which the user made an error and the system returned the appropriate correction.
 String[][] userErrorSystemNoSuggestion()
          Returns an array of cases for which the user made an error and the system made no suggestion.
 String[][] userErrorSystemWrongSuggestion()
          Returns an array of cases for which the user made an error and the system returned an incorrect suggestion.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SpellEvaluator

public SpellEvaluator(SpellChecker checker)
Construct a spelling evaluator for the specified spell checker.

Parameters:
checker - Spell checker to evaluate.

SpellEvaluator

public SpellEvaluator(SpellChecker checker,
                      ObjectToCounterMap<String> tokenCounter)
Construct a spelling evaluator for the specified spell checker with the specified token counts. The token counts will be used to report counts of tokens in the corpus along with per-line outputs. If the token counter is null, no token reports are provided. In order for the token counts to be used, the spell checker must be an instance of CompiledSpellChecker.

Parameters:
checker - Spell checker to evaluate.
tokenCounter - Counter for tokens in the speller.
Method Detail

addCase

public void addCase(String text,
                    String correctText)
Adds a test case to the spelling evaluator in the form of input text and its corrected form.

Parameters:
text - Text to spell check.
correctText - Correct form of input text.

toString

public String toString()
Return a string-based representation of the current status of this evaluation.

Overrides:
toString in class Object
Returns:
String-based representation of the evaluation.

userCorrectSystemNoSuggestion

public String[][] userCorrectSystemNoSuggestion()
Returns an array of cases for which the user was correct and the system made no suggestions. The entries in the array are of the form {text,correct,suggestion}.

Returns:
The user correct, system no suggestion cases.

userCorrectSystemWrongSuggestion

public String[][] userCorrectSystemWrongSuggestion()
Returns an array of cases for which the user was correct and the system made an erroneous suggestion. The entries in the array are of the form {text,correct,suggestion}.

Returns:
The user correct, system wrong suggestion cases.

userErrorSystemCorrect

public String[][] userErrorSystemCorrect()
Returns an array of cases for which the user made an error and the system returned the appropriate correction. The entries in the array are of the form {text,correct,suggestion}.

Returns:
The user error, system correct cases.

userErrorSystemWrongSuggestion

public String[][] userErrorSystemWrongSuggestion()
Returns an array of cases for which the user made an error and the system returned an incorrect suggestion. The entries in the array are of the form {text,correct,suggestion}.

Returns:
The user error, system wrong suggestion cases.

userErrorSystemNoSuggestion

public String[][] userErrorSystemNoSuggestion()
Returns an array of cases for which the user made an error and the system made no suggestion. The entries in the array are of the form {text,correct,suggestion}.

Returns:
The user error, system no suggestion cases.

getLastCaseReport

public String getLastCaseReport()
Returns a string-based representation of the last test case.

Returns:
A string-based representation of the last test case.

confusionMatrix

public ConfusionMatrix confusionMatrix()
Returns the confusion matrix for the current state of this evaluation. The class documentation (see above) describes the calculation of true positives, false positives, false negatives, and true negatives. The categories used are "correct" and "misspelled".

The confusion matrix does not track this evaluator, so once a confusion matrix is constructed and returned, it will not reflect additional cases added to this evaluator.

Returns:
The confusion matrix for the current state of this evaluation.
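
The snapshot behavior described for confusionMatrix() can be illustrated with a stand-alone sketch. The class SnapshotSketch and its methods are hypothetical stand-ins, not LingPipe classes; the sketch only shows the general pattern of returning a copy of the running counts, and how the actual evaluator builds its matrix is not confirmed here.

```java
// Hypothetical stand-in for an evaluator; not LingPipe code.
final class SnapshotSketch {

    // Running counts in the order {TP, FP, TN, FN}.
    private final int[] counts = new int[4];

    // Records one true positive.
    void addTruePositive() {
        counts[0]++;
    }

    // Returns a copy, so later additions do not alter the returned array.
    int[] snapshot() {
        return counts.clone();
    }

    public static void main(String[] args) {
        SnapshotSketch evaluator = new SnapshotSketch();
        evaluator.addTruePositive();
        int[] before = evaluator.snapshot();
        evaluator.addTruePositive();
        int[] after = evaluator.snapshot();
        // Prints "1 then 2": the earlier snapshot is unchanged.
        System.out.println(before[0] + " then " + after[0]);
    }
}
```

The practical consequence is that a fresh matrix should be requested after adding further cases if up-to-date counts are needed.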

normalize

public String normalize(String text)
Return the normalized form of a query or system output. This method will be applied to the input text before sending it to the spell checker and will be applied to the system suggestion before comparing it to the correct text. All cases are saved in their normalized forms.

The default implementation in this class does nothing, simply returning the input text. Subclasses may override this normalizer.

Parameters:
text - Text to normalize.
Returns:
The normalized form of the text.