com.aliasi.tag
Class TaggerEvaluator<E>

java.lang.Object
  extended by com.aliasi.tag.TaggerEvaluator<E>
Type Parameters:
E - Type of tokens in the tagging.
All Implemented Interfaces:
Handler, ObjectHandler<Tagging<E>>

public class TaggerEvaluator<E>
extends Object
implements ObjectHandler<Tagging<E>>

A TaggerEvaluator provides evaluation for first-best taggers implementing the Tagger interface.

The basis of evaluation is a gold-standard set of reference taggings. These are compared against response taggings, which are produced by a system (or by other means). Cases consisting of a reference tagging and a response tagging may be added to the evaluation using addCase(Tagging,Tagging).

The evaluator takes a tagger as an argument in its constructor. If the tagger is not null, the ObjectHandler method handle(Tagging) may be used to supply reference taggings for which a response will be created using the tagger and then added as a test case. The tagger may be reset using setTagger(Tagger), which is useful for producing a single evaluation of different taggers, such as for cross-validation.

The constructor also takes an argument determining whether inputs should be stored or not. If they are stored, then all input tokens will be available in the token-level evaluation. Tokens must be stored in order to compute unknown token accuracy.

The overall case-level accuracy, measuring how many inputs received a completely correct set of tags, is returned by caseAccuracy().
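The following is a minimal stand-alone sketch of the case-level measure described above. It does not use LingPipe, and the class name CaseAccuracySketch, the list-of-strings representation of tags, and the NaN result for an empty evaluation are all assumptions made for illustration, not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not LingPipe's implementation.
class CaseAccuracySketch {

    // Fraction of cases whose response tags match the reference tags at every position.
    static double caseAccuracy(List<List<String>> referenceTags,
                               List<List<String>> responseTags) {
        if (referenceTags.size() != responseTags.size()) {
            throw new IllegalArgumentException("Need exactly one response per reference.");
        }
        if (referenceTags.isEmpty()) {
            return Double.NaN; // assumption: accuracy is undefined with no cases
        }
        int correct = 0;
        for (int i = 0; i < referenceTags.size(); ++i) {
            List<String> ref = referenceTags.get(i);
            List<String> resp = responseTags.get(i);
            if (ref.size() != resp.size()) {
                throw new IllegalArgumentException("Tag sequences differ in length at case " + i);
            }
            if (ref.equals(resp)) {
                ++correct; // a single wrong tag makes the whole case incorrect
            }
        }
        return correct / (double) referenceTags.size();
    }

    public static void main(String[] args) {
        List<List<String>> refs = Arrays.asList(
            Arrays.asList("DET", "NOUN", "VERB"),
            Arrays.asList("NOUN", "VERB"));
        List<List<String>> resps = Arrays.asList(
            Arrays.asList("DET", "NOUN", "VERB"),  // completely correct
            Arrays.asList("VERB", "VERB"));        // one tag wrong
        System.out.println(caseAccuracy(refs, resps)); // prints 0.5
    }
}
```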

The method inputSizeHistogram() returns a map from integers to the number of reference taggings with that many input tokens.
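As a rough illustration of what such a histogram contains, the sketch below counts token sequences by length using a plain java.util.Map. The class name InputSizeHistogramSketch is hypothetical, and an ordinary map stands in for ObjectToCounterMap; this is not the library's code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; a TreeMap stands in for ObjectToCounterMap.
class InputSizeHistogramSketch {

    // Maps each input length to the number of token sequences of that length.
    static Map<Integer, Integer> histogram(List<List<String>> tokenSequences) {
        Map<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (List<String> tokens : tokenSequences) {
            counts.merge(tokens.size(), 1, Integer::sum); // increment the count for this length
        }
        return counts;
    }
}
```

With two inputs of two tokens and one input of one token, the resulting map would hold 1 for length 1 and 2 for length 2.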

The method lastCaseToString(Set) returns a string-based representation of the last case added. It takes the set of known tokens as its argument, or null if known tokens are not being tracked.

The primary results at the token level are returned as a classifier evaluator by tokenEval(), which replaces the deprecated tokenEvaluation(). The cases here are individual tokens. For instance, if the evaluation comprised 100 test cases of 15 tokens each, the classifier evaluator will consider 15*100 = 1500 cases, one for each token. If the inputs are stored, they will be passed on to this classifier evaluator and made available through its methods.
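The token-level arithmetic can be sketched by flattening every tagging into one case per token. The class TokenLevelSketch and its method are hypothetical and independent of LingPipe; the sketch only counts token cases and correct tags rather than building a full classifier evaluation.

```java
import java.util.List;

// Hypothetical stand-alone sketch; not LingPipe's implementation.
class TokenLevelSketch {

    // Returns { number of token-level cases, number of correctly tagged tokens }.
    static long[] tokenCounts(List<List<String>> referenceTags,
                              List<List<String>> responseTags) {
        if (referenceTags.size() != responseTags.size()) {
            throw new IllegalArgumentException("Need exactly one response per reference.");
        }
        long total = 0L;
        long correct = 0L;
        for (int i = 0; i < referenceTags.size(); ++i) {
            List<String> ref = referenceTags.get(i);
            List<String> resp = responseTags.get(i);
            if (ref.size() != resp.size()) {
                throw new IllegalArgumentException("Tag sequences differ in length at case " + i);
            }
            for (int j = 0; j < ref.size(); ++j) {
                ++total; // every token is its own case
                if (ref.get(j).equals(resp.get(j))) {
                    ++correct;
                }
            }
        }
        return new long[] { total, correct };
    }
}
```

Called with 100 taggings of 15 tags each, the first element of the result is 1500, matching the count given above.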

Accuracy for tokens not in a specified set, typically the set of tokens seen in training, is available through unknownTokenEval(Set), which replaces the deprecated unknownTokenEvaluation(Set).
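The restriction to unknown tokens amounts to skipping every token found in the known set before scoring. The sketch below shows this for a single tagging; the class UnknownTokenSketch, its parameter layout, and the NaN result when no unknown tokens occur are assumptions for illustration and do not reflect the library's internals. It also shows why the tokens themselves must be stored: without them there is nothing to test against the set.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; not LingPipe's implementation.
class UnknownTokenSketch {

    // Accuracy over only those tokens that are absent from knownTokens.
    static double unknownTokenAccuracy(List<String> tokens,
                                       List<String> referenceTags,
                                       List<String> responseTags,
                                       Set<String> knownTokens) {
        if (tokens.size() != referenceTags.size() || tokens.size() != responseTags.size()) {
            throw new IllegalArgumentException("Tokens and tags must have the same length.");
        }
        int total = 0;
        int correct = 0;
        for (int i = 0; i < tokens.size(); ++i) {
            if (knownTokens.contains(tokens.get(i))) {
                continue; // known tokens are excluded from this evaluation
            }
            ++total;
            if (referenceTags.get(i).equals(responseTags.get(i))) {
                ++correct;
            }
        }
        return total == 0 ? Double.NaN : correct / (double) total; // assumption: NaN if none unknown
    }
}
```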

Thread Safety

This class is not thread safe, and access to it must be synchronized using read/write locks. The methods to add cases, handle reference taggings, and set the tagger are write methods; all other methods are reads.
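One way to follow this discipline is to route every call through a small helper built on java.util.concurrent.locks.ReentrantReadWriteLock. The helper class ReadWriteGuard below is hypothetical and is not part of LingPipe; it is a sketch of the locking pattern only.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical helper; wraps arbitrary actions in read or write locks.
class ReadWriteGuard {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Runs a mutating action (e.g. adding a case) under the exclusive write lock.
    void write(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Runs a query (e.g. reading an accuracy) under the shared read lock.
    <T> T read(Supplier<T> query) {
        lock.readLock().lock();
        try {
            return query.get();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Assuming an evaluator instance shared between threads, writes would then take the form guard.write(() -> evaluator.addCase(reference, response)) and reads the form guard.read(() -> evaluator.caseAccuracy()).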

Since:
LingPipe 3.9
Version:
3.9.1
Author:
Bob Carpenter

Constructor Summary
TaggerEvaluator(Tagger<E> tagger, boolean storeTokens)
          Construct a tagger evaluator using the specified tagger that stores inputs if the specified flag is true.
 
Method Summary
 void addCase(Tagging<E> referenceTagging, Tagging<E> responseTagging)
          Add a test case to this evaluator consisting of the specified reference and response taggings.
 double caseAccuracy()
          Return the accuracy at the entire case level.
 void handle(Tagging<E> referenceTagging)
          Add a case for the specified reference tagging using the contained tagger to generate a response tagging.
 ObjectToCounterMap<Integer> inputSizeHistogram()
          Returns a mapping from integers to the number of test cases with that many tokens.
 String lastCaseToString(Set<E> knownTokenSet)
          Return a string-based representation of the last case to be evaluated based on the specified known token set.
 int numCases()
          Returns the number of cases for this evaluation.
 long numTokens()
          Returns the number of tokens tested in the complete set of test cases.
 void setTagger(Tagger<E> tagger)
          Set the tagger for this evaluator to the specified value.
 boolean storeTokens()
          Returns true if this evaluator stores input tokens.
 Tagger<E> tagger()
          Return the tagger for this evaluator.
 List<String> tags()
          Return the list of tags seen so far by this tagger evaluator in either references or responses.
 BaseClassifierEvaluator<E> tokenEval()
          Returns the token-level evaluation for this tag evaluator.
 ClassifierEvaluator<E,Classification> tokenEvaluation()
          Deprecated. Use tokenEval() instead.
 BaseClassifierEvaluator<E> unknownTokenEval(Set<E> knownTokenSet)
          Return the evaluation over unknown tokens as a classifier evaluator whose cases are the individual tokens not in the specified known token set.
 ClassifierEvaluator<E,Classification> unknownTokenEvaluation(Set<E> knownTokenSet)
          Deprecated. Use unknownTokenEval(Set) instead.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TaggerEvaluator

public TaggerEvaluator(Tagger<E> tagger,
                       boolean storeTokens)
Construct a tagger evaluator using the specified tagger that stores inputs if the specified flag is true.

Parameters:
tagger - Tagger to use for generating responses, or null if cases are added manually.
storeTokens - Flag set to true if the input tokens for cases are stored.
Method Detail

tagger

public Tagger<E> tagger()
Return the tagger for this evaluator.

Returns:
The tagger for this evaluator.

setTagger

public void setTagger(Tagger<E> tagger)
Set the tagger for this evaluator to the specified value.

Parameters:
tagger - Tagger to use to generate responses.

storeTokens

public boolean storeTokens()
Returns true if this evaluator stores input tokens.

Returns:
true if this evaluator stores input tokens.

handle

public void handle(Tagging<E> referenceTagging)
Add a case for the specified reference tagging using the contained tagger to generate a response tagging.

Specified by:
handle in interface ObjectHandler<Tagging<E>>
Parameters:
referenceTagging - Reference gold-standard tagging.
Throws:
NullPointerException - If the underlying tagger is null.

addCase

public void addCase(Tagging<E> referenceTagging,
                    Tagging<E> responseTagging)
Add a test case to this evaluator consisting of the specified reference and response taggings.

Parameters:
referenceTagging - Reference gold-standard tags.
responseTagging - Response system tags.
Throws:
IllegalArgumentException - If the token lengths are not the same in the two taggings.

numCases

public int numCases()
Returns the number of cases for this evaluation. Each case consists of a reference and response tagging.

Returns:
The number of test cases.

numTokens

public long numTokens()
Returns the number of tokens tested in the complete set of test cases.

Returns:
The number of tokens evaluated.

tags

public List<String> tags()
Return the list of tags seen so far by this tagger evaluator in either references or responses.

Returns:
List of tags for this evaluator.

inputSizeHistogram

public ObjectToCounterMap<Integer> inputSizeHistogram()
Returns a mapping from integers to the number of test cases with that many tokens.

Returns:
Histogram of input sizes.

caseAccuracy

public double caseAccuracy()
Return the accuracy at the entire case level. This is the proportion of test cases in which the response tags exactly matched the reference tags.

Returns:
Whole case accuracy.

unknownTokenEvaluation

@Deprecated
public ClassifierEvaluator<E,Classification> unknownTokenEvaluation(Set<E> knownTokenSet)
Deprecated. Use unknownTokenEval(Set) instead.

Return the evaluation over unknown tokens as a classifier evaluator whose cases are the individual tokens not in the specified known token set.

Parameters:
knownTokenSet - Set of known tokens to exclude from evaluation.
Returns:
Evaluation over unknown tokens.
Throws:
UnsupportedOperationException - If the inputs are not being stored.

tokenEvaluation

@Deprecated
public ClassifierEvaluator<E,Classification> tokenEvaluation()
Deprecated. Use tokenEval() instead.

Returns the token-level evaluation for this tag evaluator. If the input tokens were stored, they will be available in the returned evaluator.

Returns:
Evaluation for this tagger.

unknownTokenEval

public BaseClassifierEvaluator<E> unknownTokenEval(Set<E> knownTokenSet)
Return the evaluation over unknown tokens as a classifier evaluator whose cases are the individual tokens not in the specified known token set.

Parameters:
knownTokenSet - Set of known tokens to exclude from evaluation.
Returns:
Evaluation over unknown tokens.
Throws:
UnsupportedOperationException - If the inputs are not being stored.

tokenEval

public BaseClassifierEvaluator<E> tokenEval()
Returns the token-level evaluation for this tag evaluator. If the input tokens were stored, they will be available in the returned evaluator.

Returns:
Evaluation for this tagger.

lastCaseToString

public String lastCaseToString(Set<E> knownTokenSet)
Return a string-based representation of the last case to be evaluated based on the specified known token set. If the known token set is null, known tokens are not distinguished.

Parameters:
knownTokenSet - Set of known tokens.
Returns:
String-based representation of the last case evaluated.