com.aliasi.classify
Class JointClassification

java.lang.Object
  extended by com.aliasi.classify.Classification
      extended by com.aliasi.classify.RankedClassification
          extended by com.aliasi.classify.ScoredClassification
              extended by com.aliasi.classify.ConditionalClassification
                  extended by com.aliasi.classify.JointClassification

public class JointClassification
extends ConditionalClassification

A JointClassification is a conditional classification derived from a joint probability assignment to each category and the object being classified. The conditional probabilities are computed from the joint probabilities, but an additional score may be provided for ordering. These scores must be ordered in the same way as the joint probabilities. For example, the language model classifiers implement the score as an entropy rate to allow between-document comparisons.

In addition to the score and conditional probability methods, this class adds a method to retrieve the joint log (base 2) probability by rank, jointLog2Probability(int).

The conditional probability estimate of the category given the input is derived from the joint probability of category and input:

P(category|input) = P(category,input) / P(input)
where the joint probability P(category,input) is determined by the joint probability estimate and the input probability P(input) is estimated by marginalization:
P(input) = Σ_category P(category,input)
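
The derivation above can be sketched in plain Java, starting from log (base 2) joint probabilities. This is an illustrative sketch, not LingPipe's implementation: the class JointToConditional and the method conditionalProbs are hypothetical names, and subtracting the largest log probability before exponentiating is a common stabilization step assumed here rather than something this documentation specifies.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of LingPipe.
final class JointToConditional {

    // Converts log (base 2) joint probabilities log2 P(category,input)
    // into conditional probabilities P(category|input) by marginalization.
    static double[] conditionalProbs(double[] log2Joint) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : log2Joint)
            max = Math.max(max, x);
        if (max == Double.NEGATIVE_INFINITY)
            throw new IllegalArgumentException("need at least one nonzero joint probability");

        // Rescale by the largest joint probability; the factor cancels in the ratio.
        double[] cond = new double[log2Joint.length];
        double sum = 0.0; // proportional to P(input)
        for (int i = 0; i < cond.length; ++i) {
            cond[i] = Math.pow(2.0, log2Joint[i] - max);
            sum += cond[i];
        }
        for (int i = 0; i < cond.length; ++i)
            cond[i] /= sum;
        return cond;
    }

    public static void main(String[] args) {
        // Joint probabilities 1/4, 1/8, 1/8, so P(input) = 1/2.
        double[] cond = conditionalProbs(new double[] { -2.0, -3.0, -3.0 });
        System.out.println(Arrays.toString(cond)); // prints [0.5, 0.25, 0.25]
    }
}
```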

Warning: The result of marginalization is the same as that of Statistics.normalize(double[]) applied to the joint probabilities. The same warning carries over here: if the largest joint probability is more than 2^52 times larger than the next largest, the largest will round off to one and all others will round off to zero due to underflow.
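
The rounding behind this warning can be reproduced with ordinary double arithmetic, since a double carries a 52-bit fraction. The sketch below uses a hypothetical two-category case (RoundingDemo and topConditional are illustrative names, not LingPipe API) and only shows the top conditional probability reaching exactly 1.0; what happens to the smaller entries depends on how normalization is implemented.

```java
// Hypothetical demo for illustration; not part of LingPipe.
final class RoundingDemo {

    // Conditional probability of the top category in a two-category problem
    // where the top joint probability is 2^log2Gap times the other one:
    // P(top|input) = 1 / (1 + 2^-log2Gap).
    static double topConditional(int log2Gap) {
        double ratio = Math.pow(2.0, -log2Gap);
        return 1.0 / (1.0 + ratio);
    }

    public static void main(String[] args) {
        // A gap of 10 bits is still resolved by double arithmetic.
        System.out.println(topConditional(10) < 1.0);  // prints true
        // A gap of 60 bits is lost: 1.0 + 2^-60 rounds to 1.0.
        System.out.println(topConditional(60) == 1.0); // prints true
    }
}
```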

Since:
LingPipe 2.0
Version:
3.8
Author:
Bob Carpenter

Constructor Summary
JointClassification(String[] categories, double[] log2JointProbs)
          Construct a joint classification with the specified parallel arrays of categories and log (base 2) joint probabilities of category and input object.
JointClassification(String[] categories, double[] scores, double[] log2JointProbs)
          Construct a joint classification with the specified parallel arrays of categories, scores, and log (base 2) joint probabilities of category and input object.
 
Method Summary
static JointClassification create(String[] categories, double[] logProbabilities)
          Return a joint classification given the categories and log probabilities.
 double jointLog2Probability(int rank)
          Returns the log (base 2) probability of the category at the specified rank.
 double score(int rank)
          Returns the cross-entropy rate of the category and text at the specified rank.
 String toString()
          Returns a string-based representation of this joint probability ranked classification.
 
Methods inherited from class com.aliasi.classify.ConditionalClassification
conditionalProbability, conditionalProbability, createLogProbs, createProbs
 
Methods inherited from class com.aliasi.classify.ScoredClassification
create
 
Methods inherited from class com.aliasi.classify.RankedClassification
category, size
 
Methods inherited from class com.aliasi.classify.Classification
bestCategory
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

JointClassification

public JointClassification(String[] categories,
                           double[] log2JointProbs)
Construct a joint classification with the specified parallel arrays of categories and log (base 2) joint probabilities of category and input object. The scores are taken to be the log joint probabilities. Joint log probabilities must be in descending numerical order, and all log probabilities must be zero or negative. If a probability is zero, the corresponding log probability should be Double.NEGATIVE_INFINITY, which is a legal input to this constructor.

Parameters:
categories - Array of categories.
log2JointProbs - Log (base 2) joint probabilities of categories, in descending numerical order.
Throws:
IllegalArgumentException - If any of the log joint probabilities is not zero or negative, or if they are not in descending order.
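
Because the constructor expects the parallel arrays already in descending order of log joint probability, callers holding unsorted estimates need to reorder both arrays together first. A minimal sketch of one way to do that follows; DescendingOrder and sortParallel are hypothetical names, not LingPipe API.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of LingPipe.
final class DescendingOrder {

    // Reorders both parallel arrays in place so that log2JointProbs is in
    // descending order and categories[i] still pairs with log2JointProbs[i].
    static void sortParallel(String[] categories, double[] log2JointProbs) {
        if (categories.length != log2JointProbs.length)
            throw new IllegalArgumentException("arrays must have the same length");
        final String[] cats = categories.clone();
        final double[] logs = log2JointProbs.clone();
        Integer[] order = new Integer[logs.length];
        for (int i = 0; i < order.length; ++i)
            order[i] = i;
        // Larger log probability first; Double.NEGATIVE_INFINITY sorts last.
        Arrays.sort(order, (a, b) -> Double.compare(logs[b], logs[a]));
        for (int i = 0; i < order.length; ++i) {
            categories[i] = cats[order[i]];
            log2JointProbs[i] = logs[order[i]];
        }
    }

    public static void main(String[] args) {
        String[] categories = { "sports", "politics", "tech" };
        double[] log2JointProbs = { -3.0, -1.0, Double.NEGATIVE_INFINITY };
        sortParallel(categories, log2JointProbs);
        System.out.println(Arrays.toString(categories)); // prints [politics, sports, tech]
    }
}
```

The reordered arrays satisfy the ordering precondition stated above and could then be passed to the constructor.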

JointClassification

public JointClassification(String[] categories,
                           double[] scores,
                           double[] log2JointProbs)
Construct a joint classification with the specified parallel arrays of categories, scores, and log (base 2) joint probabilities of category and input object. The scores and joint probabilities must be in descending numerical order. All log probabilities must be zero or negative. If a probability is zero, the corresponding log probability should be Double.NEGATIVE_INFINITY, which is a legal input to this constructor.

Parameters:
categories - Array of categories.
scores - Scores of categories, in descending numerical order.
log2JointProbs - Log (base 2) joint probabilities of categories, in descending numerical order.
Throws:
IllegalArgumentException - If any of the log joint probabilities is not zero or negative, or if they are not in descending order.

Method Detail

jointLog2Probability

public double jointLog2Probability(int rank)
Returns the log (base 2) joint probability of the category at the specified rank. Note that this is the same value as is returned by score(int) when the classification was constructed without a separate array of scores.

Parameters:
rank - Rank of result.
Returns:
Log (base 2) estimate of the joint probability of the category of the specified rank and the input object.

score

public double score(int rank)
Returns the cross-entropy rate of the category and text at the specified rank. As with all ranked classifications, scores are in non-ascending order.

The cross-entropy rate of the category and text is defined differently than the cross-entropy of the text. For the combination, we divide the log (base 2) probability of the text plus the log (base 2) probability of the category by the length of the text plus 1. This non-standard definition ensures that the cross-entropy ordering remains the same as the joint probability ordering.

Overrides:
score in class ScoredClassification
Parameters:
rank - Rank of result category.
Returns:
The cross-entropy rate of the category at the specified rank.
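
The definition above can be written out as a short calculation. The sketch below is illustrative only (CrossEntropyRate and rate are hypothetical names, not LingPipe API) and assumes the text length is the same for every category, which holds when one input is scored against all categories.

```java
// Hypothetical helper for illustration; not part of LingPipe.
final class CrossEntropyRate {

    // (log2 probability of the text + log2 probability of the category)
    // divided by (length of the text + 1).
    static double rate(double log2TextProb, double log2CategoryProb, int textLength) {
        return (log2TextProb + log2CategoryProb) / (textLength + 1);
    }

    public static void main(String[] args) {
        int length = 9; // same text for every category, so the divisor is 10
        double a = rate(-18.0, -2.0, length); // joint log2 probability -20
        double b = rate(-27.0, -3.0, length); // joint log2 probability -30
        System.out.println(a + " " + b); // prints -2.0 -3.0
    }
}
```

Every category's joint log probability is divided by the same positive constant, so the rates sort in the same order as the joint probabilities.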

toString

public String toString()
Returns a string-based representation of this joint probability ranked classification.

Overrides:
toString in class ConditionalClassification
Returns:
A string-based representation of this classification.

create

public static JointClassification create(String[] categories,
                                         double[] logProbabilities)
Return a joint classification given the categories and log probabilities.

The log probabilities must be finite and non-positive. The joint probabilities should not sum to more than 1.0, but this is not checked; the result is simply normalized.

Parameters:
categories - Array of categories.
logProbabilities - Parallel array of log probabilities.
Returns:
Joint classification corresponding to categories and probabilities.
Throws:
IllegalArgumentException - If any of the log probabilities is infinite, not a number, or positive, or if the arrays are not of the same length.
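
Unlike the constructors, which accept Double.NEGATIVE_INFINITY, this factory method requires finite values. The documented preconditions can be checked up front with a small guard such as the sketch below; CreateArgs and isValid are hypothetical names, and the check mirrors the conditions listed under Throws rather than LingPipe's actual code.

```java
// Hypothetical helper for illustration; not part of LingPipe.
final class CreateArgs {

    // Returns true if the arguments meet the documented preconditions of
    // create(String[], double[]): equal lengths, and every log probability
    // finite, not NaN, and non-positive.
    static boolean isValid(String[] categories, double[] logProbabilities) {
        if (categories.length != logProbabilities.length)
            return false;
        for (double x : logProbabilities) {
            if (Double.isNaN(x) || Double.isInfinite(x) || x > 0.0)
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] one = { "a" };
        System.out.println(isValid(new String[] { "a", "b" }, new double[] { -1.0, -2.5 })); // prints true
        System.out.println(isValid(one, new double[] { Double.NEGATIVE_INFINITY }));         // prints false
        System.out.println(isValid(one, new double[] { 0.5 }));                              // prints false
    }
}
```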