com.aliasi.stats
Class MultinomialDistribution

java.lang.Object
  extended by com.aliasi.stats.MultinomialDistribution

public class MultinomialDistribution
extends Object

A MultinomialDistribution results from drawing a fixed number of samples from a multivariate distribution. Thus the probability distribution log2Probability(int[]) is over an array of counts for the dimensions of the underlying multivariate distribution. This class also contains a static method log2MultinomialCoefficient(int[]) to compute multinomial coefficients.

The method chiSquared(int[]) returns the chi-squared statistic for a sample of outcome counts represented by an array of integers. The number of degrees of freedom is one less than the number of dimensions.

Computing P-Values

As of LingPipe 3.2.0, the dependency on Jakarta Commons Math was removed. As a result, we removed the two methods that computed p-values. Here is their implementation in case you need the functionality (you may need to increase the text size):

 import org.apache.commons.math.MathException;
 import org.apache.commons.math.distribution.ChiSquaredDistribution;
 import org.apache.commons.math.distribution.ChiSquaredDistributionImpl;


   /**
    * Returns the p-value for the chi-squared statistic on the specified
    * sample counts.
   ...
   double pValue(int[] sampleCounts) throws MathException {
       ChiSquaredDistribution chiSq
           = new ChiSquaredDistributionImpl(numDimensions()-1);
       double c = chiSquared(sampleCounts);
       return chiSq.cumulativeProbability(c);
   }
 


Since:
LingPipe 2.0
Version:
3.2.0
Author:
Bob Carpenter

Constructor Summary
MultinomialDistribution(MultivariateDistribution distribution)
          Construct a multinomial distribution based on the specified multivariate distribution.
 
Method Summary
 MultivariateDistribution basisDistribution()
          Returns the multivariate distribution that forms the basis of this multinomial distribution.
 double chiSquared(int[] sampleCounts)
          Returns the chi-squared statistic for rejecting the null hypothesis that the specified samples were generated by this distribution.
static double log2MultinomialCoefficient(int[] sampleCounts)
          Returns the log (base 2) multinomial coefficient for the specified counts.
 double log2Probability(int[] sampleCounts)
          Returns the log (base 2) probability of the distribution of outcomes specified in the argument.
 int numDimensions()
          Returns the number of dimensions in this multinomial.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultinomialDistribution

public MultinomialDistribution(MultivariateDistribution distribution)
Construct a multinomial distribution based on the specified multivariate distribution. Note that the multivariate distribution is simply stored in this class and changes to it will result in changes to the multinomial distribution.

Parameters:
distribution - Underlying multivariate distribution defining the constructed multinomial.
Method Detail

log2Probability

public double log2Probability(int[] sampleCounts)
Returns the log (base 2) probability of the distribution of outcomes specified in the argument. Each argument value is the number of times the corresponding outcome was observed and must be 0 or more. Every outcome with a count greater than zero must have a non-zero probability for the result to be finite. Note that the probability returned is normalized over all count arrays with the same total number of samples.

The definition of the probability value for multinomials is:

P(sampleCounts)
  = multinomialCoefficient(sampleCounts)
  * Π_i P(i)^sampleCounts[i]
where the multinomial coefficient is as defined in the method documentation for log2MultinomialCoefficient(int[]). Taking logarithms yields:
log_2 P(sampleCounts)
  = log_2 multinomialCoefficient(sampleCounts)
  + Σ_i sampleCounts[i] * log_2 P(i)
Note that if the multivariate probability is zero for an outcome with a non-zero count, the result will be Double.NEGATIVE_INFINITY.
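
The log-probability formula above can be illustrated with a minimal standalone sketch. This is not LingPipe's implementation: the class name, the helper methods, and the probs array (a hypothetical stand-in for the underlying multivariate distribution) are assumptions made for illustration only.

```java
// Standalone sketch of the formula above; not LingPipe's implementation.
// The probs array is a hypothetical stand-in for the multivariate distribution.
final class Log2MultinomialProbabilitySketch {

    private static final double LN_2 = Math.log(2.0);

    static double log2(double x) {
        return Math.log(x) / LN_2;
    }

    // log2(n!) as a sum of logs; O(n), adequate for small counts.
    static double log2Factorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; ++k)
            sum += log2(k);
        return sum;
    }

    // log2 coefficient + sum_i sampleCounts[i] * log2 probs[i]
    static double log2Probability(double[] probs, int[] sampleCounts) {
        if (probs.length != sampleCounts.length)
            throw new IllegalArgumentException("Dimension mismatch.");
        int totalCount = 0;
        double sum = 0.0;
        for (int i = 0; i < sampleCounts.length; ++i) {
            int count = sampleCounts[i];
            if (count < 0)
                throw new IllegalArgumentException("Counts must be non-negative.");
            if (count == 0)
                continue; // contributes nothing; also avoids 0 * -Infinity = NaN
            totalCount += count;
            // log2(0) is -Infinity, so a zero-probability outcome with a
            // non-zero count drives the result to Double.NEGATIVE_INFINITY.
            sum += count * log2(probs[i]) - log2Factorial(count);
        }
        return log2Factorial(totalCount) + sum;
    }

    public static void main(String[] args) {
        // Two fair outcomes, one sample of each: P = 2 * 0.5 * 0.5 = 0.5.
        double[] probs = { 0.5, 0.5 };
        System.out.println(log2Probability(probs, new int[] { 1, 1 })); // approximately -1.0
    }
}
```

Outcomes with a zero count are skipped explicitly, so a zero probability paired with a zero count does not affect the result.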

Parameters:
sampleCounts - Array of counts for outcomes.
Returns:
The log (base 2) probability of the specified outcome counts.
Throws:
IllegalArgumentException - If the number of outcome counts is not the same as the number of dimensions of this multinomial.

chiSquared

public double chiSquared(int[] sampleCounts)
Returns the chi-squared statistic for rejecting the null hypothesis that the specified samples were generated by this distribution. The number of degrees of freedom is the number of outcomes minus one. The lower the return value, the more consistent the sample is with this distribution.

The chi-squared value is defined as the sum of squared differences between sample counts and expected counts, normalized by expected count:

χ^2(sampleCounts) = Σ_i (sampleCounts[i] - expectedCount(i))^2 / expectedCount(i)
where the expected counts are computed based on the underlying multivariate distribution and the total sample count:
expectedCount(i) = probability(i) * totalCount
where totalCount is the sum of all of the sample counts.

Note that the chi-squared test is a large sample test. For accurate results, each expected count should be at least five; in symbols, expectedCount(i) >= 5 for all i.
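
As a worked illustration of the definition above, here is a minimal standalone sketch. It is not LingPipe's implementation: the class name and the probs array (a hypothetical stand-in for the underlying multivariate distribution) are assumptions, and the sketch assumes every probability is greater than zero so that no expected count is zero.

```java
// Standalone sketch of the chi-squared statistic; not LingPipe's implementation.
// The probs array is a hypothetical stand-in for the multivariate distribution.
final class ChiSquaredSketch {

    // Assumes every probs[i] > 0, so that no expected count is zero.
    static double chiSquared(double[] probs, int[] sampleCounts) {
        if (probs.length != sampleCounts.length)
            throw new IllegalArgumentException("Dimension mismatch.");
        long totalCount = 0L;
        for (int count : sampleCounts)
            totalCount += count;
        double chiSq = 0.0;
        for (int i = 0; i < sampleCounts.length; ++i) {
            double expectedCount = probs[i] * totalCount;
            double diff = sampleCounts[i] - expectedCount;
            chiSq += diff * diff / expectedCount;
        }
        return chiSq;
    }

    public static void main(String[] args) {
        // 100 samples from a fair two-outcome distribution: expected counts 50 and 50.
        // (60 - 50)^2 / 50 + (40 - 50)^2 / 50 = 2 + 2 = 4, with one degree of freedom.
        double[] probs = { 0.5, 0.5 };
        System.out.println(chiSquared(probs, new int[] { 60, 40 })); // approximately 4.0
    }
}
```

Both expected counts in this example are 50, which satisfies the minimum of five recommended above.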

Parameters:
sampleCounts - Array of sample counts.
Returns:
The chi-squared statistic for testing whether the specified samples were generated by this distribution.
Throws:
IllegalArgumentException - If the number of outcome counts is not the same as the number of dimensions of this multinomial.

numDimensions

public int numDimensions()
Returns the number of dimensions in this multinomial. This is equal to the number of dimensions of the underlying multivariate distribution.

Returns:
The number of dimensions of this multinomial distribution.

basisDistribution

public MultivariateDistribution basisDistribution()
Returns the multivariate distribution that forms the basis of this multinomial distribution. Note that changes to the basis distribution affect this multinomial distribution.

Returns:
The basis distribution underlying this multinomial.

log2MultinomialCoefficient

public static double log2MultinomialCoefficient(int[] sampleCounts)
Returns the log (base 2) multinomial coefficient for the specified counts. The multinomial coefficient counts the number of ways the set of outcomes represented by the array of individual outcome counts can be linearly ordered. The result is:
multinomialCoefficient(sampleCounts)
  = totalCount! / ( Π_i sampleCounts[i]! )
Taking logarithms produces:
log_2 multinomialCoefficient(sampleCounts)
  = log_2(totalCount!) - Σ_i log_2(sampleCounts[i]!)
The multinomial coefficient is often written using a notation similar to that used for the factorial as (sampleCounts[0],...,sampleCounts[n-1])!.
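
The logarithmic form above can be sketched as follows. This is a minimal standalone illustration, not LingPipe's implementation: the class name and the log2Factorial helper are assumptions, and the factorial is computed by a simple sum of logs that is only suitable for modest counts.

```java
// Standalone sketch of the log (base 2) multinomial coefficient;
// not LingPipe's implementation.
final class Log2MultinomialCoefficientSketch {

    private static final double LN_2 = Math.log(2.0);

    // log2(n!) as a sum of logs; O(n), adequate for small counts.
    static double log2Factorial(int n) {
        double sum = 0.0;
        for (int k = 2; k <= n; ++k)
            sum += Math.log(k) / LN_2;
        return sum;
    }

    // log2(totalCount!) - sum_i log2(sampleCounts[i]!)
    static double log2MultinomialCoefficient(int[] sampleCounts) {
        int totalCount = 0;
        double log2Denominator = 0.0;
        for (int count : sampleCounts) {
            if (count < 0)
                throw new IllegalArgumentException("Counts must be non-negative.");
            totalCount += count;
            log2Denominator += log2Factorial(count);
        }
        return log2Factorial(totalCount) - log2Denominator;
    }

    public static void main(String[] args) {
        // Counts {2, 1, 1}: 4! / (2! * 1! * 1!) = 24 / 2 = 12 orderings.
        double log2Coeff = log2MultinomialCoefficient(new int[] { 2, 1, 1 });
        System.out.println(Math.pow(2.0, log2Coeff)); // approximately 12
    }
}
```

Working in log space keeps the result representable even when the factorials themselves would overflow a double.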

Parameters:
sampleCounts - Array of outcome counts.
Returns:
The log (base 2) of the number of ways the outcomes can be linearly ordered.