com.aliasi.stats
Class BinomialDistribution

java.lang.Object
  extended by com.aliasi.stats.AbstractDiscreteDistribution
      extended by com.aliasi.stats.BinomialDistribution
All Implemented Interfaces:
DiscreteDistribution

public class BinomialDistribution
extends AbstractDiscreteDistribution

A BinomialDistribution is a discrete distribution over the number of successes given a fixed number of Bernoulli trials. A binomial distribution is constructed from a specified Bernoulli distribution, which determines the success probability. The minimum outcome is 0 and the maximum outcome is the number of trials. This class also defines a static method log2BinomialCoefficient(long,long) for computing the log (base 2) of binomial coefficients.

The method z(int) returns the z-score statistic for a specified number of successes.

Computing P-Values

As of LingPipe 3.2.0, the dependency on Jakarta Commons Math was removed. As a result, we removed the two methods that computed p-values. Here is their implementation in case you need the functionality (you may need to increase the text size):

 import org.apache.commons.math.MathException;
 import org.apache.commons.math.distribution.NormalDistribution;
 import org.apache.commons.math.distribution.NormalDistributionImpl;

 static final NormalDistribution Z_DISTRIBUTION
       = new NormalDistributionImpl();

 /**
  * Returns the two-sided p-value computed from the z-score for
  * this distribution for the specified number of successes.
  ...
  double pValue(int numSuccesses) throws MathException {
     return pValue(bernoulliDistribution().successProbability(),
                   numSuccesses,
                   numTrials());
 }

 /**
  * Returns the one-sided p-value computed from the z-score for
  * this distribution for the specified number of successes.
  ...
  double pValueLess(int numSuccesses) throws MathException {
      return pValueLess(bernoulliDistribution().successProbability(),
                        numSuccesses,
                         numTrials());
  }

 /**
  * Returns the two-sided p-value for the z-score statistic on the
  * specified number of successes out of the specified number of
  * trials for the specified success probability.
  ...
  static double pValue(double successProbability,
                       int numSuccesses,
                       int numTrials) throws MathException {

      double z = z(successProbability,numSuccesses,numTrials);
      return 2.0 * Z_DISTRIBUTION.cumulativeProbability(Math.min(-z,z));
   }

  /**
   * Returns the one-sided (lower) p-value for the z-score statistic
   * on the specified number of successes out of the specified
   * number of trials for the specified success probability.
   ...
   static double pValueLess(double successProbability,
                            int numSuccesses,
                            int numTrials) throws MathException {
       double z = z(successProbability,numSuccesses,numTrials);
       return 1.0 - Z_DISTRIBUTION.cumulativeProbability(z);
   }
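
If adding Commons Math back is not an option, the two-sided p-value can be approximated without any dependency. The following is a minimal sketch, not LingPipe code: the class name BinomialPValueSketch and its methods are hypothetical, the z-score follows the formula documented for z(double,int,int), and the normal cumulative probability uses the Abramowitz and Stegun polynomial approximation to the error function (formula 7.1.26), which is accurate to roughly seven decimal places. Argument validation is omitted for brevity.

```java
/** Hypothetical helper, not part of LingPipe: dependency-free two-sided p-value. */
final class BinomialPValueSketch {

    private BinomialPValueSketch() { }

    /** z-score as defined in the documentation of z(double,int,int). */
    static double z(double successProbability, int numSuccesses, int numTrials) {
        double expectedSuccesses = successProbability * numTrials;
        double variance = numTrials * successProbability * (1.0 - successProbability);
        return (numSuccesses - expectedSuccesses) / Math.sqrt(variance);
    }

    /** Approximate standard normal cumulative probability P(Z <= x). */
    static double standardNormalCdf(double x) {
        // erf(t) for t >= 0 via Abramowitz and Stegun 7.1.26 (an approximation).
        double t = Math.abs(x) / Math.sqrt(2.0);
        double k = 1.0 / (1.0 + 0.3275911 * t);
        double poly = k * (0.254829592
                    + k * (-0.284496736
                    + k * (1.421413741
                    + k * (-1.453152027
                    + k * 1.061405429))));
        double erf = 1.0 - poly * Math.exp(-t * t);
        // Use symmetry of the normal distribution for negative x.
        return x >= 0.0 ? 0.5 * (1.0 + erf) : 0.5 * (1.0 - erf);
    }

    /** Two-sided p-value: twice the probability mass in the tail beyond |z|. */
    static double pValue(double successProbability, int numSuccesses, int numTrials) {
        double z = z(successProbability, numSuccesses, numTrials);
        return 2.0 * standardNormalCdf(-Math.abs(z));
    }
}
```

Because the result rests on the normal approximation to the binomial, it is most trustworthy when the number of trials is large and the success probability is not close to 0 or 1.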

For more information, see:

Since:
LingPipe 2.0
Version:
3.2.0
Author:
Bob Carpenter

Constructor Summary
BinomialDistribution(BernoulliDistribution distribution, int numTrials)
          Construct a binomial distribution that samples from the specified Bernoulli distribution the specified number of times.
 
Method Summary
 BernoulliDistribution bernoulliDistribution()
          Returns the Bernoulli (two-outcome) distribution underlying this binomial distribution.
static double log2BinomialCoefficient(long n, long m)
          Returns the log (base 2) of the binomial coefficient of the specified arguments.
 double log2Probability(long outcome)
          Returns the log (base 2) probability of the specified outcome.
 long maxOutcome()
          Returns the maximum non-zero probability outcome, which is the number of trials for this distribution.
 long minOutcome()
          Returns zero, the minimum outcome for a binomial distribution.
 long numTrials()
          Returns the number of trials for this binomial distribution.
 double probability(long outcome)
          Returns the probability of the specified outcome.
 double variance()
          Returns the variance of this binomial distribution.
static double z(double successProbability, int numSuccesses, int numTrials)
          Returns the z-score for the specified number of successes out of the specified number of trials given the specified success probability.
 double z(int numSuccesses)
          Returns the z-score for the specified number of successes given this distribution's success probability and number of trials.
 
Methods inherited from class com.aliasi.stats.AbstractDiscreteDistribution
cumulativeProbability, cumulativeProbabilityGreater, cumulativeProbabilityLess, entropy, mean
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BinomialDistribution

public BinomialDistribution(BernoulliDistribution distribution,
                            int numTrials)
Construct a binomial distribution that samples from the specified Bernoulli distribution the specified number of times. The resulting distribution is over the number of successes, with a range between zero and the number of trials.

The Bernoulli distribution is stored and any change to it will affect the constructed binomial distribution.

Parameters:
distribution - Underlying Bernoulli distribution.
numTrials - Number of trials.
Method Detail

bernoulliDistribution

public BernoulliDistribution bernoulliDistribution()
Returns the Bernoulli (two-outcome) distribution underlying this binomial distribution.

Returns:
The base distribution.

minOutcome

public long minOutcome()
Returns zero, the minimum outcome for a binomial distribution.

Specified by:
minOutcome in interface DiscreteDistribution
Overrides:
minOutcome in class AbstractDiscreteDistribution
Returns:
Zero, the minimum outcome for a binomial distribution.

maxOutcome

public long maxOutcome()
Returns the maximum non-zero probability outcome, which is the number of trials for this distribution.

Specified by:
maxOutcome in interface DiscreteDistribution
Overrides:
maxOutcome in class AbstractDiscreteDistribution
Returns:
The maximum non-zero probability outcome.

numTrials

public long numTrials()
Returns the number of trials for this binomial distribution. This is the same as the result of maxOutcome().

Returns:
The number of trials.

probability

public double probability(long outcome)
Returns the probability of the specified outcome. The probability is determined by the likelihood of the specified number of successes out of the number of trials for this distribution.

The probability for a specified number of successes is:

P(numSuccesses)
  = binomialCoefficient(numTrials,numSuccesses)
  * P(success)^numSuccesses
  * (1 - P(success))^(numTrials - numSuccesses)
where numTrials is the number of trials for this binomial distribution and P(success) is the success probability of the Bernoulli distribution underlying this binomial distribution.
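
As an illustration of the formula, the following sketch evaluates it directly. It is not the LingPipe implementation: the class BinomialProbabilitySketch and its methods are hypothetical, and the binomial coefficient is accumulated as a running product of doubles so that no factorial is ever computed explicitly.

```java
/** Hypothetical helper, not part of LingPipe: direct evaluation of the binomial probability. */
final class BinomialProbabilitySketch {

    private BinomialProbabilitySketch() { }

    /** Binomial coefficient "n choose k" as a running product of ratios. */
    static double binomialCoefficient(int n, int k) {
        double c = 1.0;
        for (int i = 1; i <= k; ++i)
            c *= (double) (n - k + i) / i;
        return c;
    }

    /** P(numSuccesses) for numTrials Bernoulli trials with the given success probability. */
    static double probability(double successProbability, int numSuccesses, int numTrials) {
        return binomialCoefficient(numTrials, numSuccesses)
            * Math.pow(successProbability, numSuccesses)
            * Math.pow(1.0 - successProbability, numTrials - numSuccesses);
    }
}
```

For example, with four fair trials the coefficient for two successes is 6 and each sequence has probability 0.5^4 = 0.0625, so probability(0.5, 2, 4) evaluates to 0.375. For large numbers of trials, working in log space as log2Probability(long) does is less prone to overflow and underflow.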

Specified by:
probability in interface DiscreteDistribution
Specified by:
probability in class AbstractDiscreteDistribution
Parameters:
outcome - Number of successes.
Returns:
Probability of specified number of successes.

log2Probability

public double log2Probability(long outcome)
Returns the log (base 2) probability of the specified outcome. The probability is determined by the likelihood of the specified number of successes out of the number of trials for this distribution. See the documentation for the method probability(long) for an exact definition.

Specified by:
log2Probability in interface DiscreteDistribution
Overrides:
log2Probability in class AbstractDiscreteDistribution
Parameters:
outcome - Number of successes.
Returns:
Log (base 2) probability of the specified number of successes.

z

public double z(int numSuccesses)
Returns the z-score for the specified number of successes given this distribution's success probability and number of trials. Z-scores may take on any value from negative to positive infinity. A z-score is the number of standard deviations above or below the expected number of successes for this distribution. Thus the greater the absolute value of the z-score, the less likely the number of successes was drawn from this distribution. The lower a negative z-score, the more likely it was drawn from a distribution with a lower success probability and the higher a positive z-score, the more likely it was drawn from a distribution with a higher success probability.

The formula for z-scores is provided in the documentation for the static method z(double,int,int).

Parameters:
numSuccesses - Number of successes in sample.
Returns:
Z score value.
Throws:
IllegalArgumentException - If the number of successes is less than 0 or more than the number of trials for this distribution.

variance

public double variance()
Returns the variance of this binomial distribution. The variance of a binomial distribution is:
variance = numTrials * P(success) * (1 - P(success))

Specified by:
variance in interface DiscreteDistribution
Overrides:
variance in class AbstractDiscreteDistribution
Returns:
The variance of this binomial distribution.

z

public static double z(double successProbability,
                       int numSuccesses,
                       int numTrials)
Returns the z-score for the specified number of successes out of the specified number of trials given the specified success probability. The z-score is the number of standard deviations by which the given number of successes lies above or below the expected number of successes, given the success probability and number of trials.

The z-score for binomial distributions is defined by:

z = (numSuccesses - expectedSuccesses)
  / (numTrials * P(success) * (1 - P(success)))^(1/2)
where
expectedSuccesses = P(success) * numTrials
Thus the numerator is the difference between the observed and expected number of successes, and the denominator is the standard deviation of the Bernoulli trial iterated over the specified number of trials.
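
A worked instance of the definition may help. The sketch below is a standalone re-implementation for illustration only; the class name ZScoreSketch is hypothetical, and the argument checks simply mirror the conditions listed under Throws.

```java
/** Hypothetical helper, not part of LingPipe: the binomial z-score formula. */
final class ZScoreSketch {

    private ZScoreSketch() { }

    static double z(double successProbability, int numSuccesses, int numTrials) {
        // Mirror the documented argument conditions.
        if (successProbability < 0.0 || successProbability > 1.0)
            throw new IllegalArgumentException("Success probability must be between 0 and 1.");
        if (numSuccesses < 0 || numSuccesses > numTrials)
            throw new IllegalArgumentException("Successes must be between 0 and the number of trials.");
        double expectedSuccesses = successProbability * numTrials;
        double standardDeviation
            = Math.sqrt(numTrials * successProbability * (1.0 - successProbability));
        return (numSuccesses - expectedSuccesses) / standardDeviation;
    }
}
```

With a success probability of 0.5 and 100 trials, the expected number of successes is 50 and the standard deviation is (100 * 0.5 * 0.5)^(1/2) = 5. Observing 60 successes therefore gives z = (60 - 50) / 5 = 2, and observing 40 successes gives z = -2.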

Parameters:
successProbability - Probability of success.
numSuccesses - Number of successes.
numTrials - Number of trials.
Throws:
IllegalArgumentException - If the success probability is not between 0 and 1 or if the number of successes is less than zero or greater than the number of trials.

log2BinomialCoefficient

public static double log2BinomialCoefficient(long n,
                                             long m)
Returns the log (base 2) of the binomial coefficient of the specified arguments. The binomial coefficient is equal to the number of ways to choose a subset of size m from a set of n objects, which is pronounced "n choose m", and is given by:
binomialCoefficient(n,m) = n! / ( m! * (n-m)!)
log2 binomialCoefficient(n,m) = log2 (n!) - log2 (m!) - log2 ((n-m)!)
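
The identity can be evaluated by summing logarithms so that the factorials are never formed explicitly. The following is a sketch under that assumption rather than the LingPipe implementation; the class Log2BinomialCoefficientSketch and its helper methods are hypothetical, and the simple loop takes time linear in n.

```java
/** Hypothetical helper, not part of LingPipe: log (base 2) binomial coefficient. */
final class Log2BinomialCoefficientSketch {

    private Log2BinomialCoefficientSketch() { }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** log2 (n!) computed as the sum of log2 i for i from 2 to n. */
    static double log2Factorial(long n) {
        double sum = 0.0;
        for (long i = 2; i <= n; ++i)
            sum += log2(i);
        return sum;
    }

    /** log2 (n!) - log2 (m!) - log2 ((n-m)!), assuming 0 <= m <= n. */
    static double log2BinomialCoefficient(long n, long m) {
        return log2Factorial(n) - log2Factorial(m) - log2Factorial(n - m);
    }
}
```

For instance, 5 choose 2 is 10, so log2BinomialCoefficient(5, 2) is approximately log2 10, or about 3.32. Staying in log space keeps the result representable even when n! itself would overflow a double.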

Returns:
The log (base 2) of the binomial coefficient of the specified arguments.