com.aliasi.stats
Class ZipfDistribution

java.lang.Object
  extended by com.aliasi.stats.AbstractDiscreteDistribution
      extended by com.aliasi.stats.ZipfDistribution
All Implemented Interfaces:
DiscreteDistribution

public class ZipfDistribution
extends AbstractDiscreteDistribution

The ZipfDistribution class provides a finite distribution parameterized by a positive integer number of outcomes, in which the probability of each outcome is inversely proportional to its rank (with outcomes ordered by decreasing probability). Many natural language phenomena, such as unigram word probabilities and named-entity probabilities, roughly follow a Zipf distribution.

The Zipf probability distribution Zipf_n with n outcomes is defined by assigning a probability to the rank r outcome, for 1 <= r <= n, by:

    Zipf_n(r) = (1/r) / Z_n

where Z_n is the normalizing factor for a Zipf distribution with n outcomes:

    Z_n = Σ_{1 <= j <= n} 1/j
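
The definition above can be sketched directly in code. The following is a minimal illustration of the two formulas, not LingPipe's implementation; the class name ZipfFormulaSketch and its method names are hypothetical and chosen only for this example.

```java
// Illustrative sketch only; this is not LingPipe's implementation.
class ZipfFormulaSketch {

    // Z_n = sum over 1 <= j <= n of 1/j (the n-th harmonic number).
    static double normalizer(int n) {
        double z = 0.0;
        for (int j = 1; j <= n; ++j) {
            z += 1.0 / j;
        }
        return z;
    }

    // Zipf_n(r) = (1/r) / Z_n, assuming the caller passes 1 <= r <= n.
    static double probability(int r, int n) {
        return (1.0 / r) / normalizer(n);
    }

    public static void main(String[] args) {
        int n = 3;
        for (int r = 1; r <= n; ++r) {
            System.out.printf("Zipf_%d(%d) = %.4f%n", n, r, probability(r, n));
        }
    }
}
```

For n = 3 the normalizer is Z_3 = 1 + 1/2 + 1/3 = 11/6, so the three outcomes have probabilities 6/11, 3/11 and 2/11, which sum to one.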

The ZipfDistribution class provides a method for returning the entropy of the distribution. It also provides a static method for returning a Zipf distribution's probabilities in rank order; this static method is useful for comparing an observed distribution to the one expected under a Zipf distribution.
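
As a rough illustration of such a comparison, the sketch below computes the expected Zipf probabilities in rank order, their entropy, and a distance between observed counts and the Zipf expectation. It is self-contained and does not call LingPipe. The class ZipfComparisonSketch, its method names, the use of base-2 logarithms for entropy, and the choice of total variation distance as the comparison measure are all assumptions made for this example.

```java
// Illustrative sketch only; names and the distance measure are assumptions.
class ZipfComparisonSketch {

    // Expected Zipf probabilities in rank order; index i holds rank i + 1.
    static double[] expected(int n) {
        double z = 0.0;
        for (int j = 1; j <= n; ++j) {
            z += 1.0 / j;
        }
        double[] p = new double[n];
        for (int i = 0; i < n; ++i) {
            p[i] = (1.0 / (i + 1)) / z;
        }
        return p;
    }

    // Entropy in bits: -sum of p * log2(p). Base 2 is an assumption here.
    static double entropyBits(double[] p) {
        double h = 0.0;
        for (double x : p) {
            if (x > 0.0) {
                h -= x * Math.log(x) / Math.log(2.0);
            }
        }
        return h;
    }

    // Total variation distance between observed relative frequencies and
    // the Zipf expectation. Assumes counts are sorted in decreasing order
    // (rank order) and that their total is positive.
    static double totalVariation(long[] sortedCounts) {
        long total = 0;
        for (long c : sortedCounts) {
            total += c;
        }
        double[] p = expected(sortedCounts.length);
        double d = 0.0;
        for (int i = 0; i < p.length; ++i) {
            d += Math.abs((double) sortedCounts[i] / total - p[i]);
        }
        return 0.5 * d;
    }

    public static void main(String[] args) {
        long[] counts = {60, 30, 20}; // hypothetical observed counts
        System.out.println("distance = " + totalVariation(counts));
        System.out.println("entropy  = " + entropyBits(expected(counts.length)));
    }
}
```

The hypothetical counts 60, 30, 20 are in the ratio 6 : 3 : 2, which matches Zipf_3 exactly, so the distance is zero up to floating-point rounding. Counts that are flatter or steeper than 1/r give a larger distance.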


Since:
LingPipe2.0
Version:
2.0
Author:
Bob Carpenter

Constructor Summary
ZipfDistribution(int numOutcomes)
          Construct a Zipf distribution with the specified number of outcomes.
 
Method Summary
 long maxOutcome()
          Returns the maximum outcome, which is just the number of outcomes.
 long minOutcome()
          Returns one, the minimum outcome in a Zipf distribution.
 int numOutcomes()
          Returns the number of outcomes with non-zero probability for this Zipf distribution.
 double probability(long rank)
          Returns the probability of the outcome at the specified rank.
static double[] zipfDistribution(int numOutcomes)
          Returns the array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes.
 
Methods inherited from class com.aliasi.stats.AbstractDiscreteDistribution
cumulativeProbability, cumulativeProbabilityGreater, cumulativeProbabilityLess, entropy, log2Probability, mean, variance
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ZipfDistribution

public ZipfDistribution(int numOutcomes)
Construct a Zipf distribution with the specified number of outcomes.

Parameters:
numOutcomes - Number of outcomes for the distribution.
Throws:
IllegalArgumentException - If the number of outcomes specified is not positive.
Method Detail

minOutcome

public long minOutcome()
Returns one, the minimum outcome in a Zipf distribution.

Specified by:
minOutcome in interface DiscreteDistribution
Overrides:
minOutcome in class AbstractDiscreteDistribution
Returns:
One.

maxOutcome

public long maxOutcome()
Returns the maximum outcome, which is just the number of outcomes.

Specified by:
maxOutcome in interface DiscreteDistribution
Overrides:
maxOutcome in class AbstractDiscreteDistribution
Returns:
The maximum non-zero outcome.

numOutcomes

public int numOutcomes()
Returns the number of outcomes with non-zero probability for this Zipf distribution.

Returns:
The number of outcomes with non-zero probability for this distribution.

probability

public double probability(long rank)
Returns the probability of the outcome at the specified rank. This method returns 0.0 for non-positive ranks and for ranks greater than the number of outcomes in this distribution.

Specified by:
probability in interface DiscreteDistribution
Specified by:
probability in class AbstractDiscreteDistribution
Parameters:
rank - Rank of outcome.
Returns:
The probability of the outcome at the specified rank.
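
The documented contract (a positive number of outcomes, ranks from one up to that number, and 0.0 for any rank outside the range) can be sketched as follows. This is a stand-alone illustration that mirrors the behavior described on this page; ZipfContractSketch is a hypothetical class, not the LingPipe source, and precomputing the normalizer in the constructor is a design choice of this sketch.

```java
// Illustrative sketch only; mirrors the documented contract, not LingPipe's code.
class ZipfContractSketch {

    private final int numOutcomes;
    private final double normalizer; // Z_n, computed once at construction

    ZipfContractSketch(int numOutcomes) {
        if (numOutcomes <= 0) {
            throw new IllegalArgumentException(
                "Number of outcomes must be positive; found " + numOutcomes);
        }
        this.numOutcomes = numOutcomes;
        double z = 0.0;
        for (int j = 1; j <= numOutcomes; ++j) {
            z += 1.0 / j;
        }
        this.normalizer = z;
    }

    long minOutcome() {
        return 1L;
    }

    long maxOutcome() {
        return numOutcomes;
    }

    // Returns 0.0 for ranks below one or above the number of outcomes.
    double probability(long rank) {
        if (rank < minOutcome() || rank > maxOutcome()) {
            return 0.0;
        }
        return (1.0 / rank) / normalizer;
    }

    public static void main(String[] args) {
        ZipfContractSketch zipf = new ZipfContractSketch(4);
        for (long rank = 0; rank <= 5; ++rank) {
            System.out.println("rank " + rank + ": " + zipf.probability(rank));
        }
    }
}
```

Returning 0.0 rather than throwing for out-of-range ranks lets callers sum or compare probabilities over any range of longs without first checking the bounds.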

zipfDistribution

public static double[] zipfDistribution(int numOutcomes)
Returns the array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes. See the class documentation above for a definition of these probabilities. Note that the index of an outcome is one less than its rank; for example, the rank 1 outcome's probability is at index 0, and the rank 5 outcome's probability is at index 4.

Parameters:
numOutcomes - Number of outcomes.
Returns:
The array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes.