com.aliasi.stats

## Class ZipfDistribution

• All Implemented Interfaces:
DiscreteDistribution

```
public class ZipfDistribution
extends AbstractDiscreteDistribution
```
The `ZipfDistribution` class provides a finite distribution parameterized by a positive integer number of outcomes, in which the probability of each outcome is inversely proportional to its rank (with outcomes ordered by decreasing probability). Many natural language phenomena, such as unigram word probabilities and named-entity probabilities, roughly follow a Zipf distribution.

The Zipf probability distribution `Zipf_n` with `n` outcomes assigns the following probability to the outcome of rank `r`, for `1 <= r <= n`:

```
Zipf_n(r) = (1/r) / Z_n
```
where `Z_n` is the normalizing factor for a Zipf distribution with `n` outcomes:
```
Z_n = Σ_{1<=j<=n} 1/j
```
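As a worked instance of the definition above, take `n = 3`. The decimal values are rounded to three places; nothing here is specific to the LingPipe implementation.

```latex
\begin{aligned}
Z_3 &= 1 + \tfrac{1}{2} + \tfrac{1}{3} = \tfrac{11}{6},\\
\mathrm{Zipf}_3(1) &= \frac{1}{11/6} = \tfrac{6}{11} \approx 0.545,\\
\mathrm{Zipf}_3(2) &= \frac{1/2}{11/6} = \tfrac{3}{11} \approx 0.273,\\
\mathrm{Zipf}_3(3) &= \frac{1/3}{11/6} = \tfrac{2}{11} \approx 0.182.
\end{aligned}
```

The three probabilities sum to one, and the rank 1 outcome is exactly three times as likely as the rank 3 outcome.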

The Zipf distribution class provides a method for returning the entropy of the Zipf distribution. It also provides a static method for returning a Zipf distribution's probabilities in rank order. The latter method is useful for comparing an observed distribution with the one expected under a Zipf distribution.
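The entropy mentioned above can be computed directly from the definition. The following is a minimal standalone sketch that does not call LingPipe; the class and method names (`ZipfEntropySketch`, `normalizer`, `entropyBits`) are hypothetical, and base-2 logarithms are an assumption suggested by the inherited `log2Probability` method rather than something this page states.

```java
// Standalone sketch; it does not use LingPipe. All names are hypothetical.
final class ZipfEntropySketch {

    // Normalizing factor Z_n = sum of 1/j for 1 <= j <= n.
    static double normalizer(int n) {
        double z = 0.0;
        for (int j = 1; j <= n; ++j) {
            z += 1.0 / j;
        }
        return z;
    }

    // Entropy in bits (assumed base 2): -sum_r p(r) * log2 p(r),
    // where p(r) = (1/r) / Z_n.
    static double entropyBits(int n) {
        double z = normalizer(n);
        double h = 0.0;
        for (int r = 1; r <= n; ++r) {
            double p = (1.0 / r) / z;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 10, 1000}) {
            System.out.printf("n=%d entropy=%.4f bits%n", n, entropyBits(n));
        }
    }
}
```

Because the distribution is not uniform for `n > 1`, the value returned for `n` outcomes stays strictly below `log2(n)`, the entropy of a uniform distribution over the same outcomes.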

• Eric W. Weisstein. Zipf's Law. From MathWorld--A Wolfram Web Resource.
• Eric W. Weisstein. Statistical Rank. From MathWorld--A Wolfram Web Resource.
Since:
LingPipe2.0
Version:
2.0
Author:
Bob Carpenter
• ### Constructor Summary

Constructors
Constructor and Description
`ZipfDistribution(int numOutcomes)`
Construct a constant Zipf distribution with the specified number of outcomes.
• ### Method Summary

All Methods
Modifier and Type Method and Description
`long` `maxOutcome()`
Returns the maximum outcome, which is just the number of outcomes.
`long` `minOutcome()`
Returns one, the minimum outcome in a Zipf distribution.
`int` `numOutcomes()`
Returns the number of outcomes with non-zero probability for this Zipf distribution.
`double` `probability(long rank)`
Returns the probability of the outcome at the specified rank.
`static double[]` `zipfDistribution(int numOutcomes)`
Returns the array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes.
• ### Methods inherited from class com.aliasi.stats.AbstractDiscreteDistribution

`cumulativeProbability, cumulativeProbabilityGreater, cumulativeProbabilityLess, entropy, log2Probability, mean, variance`
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Constructor Detail

• #### ZipfDistribution

`public ZipfDistribution(int numOutcomes)`
Construct a constant Zipf distribution with the specified number of outcomes.
Parameters:
`numOutcomes` - Number of outcomes for the distribution.
Throws:
`IllegalArgumentException` - If the number of outcomes specified is not positive.
• ### Method Detail

• #### minOutcome

`public long minOutcome()`
Returns one, the minimum outcome in a Zipf distribution.
Specified by:
`minOutcome` in interface `DiscreteDistribution`
Overrides:
`minOutcome` in class `AbstractDiscreteDistribution`
Returns:
One.
• #### maxOutcome

`public long maxOutcome()`
Returns the maximum outcome, which is just the number of outcomes.
Specified by:
`maxOutcome` in interface `DiscreteDistribution`
Overrides:
`maxOutcome` in class `AbstractDiscreteDistribution`
Returns:
The maximum outcome with non-zero probability.
• #### numOutcomes

`public int numOutcomes()`
Returns the number of outcomes with non-zero probability for this Zipf distribution.
Returns:
The number of outcomes with non-zero probability for this distribution.
• #### probability

`public double probability(long rank)`
Returns the probability of the outcome at the specified rank. This method returns `0.0` for non-positive ranks and for ranks greater than the number of outcomes in this distribution.
Specified by:
`probability` in interface `DiscreteDistribution`
Specified by:
`probability` in class `AbstractDiscreteDistribution`
Parameters:
`rank` - Rank of outcome.
Returns:
The probability of the outcome at the specified rank.
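The out-of-range behavior described above can be illustrated with a small standalone class. This is a sketch under stated assumptions, not LingPipe's source: `ZipfRankSketch` and its fields are hypothetical names, and precomputing the normalizer in the constructor is one possible design choice.

```java
// Standalone sketch; it does not use LingPipe. All names are hypothetical.
final class ZipfRankSketch {
    private final int numOutcomes;
    private final double normalizer; // Z_n, computed once at construction

    ZipfRankSketch(int numOutcomes) {
        if (numOutcomes <= 0) {
            throw new IllegalArgumentException(
                "Number of outcomes must be positive. Found " + numOutcomes);
        }
        this.numOutcomes = numOutcomes;
        double z = 0.0;
        for (int j = 1; j <= numOutcomes; ++j) {
            z += 1.0 / j;
        }
        this.normalizer = z;
    }

    // Returns (1/rank)/Z_n for 1 <= rank <= n, and 0.0 for any other rank.
    double probability(long rank) {
        if (rank < 1 || rank > numOutcomes) {
            return 0.0;
        }
        return (1.0 / rank) / normalizer;
    }

    public static void main(String[] args) {
        ZipfRankSketch zipf = new ZipfRankSketch(3);
        for (long rank = 0; rank <= 4; ++rank) {
            System.out.println("rank " + rank + ": " + zipf.probability(rank));
        }
    }
}
```

Returning `0.0` instead of throwing for out-of-range ranks keeps the function total over all `long` values, which is convenient when summing probabilities over an arbitrary range.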
• #### zipfDistribution

`public static double[] zipfDistribution(int numOutcomes)`
Returns the array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes. See the class documentation above for a definition of these probabilities. Note that the index of an outcome is one less than its rank; for example, the rank 1 outcome's probability is at index 0 and the rank 5 outcome's probability is at index 4.
Parameters:
`numOutcomes` - Number of outcomes.
Returns:
The array of probabilities indexed by rank for the Zipf distribution with the specified number of outcomes.
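To make the rank-to-index offset concrete, and to show one way such an array might be compared against observed data, here is a standalone sketch that does not call LingPipe. The names `ZipfArraySketch`, `zipfProbabilities`, and `expectedCounts` are hypothetical, and the sample counts are made up for illustration; the comparison assumes the observed counts are already sorted in decreasing order.

```java
// Standalone sketch; it does not use LingPipe. All names are hypothetical.
final class ZipfArraySketch {

    // Probability of the rank r outcome is stored at index r - 1.
    static double[] zipfProbabilities(int numOutcomes) {
        double[] probs = new double[numOutcomes];
        double z = 0.0;
        for (int rank = 1; rank <= numOutcomes; ++rank) {
            probs[rank - 1] = 1.0 / rank;
            z += probs[rank - 1];
        }
        for (int i = 0; i < probs.length; ++i) {
            probs[i] /= z; // normalize so the entries sum to one
        }
        return probs;
    }

    // Expected count per rank if the same total were Zipf-distributed.
    // Assumes sortedCounts is in decreasing order (rank 1 first).
    static double[] expectedCounts(long[] sortedCounts) {
        long total = 0L;
        for (long c : sortedCounts) {
            total += c;
        }
        double[] probs = zipfProbabilities(sortedCounts.length);
        double[] expected = new double[probs.length];
        for (int i = 0; i < probs.length; ++i) {
            expected[i] = total * probs[i];
        }
        return expected;
    }

    public static void main(String[] args) {
        long[] observed = {70, 25, 15}; // illustrative counts only
        double[] expected = expectedCounts(observed);
        for (int i = 0; i < observed.length; ++i) {
            System.out.printf("rank %d: observed=%d expected=%.2f%n",
                              i + 1, observed[i], expected[i]);
        }
    }
}
```

A goodness-of-fit statistic could then be computed from the observed and expected columns; which statistic is appropriate depends on the application and is not specified by this class.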