com.aliasi.dca
Class DiscreteChooser

java.lang.Object
  extended by com.aliasi.dca.DiscreteChooser
All Implemented Interfaces:
Serializable

public class DiscreteChooser
extends Object
implements Serializable

The DiscreteChooser class implements multinomial conditional logit discrete choice analysis with variables varying over alternatives.

Multinomial Logit Discrete Choice Analysis

This form of discrete choice analysis considers an arbitrary positive number of choices represented as vectors and computes the probability of each choice as a vector product on the multilogit scale.

The model is characterized by a coefficient vector β, which is supplied at construction time. Choices are represented as vectors, and there may be any number of them.

Suppose the model is presented with N choices represented as vectors α[0],...,α[N-1]. The probability of choice n for 0 <= n < N is

 p(n|α,β) = exp(α[n] * β) / Z
where the asterisk (*) represents vector dot products, and the normalizing constant is defined by summation in the usual way, by
 Z = Σ0 <= n < N exp(α[n] * β)

This model is related to logistic regression in that it is a log-linear model.

No Intercepts

Constant features in the training data will be ignored statistically. This is because the sum of its gradients will always be 0. For instance, consider an update in the training data. If there are four alternatives with probabilities 0.8, 0.1, 0.05, and 0.05 and the first alternative with probability of 0.8 is correct, then the error will be -0.2 for the correct choice and (0.1 + 0.05 + 0.05) = 0.2 for all of the incorrect choices, leading to a total gradient of 0.0.

Thus intercepts should not be added to choice vectors and a regression prior with an uninformative intercept should not be used for estimation.

Independence of Irrelevant Alternatives

The ratio of probabilities between two choices does not change based on the other choices presented. In econometrics, this is known as the independence of irrelevant alternatives (IIA). It is easy to verify for discrete choosers by computing the ratio between the probability of choice n and choice m,
 p(n|α,β)/p(m|α, β)

     = (exp(α[n] * β) / Z) / (exp(α[m] * β) / Z)

     = exp(α[n] * β) / exp(α[m] * β).
The value does depends on only α[n] and α[m] (and the model's coefficient vector β).

This is fine when the choices are independent, but problematic when there are dependencies between the choices. A standard example of is given a choice between items A and B may be modeled properly. But consider a B' that is very much like B and added to the mix. For instance, consider choosing between a California cabernet and a Bordeaux. Suppose you have a 2/3 probability of choosing the Bordeaux and a 1/3 probability of choosing the California cabernet. Now consider adding a second Califoronia cabernet that's very similar to the first one (as measured by the model, of course). Then the probabilities will be roughly 1/2 for choosing the Bordeaux, and 1/4 for each of the California cabernets. With similar choices, the probability of each should go down. If they were identical (perfectly correlated), the right answer would seem to be a 2/3 probability of choosing the Bordeaux and 1/6 probability for choosing each of the Califoronia cabernets.

One or More Choosers

If the coefficients all correspond to features of the choice, the model is implicitly representing the decision function of a single chooser.

If some of the coefficients are features of a chooser, the model may be used to represent the decision function of multiple choosers. In this case, all fetaure tying is up to the implementing class. Typically, there will be interaction features included between the chooser and the choice. Returning to the wine example, different choosers might put different weights on the degree of new oak used, acid levels, or complexity. In these cases, overall preferences may be represented by chooser-independent variables, and then chooser-dependent preferences would be interpreted relatively.

References

Since:
LingPipe3.9.1
Version:
3.9.2
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
DiscreteChooser(Vector coefficients)
          Construct a discrete chooser with the specified coefficient vector.
 
Method Summary
 double[] choiceLogProbs(Vector[] choices)
          Return an array of (natural) log choice probabilities corresponding to the input array of choices.
 double[] choiceProbs(Vector[] choices)
          Return an array of choice probabilities corresponding to the input array of choices.
 int choose(Vector[] choices)
          Returns the most likely choice among the choices in the specified array of vectors.
 Vector coefficients()
          Return an unmodifiable view of the coefficients underlying this discrete chooser.
static DiscreteChooser estimate(Vector[][] alternativess, int[] choices, RegressionPrior prior, int priorBlockSize, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, Reporter reporter)
          Returns a discrete choice model estimated from the specified training data, prior, and learning parameters.
 String toString()
          Return a string-based representation of the coefficient vector underlying this discrete chooser.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DiscreteChooser

public DiscreteChooser(Vector coefficients)
Construct a discrete chooser with the specified coefficient vector.

Parameters:
coefficients - Coefficient vector.
Method Detail

choose

public int choose(Vector[] choices)
Returns the most likely choice among the choices in the specified array of vectors.

Parameters:
choices - Array of alternative choices represented as vectors.
Returns:
The most likely choice.
Throws:
IllegalArgumentException - If there is not at least one choice.

choiceProbs

public double[] choiceProbs(Vector[] choices)
Return an array of choice probabilities corresponding to the input array of choices. The array of results will be parallel to the array of input vectors.

Parameters:
choices - Array of alternative choices represented as vectors.
Returns:
Parallel array of probabilities for each choice.
Throws:
IllegalArgumentException - If there is not at least one choice.

choiceLogProbs

public double[] choiceLogProbs(Vector[] choices)
Return an array of (natural) log choice probabilities corresponding to the input array of choices. The array of results will be parallel to the array of input vectors.

Parameters:
choices - Array of alternative choices represented as vectors.
Returns:
Parallel array of choice (natural) log probabilities.
Throws:
IllegalArgumentException - If there is not at least one choice.

coefficients

public Vector coefficients()
Return an unmodifiable view of the coefficients underlying this discrete chooser.

Returns:
The coefficient vector for this chooser.

toString

public String toString()
Return a string-based representation of the coefficient vector underlying this discrete chooser.

Overrides:
toString in class Object
Returns:
String representation of this chooser.

estimate

public static DiscreteChooser estimate(Vector[][] alternativess,
                                       int[] choices,
                                       RegressionPrior prior,
                                       int priorBlockSize,
                                       AnnealingSchedule annealingSchedule,
                                       double minImprovement,
                                       int minEpochs,
                                       int maxEpochs,
                                       Reporter reporter)
Returns a discrete choice model estimated from the specified training data, prior, and learning parameters.

Training is carried out using stochastic gradient descent. Priors are only applied every block size number of examples, and at the end of each epoch to catch up.

The reporter receives LogLevel.INFO reports on parameters and LogLevel.DEBUG reports on a per-epoch basis of learning rate, log likelihood, log prior, and totals.

Parameters:
alternativess - An array of vectors for each training instance.
choices - The index of the vector chosen for each training instance.
prior - The prior to apply to coefficients.
priorBlockSize - Period with which the prior is applied.
annealingSchedule - Learning rates per epoch.
minImprovement - Minimum improvement in the rolling average of log likelihood plus prior to compute another epoch.
minEpochs - Minimum number of epochs to compute.
maxEpochs - Maximum number of epochs.
reporter - Reporter to which progress reports are sent.