com.aliasi.stats
Class LogisticRegression

java.lang.Object
  extended by com.aliasi.stats.LogisticRegression
All Implemented Interfaces:
Compilable, Serializable

public class LogisticRegression
extends Object
implements Compilable, Serializable

A LogisticRegression instance is a multi-class vector classifier model generating conditional probability estimates of categories. This class also provides static factory methods for estimating multinomial regression models using stochastic gradient descent (SGD) to find maximum likelihood or maximum a posteriori (MAP) estimates with Laplace, Gaussian, or Cauchy priors on coefficients.

The classification package contains a class LogisticRegressionClassifier which adapts this class's models and estimators to act as generic classifiers given an instance of FeatureExtractor.

Also Known As (AKA)

Multinomial logistic regression is also known as polytomous, polychotomous, or multi-class logistic regression, or just multilogit regression.

Binary logistic regression is an instance of a generalized linear model (GLM) with the logit link function. The logit function is the inverse of the logistic function, and the logistic function is sometimes called the sigmoid function or the s-curve.

Logistic regression estimation obeys the maximum entropy principle, and thus logistic regression is sometimes called "maximum entropy modeling", and the resulting classifier the "maximum entropy classifier".

The generalization of binomial logistic regression to multinomial logistic regression is sometimes called a softmax or exponential model.

Maximum a posteriori (MAP) estimation with Gaussian priors is often referred to as "ridge regression"; with Laplace priors MAP estimation is known as the "lasso". MAP estimation with Gaussian, Laplace or Cauchy priors is known as parameter shrinkage. Gaussian and Laplace priors correspond to forms of regularized regression, with the Gaussian version being regularized with the L2 norm (Euclidean distance, called the Frobenius norm for matrices of parameters) and the Laplace version being regularized with the L1 norm (taxicab distance or Manhattan metric); other Minkowski metrics may be used for shrinkage.

Binary logistic regression is equivalent to a one-layer, single-output neural network with a logistic activation function trained under log loss. This is sometimes called classification with a single neuron.

Model Parameters

A logistic regression model is a discriminative classifier for vectors of fixed dimensionality. The dimensions are often referred to as "features". The method numInputDimensions() returns the number of dimensions (features) in the model. Because the model is well-behaved under sparse vectors, the dimensionality may be returned as Integer.MAX_VALUE, a common default choice for sparse vectors.

A logistic regression model also fixes the number of output categories. The method numOutcomes() returns the number of categories. These outcome categories will be represented as integers from 0 to numOutcomes()-1 inclusive.

A model is parameterized by a real-valued vector for every category other than the last, each of which must be of the same dimensionality as the model's input feature dimensionality. The constructor LogisticRegression(Vector[]) takes an array of Vector objects, which may be dense or sparse, but must all be of the same dimensionality.
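
For example, the following sketch constructs a three-outcome model over two-dimensional inputs from two dense weight vectors and retrieves conditional probability estimates; the DenseVector class from com.aliasi.matrix and its double[] constructor are assumed here for illustration:

 import com.aliasi.matrix.DenseVector;
 import com.aliasi.matrix.Vector;
 import com.aliasi.stats.LogisticRegression;

 class ConstructionDemo {
     public static void main(String[] args) {
         // two weight vectors yield a model with three outcome categories
         Vector beta0 = new DenseVector(new double[] { 0.5, -1.0 });
         Vector beta1 = new DenseVector(new double[] { -0.3, 2.0 });
         LogisticRegression model
             = new LogisticRegression(new Vector[] { beta0, beta1 });

         Vector x = new DenseVector(new double[] { 1.0, 0.25 });
         double[] probs = model.classify(x);  // conditional probabilities, summing to 1.0
         for (int c = 0; c < model.numOutcomes(); ++c)
             System.out.println("p(" + c + "|x) = " + probs[c]);
     }
 }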

Likelihood

The likelihood of an output category c < numOutcomes() given an input vector x of dimensionality numInputDimensions() is given by:

 p(c | x, β) = exp(βc * x) / Z(x)   if c < numOutcomes()-1

               1 / Z(x)             if c = numOutcomes()-1
where βc * x is the vector dot (or inner) product:
 βc * x = Σi < numInputDimensions() βc,i * xi
and where the normalizing denominator, called the partition function, is:
 Z(x) = 1 + Σc < numOutcomes()-1 exp(βc * x)
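
As a plain-Java illustration of these formulas (not this class's internal implementation), the conditional probabilities may be computed directly from the weight vectors as in the following sketch:

 // Computes p(c | x, beta) for all categories following the formulas above.
 // betas has numOutcomes()-1 rows; x and each row share the same dimensionality.
 static double[] conditionalProbs(double[][] betas, double[] x) {
     int numOutcomes = betas.length + 1;
     double[] probs = new double[numOutcomes];
     double z = 1.0;                          // partition function Z(x), starts at exp(0)
     for (int c = 0; c < betas.length; ++c) {
         double dotProduct = 0.0;
         for (int i = 0; i < x.length; ++i)
             dotProduct += betas[c][i] * x[i];   // beta_c * x
         probs[c] = Math.exp(dotProduct);
         z += probs[c];
     }
     probs[numOutcomes - 1] = 1.0;            // numerator for the last category
     for (int c = 0; c < numOutcomes; ++c)
         probs[c] /= z;
     return probs;
 }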

Error and Gradient

This class computes maximum a posteriori parameter values given a sequence of training pairs (x,c) and a prior, which must be an instance of RegressionPrior. The error function is the negative of the log prior plus the log likelihood of the data:
 Err(D,β) = -( log2 p(β|σ2) + Σ{(x,c') in D} log2 p(c'|x,β))
where p(β|σ2) is the density of the parameters β under the prior, and p(c|x,β) is the probability of category c given input x and parameters β.

The maximum a posteriori estimate is such that the gradient (vector of partial derivatives of parameters) is zero. If the data is not linearly separable, a maximum likelihood solution must exist. If the data is not linearly separable and none of the data dimensions is collinear, the solution will be unique. If there is an informative Cauchy, Gaussian or Laplace prior, there will be a unique MAP solution even in the face of linear separability or collinear dimensions. Proofs of solution existence require showing that the matrix of second partial derivatives of the error with respect to pairs of parameters is positive semi-definite; if it is positive definite, the error is strictly convex and the MAP solution is unique.

The gradient for parameter vector βc for outcome c < numOutcomes()-1 is:

 grad(Err(D,βc))
 = ∂Err(D,β) / ∂βc
 = ∂(- log p(β|σ2)) / ∂βc
   + ∂(- Σ{(x,c') in D} log p(c' | x, β)) / ∂βc
where the gradient of the prior is described in the class documentation for RegressionPrior, and the gradient of the likelihood term is:
 ∂(- Σ{(x,c') in D} log p(c' | x, β)) / ∂βc
 = - Σ{(x,c') in D} ∂ log p(c' | x, β) / ∂βc
 = Σ{(x,c') in D} x * (p(c | x, β) - I(c = c'))
where the indicator function I(c=c') is equal to 1 if c=c' and equal to 0 otherwise.
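
The last equation may be rendered as the following plain-Java sketch, which computes the contribution of a single training case (x, cObserved) to the likelihood gradient for the parameter vector of category c; it reuses the hypothetical conditionalProbs helper sketched above:

 // Gradient of the negative log likelihood for one training case with respect
 // to the parameter vector of category c:  x * (p(c | x, beta) - I(c == cObserved))
 static double[] likelihoodGradient(double[][] betas, double[] x, int cObserved, int c) {
     double[] probs = conditionalProbs(betas, x);
     double residual = probs[c] - (c == cObserved ? 1.0 : 0.0);
     double[] gradient = new double[x.length];
     for (int i = 0; i < x.length; ++i)
         gradient[i] = x[i] * residual;
     return gradient;
 }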

Intercept Term

It is conventional to assume that inputs have their first dimension reserved for the constant 1, which makes the parameters βc,0 intercepts. The priors allow the intercept to be given an uninformative prior even if the other dimensions have informative priors.
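
For instance, a raw two-dimensional input (x1, x2) would be encoded as the three-dimensional vector (1, x1, x2); a hypothetical helper illustrating the convention:

 // Prepend a constant 1.0 so that the parameters beta_{c,0} act as intercepts.
 static double[] withIntercept(double[] rawInput) {
     double[] x = new double[rawInput.length + 1];
     x[0] = 1.0;
     System.arraycopy(rawInput, 0, x, 1, rawInput.length);
     return x;
 }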

Feature Normalization

It is also common to convert inputs to z-scores in logistic regression. The z-score is computed from the mean and standard deviation of each dimension. The problem with centering (subtracting the mean from each value) is that it destroys sparsity. We recommend not centering and instead using an intercept term with an uninformative prior.

Variance normalization can be achieved by setting the variance prior parameter independently for each dimension.
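
One way to normalize variance while preserving sparsity is to scale each dimension by its standard deviation without centering; the following is a minimal sketch of that approach (distinct from the prior-based mechanism just described):

 // Scale each dimension of a dense training matrix by its standard deviation.
 // Zeros remain zero, so sparsity is preserved; mean centering is deliberately omitted.
 static void scaleByStdDev(double[][] xs) {
     int numDimensions = xs[0].length;
     for (int i = 0; i < numDimensions; ++i) {
         double sum = 0.0;
         double sumOfSquares = 0.0;
         for (double[] x : xs) {
             sum += x[i];
             sumOfSquares += x[i] * x[i];
         }
         double mean = sum / xs.length;
         double stdDev = Math.sqrt(sumOfSquares / xs.length - mean * mean);
         if (stdDev > 0.0)
             for (double[] x : xs)
                 x[i] /= stdDev;
     }
 }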

Non-Linear and Interaction Features

It is common in logistic regression to include derived features which represent non-linear combinations of other input features. Typically, this is done through multiplication. For instance, if the output is a quadratic function of an input dimension i, then in addition to the raw value xi, another feature j may be introduced whose value is the square of xi.

Similarly, interaction terms are often added for features xi and xj, with a new feature xk being defined with value xi * xj.

The resulting model is linear in the derived features, but will no longer be linear in the original features.
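
A hypothetical helper illustrating quadratic and interaction features derived from a two-dimensional raw input:

 // Derive quadratic and interaction features from a raw input (x1, x2).
 // The result (x1, x2, x1*x1, x2*x2, x1*x2) keeps the model linear in the
 // derived features, but not in the original ones.
 static double[] quadraticFeatures(double x1, double x2) {
     return new double[] { x1, x2, x1 * x1, x2 * x2, x1 * x2 };
 }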

Stochastic Gradient Descent

This class estimates logistic regression models using stochastic gradient descent (SGD). The SGD method runs through the data one or more times, considering one training case at a time, adjusting the parameters along some multiple of the contribution to the gradient of the error for that case.

With informative priors, the search space is strictly concave, and there will be a unique solution. In cases of linear dependence between dimensions or in separable data, maximum likelihood estimation may diverge.

The basic algorithm is:

 β = 0;
 for (epoch = 0; epoch < maxEpochs; ++epoch)
     for training case (x,c') in D
         for category c < numOutcomes-1
             βc -= learningRate(epoch) * grad(Err(x,c,c',β,σ2))
     if (epoch > minEpochs && converged)
         return β
where we discuss the learning rate and convergence conditions in the next section. The gradient of the error is described above, and the gradient contribution of the prior and its parameters σ are described in the class documentation for RegressionPrior. Note that the gradient of the prior must be divided by the number of training cases to determine its incremental contribution per training case. The actual algorithm uses a lazy form of updating the contribution of the gradient of the prior. The result is an algorithm that handles sparse input data, touching only the non-zero dimensions of inputs during parameter updates.
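
A simplified, dense (non-lazy) rendering of this loop in plain Java, reusing the hypothetical conditionalProbs helper sketched earlier and omitting the prior contribution and convergence test, might look like the following; the class's actual implementation additionally applies the prior gradient lazily so that updates touch only non-zero input dimensions:

 // betas: (numOutcomes-1) x numDimensions parameter matrix, initialized to zero.
 // xs, cs: training input vectors and their reference categories.
 static void stochasticGradientDescent(double[][] betas, double[][] xs, int[] cs,
                                       double[] learningRateByEpoch, int maxEpochs) {
     for (int epoch = 0; epoch < maxEpochs; ++epoch) {
         double rate = learningRateByEpoch[epoch];
         for (int n = 0; n < xs.length; ++n) {
             double[] probs = conditionalProbs(betas, xs[n]);
             for (int c = 0; c < betas.length; ++c) {
                 double residual = probs[c] - (cs[n] == c ? 1.0 : 0.0);
                 for (int i = 0; i < xs[n].length; ++i)
                     betas[c][i] -= rate * xs[n][i] * residual;
             }
         }
         // convergence test on relative improvement in error omitted here
     }
 }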

Learning Parameters

In addition to the model parameters (including priors) and training data (input vectors and reference categories), the regression estimation method also requires four parameters that control the search. The simplest search parameters are the minimum and maximum epoch parameters, which bound the number of epochs used for optimization.

The argument minImprovement determines how much relative improvement in the log likelihood of the training data and model under the current parameters is necessary to go on to the next epoch. This is measured relatively, with the algorithm stopping when the current epoch's error err is sufficiently close to the previous epoch's error, errLast:

 abs(err - errLast)/(abs(err) + abs(errLast)) < minImprovement
Setting this to a low value will lead to slower, but more accurate, coefficient estimates.

Finally, the search parameters include an instance of AnnealingSchedule, which implements the learningRate(epoch) method. See that class for concrete implementations, including standard inverse-epoch annealing and exponential-decay annealing.
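
For example, a call to the estimate method documented below might look like the following sketch; the factory calls RegressionPrior.gaussian(2.0, true), AnnealingSchedule.exponential(0.005, 0.999), and Reporters.stdOut() are assumptions about the related LingPipe classes rather than part of this class's API, and the numeric settings are purely illustrative:

 import com.aliasi.io.Reporters;
 import com.aliasi.matrix.Vector;
 import com.aliasi.stats.AnnealingSchedule;
 import com.aliasi.stats.LogisticRegression;
 import com.aliasi.stats.RegressionPrior;

 class EstimationDemo {
     // xs: training vectors; cs: reference categories, both indexed by training case
     static LogisticRegression fit(Vector[] xs, int[] cs) {
         return LogisticRegression.estimate(
             xs, cs,
             RegressionPrior.gaussian(2.0, true),          // assumed factory: prior variance, uninformative intercept
             AnnealingSchedule.exponential(0.005, 0.999),  // assumed factory: initial learning rate, decay base
             Reporters.stdOut(),                           // assumed factory; null silences reporting
             0.000001,                                     // minImprovement
             10,                                           // minEpochs
             1000);                                        // maxEpochs
     }
 }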

Serialization and Compilation

For convenience, this class implements both the Serializable and Compilable interfaces. Serializing or compiling a logistic regression model has the same effect. The model read back in from its serialized state will be an instance of this class, LogisticRegression.
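
For instance, a model may be written with compileTo(ObjectOutput) and read back with standard Java serialization, as in the following sketch, which relies only on java.io and the methods documented on this page:

 import com.aliasi.stats.LogisticRegression;

 import java.io.File;
 import java.io.FileInputStream;
 import java.io.FileOutputStream;
 import java.io.IOException;
 import java.io.ObjectInputStream;
 import java.io.ObjectOutputStream;

 class SerializationDemo {
     static LogisticRegression writeAndRead(LogisticRegression model, File file)
             throws IOException, ClassNotFoundException {
         ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file));
         model.compileTo(out);   // same effect as serializing the model directly
         out.close();

         ObjectInputStream in = new ObjectInputStream(new FileInputStream(file));
         LogisticRegression deserialized = (LogisticRegression) in.readObject();
         in.close();
         return deserialized;    // still an instance of LogisticRegression
     }
 }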

References

Logistic regression is discussed in most machine learning and statistics textbooks, many of which also introduce some form of stochastic gradient descent (often not together, and often under the different names listed in the AKA section above). Introductions to traditional statistical modeling with logistic regression are also widely available, as are discussions of text classification using regression that evaluate against support vector machines (SVMs) and consider informative Laplace and Gaussian priors varying by dimension (which this class supports).

Since:
LingPipe 3.5
Version:
3.9.2
Author:
Bob Carpenter, Mike Ross
See Also:
Serialized Form

Constructor Summary
LogisticRegression(Vector weightVector)
          Construct a binomial logistic regression model with the specified parameter vector.
LogisticRegression(Vector[] weightVectors)
          Construct a multinomial logistic regression model with the specified weight vectors.
 
Method Summary
 double[] classify(Vector x)
          Returns an array of conditional probabilities indexed by outcomes for the specified input vector.
 void classify(Vector x, double[] ysHat)
          Fills the specified array with the conditional probabilities indexed by outcomes for the specified input vector.
 void compileTo(ObjectOutput out)
          Compiles this model to the specified object output.
static LogisticRegression estimate(Vector[] xs, int[] cs, RegressionPrior prior, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, PrintWriter progressWriter)
          Deprecated. Use estimate(Vector[],int[],RegressionPrior,AnnealingSchedule,Reporter,double,int,int) instead.
static LogisticRegression estimate(Vector[] xs, int[] cs, RegressionPrior prior, AnnealingSchedule annealingSchedule, Reporter reporter, double minImprovement, int minEpochs, int maxEpochs)
          Estimate a logistic regression model from the specified input data using the specified prior, annealing schedule, minimum improvement per epoch, minimum and maximum number of estimation epochs, and reporter.
static LogisticRegression estimate(Vector[] xs, int[] cs, RegressionPrior prior, int priorBlockSize, LogisticRegression hotStart, AnnealingSchedule annealingSchedule, double minImprovement, int rollingAverageSize, int minEpochs, int maxEpochs, ObjectHandler<LogisticRegression> handler, Reporter reporter)
          Estimate a logistic regression model from the specified input data using the specified prior and prior block size, hot-start model, annealing schedule, minimum improvement per epoch, rolling average size, minimum and maximum number of estimation epochs, intermediate-result handler, and reporter.
static double log2Likelihood(Vector[] inputs, int[] cats, LogisticRegression regression)
          Returns the log (base 2) likelihood of the specified inputs with the specified categories using the specified regression model.
 int numInputDimensions()
          Returns the dimensionality of inputs for this logistic regression model.
 int numOutcomes()
          Returns the number of outcomes for this logistic regression model.
 Vector[] weightVectors()
          Returns an array of views of the weight vectors used for this regression model.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LogisticRegression

public LogisticRegression(Vector[] weightVectors)
Construct a multinomial logistic regression model with the specified weight vectors. With k-1 weight vectors, the result is a multinomial classifier with k outcomes.

The weight vectors are stored rather than copied, so changes to them will affect this class.

See the class definition above for more information on logistic regression.

Parameters:
weightVectors - Weight vectors defining this regression model.
Throws:
IllegalArgumentException - If the array of weight vectors does not have at least one element or if there are two weight vectors with different numbers of dimensions.

LogisticRegression

public LogisticRegression(Vector weightVector)
Construct a binomial logistic regression model with the specified parameter vector. See the class definition above for more information on logistic regression.

The weight vector is stored rather than copied, so changes to it will affect this class.

Parameters:
weightVector - The weights of features defining this model.
Method Detail

numInputDimensions

public int numInputDimensions()
Returns the dimensionality of inputs for this logistic regression model.

Returns:
The number of dimensions for this model.

numOutcomes

public int numOutcomes()
Returns the number of outcomes for this logistic regression model.

Returns:
The number of outcomes for this model.

weightVectors

public Vector[] weightVectors()
Returns an array of views of the weight vectors used for this regression model. The returned weight vectors are immutable views of the underlying vectors used by this model, so will change if the vectors making up this model change.

Returns:
An array of views of the weight vectors for this model.

classify

public double[] classify(Vector x)
Returns an array of conditional probabilities indexed by outcomes for the specified input vector. The resulting array has a value for index i that is equal to the probability of the outcome i for the specified input. The sum of the returned values will be 1.0 (modulo arithmetic precision).

See the class definition above for more information on how the conditional probabilities are computed.

Parameters:
x - The input vector.
Returns:
The array of conditional probabilities of outcomes.
Throws:
IllegalArgumentException - If the specified vector is not the same dimensionality as this logistic regression instance.

classify

public void classify(Vector x,
                     double[] ysHat)
Fills the specified array with the conditional probabilities indexed by outcomes for the specified input vector.

The resulting array has a value for index i that is equal to the probability of the outcome i for the specified input. The sum of the returned values will be 1.0 (modulo arithmetic precision).

See the class definition above for more information on how the conditional probabilities are computed.

Parameters:
x - The input vector.
ysHat - Array into which conditional probabilities are written.
Throws:
IllegalArgumentException - If the specified vector is not the same dimensionality as this logistic regression instance.

compileTo

public void compileTo(ObjectOutput out)
               throws IOException
Compiles this model to the specified object output. The compiled model, when read back in, will remain an instance of this class, LogisticRegression.

Compilation does the same thing as serialization.

Specified by:
compileTo in interface Compilable
Parameters:
out - Object output to which this model is compiled.
Throws:
IOException - If there is an underlying I/O error during serialization.

estimate

@Deprecated
public static LogisticRegression estimate(Vector[] xs,
                                                     int[] cs,
                                                     RegressionPrior prior,
                                                     AnnealingSchedule annealingSchedule,
                                                     double minImprovement,
                                                     int minEpochs,
                                                     int maxEpochs,
                                                     PrintWriter progressWriter)
Deprecated. Use estimate(Vector[],int[],RegressionPrior,AnnealingSchedule,Reporter,double,int,int) instead.

Estimate a logistic regression model from the specified input data using the specified prior, annealing schedule, minimum improvement per epoch, and minimum and maximum number of estimation epochs.

See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

This method just calls estimate(Vector[],int[],RegressionPrior, AnnealingSchedule,Reporter,double,int,int) with a new reporter created from the supplied progress writer. The reporter will be silent if the progress writer is null, and will wrap the progress writer and report at LogLevel.DEBUG.

Parameters:
xs - Input vectors indexed by training case.
cs - Output categories indexed by training case.
prior - The prior to be used for regression.
annealingSchedule - Class to compute learning rate for each epoch.
minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
minEpochs - Minimum number of epochs.
maxEpochs - Maximum number of epochs.
progressWriter - Writer to which progress reports are written, or null if no progress reports are needed.
Throws:
IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of categories is not the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

estimate

public static LogisticRegression estimate(Vector[] xs,
                                          int[] cs,
                                          RegressionPrior prior,
                                          AnnealingSchedule annealingSchedule,
                                          Reporter reporter,
                                          double minImprovement,
                                          int minEpochs,
                                          int maxEpochs)
Estimate a logistic regression model from the specified input data using the specified prior, annealing schedule, minimum improvement per epoch, minimum and maximum number of estimation epochs, and reporter. The prior block size defaults to the number of examples divided by 50.

See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

Parameters:
xs - Input vectors indexed by training case.
cs - Output categories indexed by training case.
prior - The prior to be used for regression.
annealingSchedule - Class to compute learning rate for each epoch.
minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
minEpochs - Minimum number of epochs.
maxEpochs - Maximum number of epochs.
reporter - Reporter to which progress reports are written, or null if no progress reports are needed.
Throws:
IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of categories is not the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

estimate

public static LogisticRegression estimate(Vector[] xs,
                                          int[] cs,
                                          RegressionPrior prior,
                                          int priorBlockSize,
                                          LogisticRegression hotStart,
                                          AnnealingSchedule annealingSchedule,
                                          double minImprovement,
                                          int rollingAverageSize,
                                          int minEpochs,
                                          int maxEpochs,
                                          ObjectHandler<LogisticRegression> handler,
                                          Reporter reporter)
Estimate a logistic regression model from the specified input data using the specified prior and prior block size, hot-start model, annealing schedule, minimum improvement per epoch, rolling average size, minimum and maximum number of estimation epochs, intermediate-result handler, and reporter.

See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

Parameters:
xs - Input vectors indexed by training case.
cs - Output categories indexed by training case.
prior - The prior to be used for regression.
priorBlockSize - Number of examples whose gradient is updated before the prior gradient is updated.
hotStart - Logistic regression from which to retrieve initial weights or null to use zero vectors.
annealingSchedule - Class to compute learning rate for each epoch.
minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
minEpochs - Minimum number of epochs.
maxEpochs - Maximum number of epochs.
handler - Handler for intermediate regression results.
reporter - Reporter to which progress reports are written, or null if no progress reports are needed.
Throws:
IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of categories is not the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.

log2Likelihood

public static double log2Likelihood(Vector[] inputs,
                                    int[] cats,
                                    LogisticRegression regression)
Returns the log (base 2) likelihood of the specified inputs with the specified categories using the specified regression model.

Parameters:
inputs - Input vectors.
cats - Categories for input vectors.
regression - Model to use for computing likelihood.
Throws:
IllegalArgumentException - If the inputs and categories are not the same length.
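
For example, a trained model may be evaluated on held-out data as follows; heldOutXs and heldOutCs are hypothetical arrays of held-out input vectors and categories:

 // Total held-out log (base 2) likelihood; values closer to zero indicate a better fit.
 double totalLog2Likelihood =
     LogisticRegression.log2Likelihood(heldOutXs, heldOutCs, model);
 double averageLog2Likelihood = totalLog2Likelihood / heldOutXs.length;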