java.lang.Object
  extended by com.aliasi.stats.RegressionPrior
public abstract class RegressionPrior
A RegressionPrior instance represents a prior
distribution on parameters for linear or logistic regression.
Instances of this class are used as parameters in the LogisticRegression class to control the regularization or lack
thereof used by the stochastic gradient descent optimizers. The
priors all assume a zero mean (or position) for each dimension, but
allow variances (or scales) to vary by input dimension.
The behavior of a prior is determined by its gradient: the
partial derivatives of the prior's error function (its negative
log likelihood) with respect to each coefficient $\beta_i$:
$$\mathrm{gradient}(\beta_i, i) = -\frac{\partial \log p(\beta)}{\partial \beta_i}$$
See the class documentation for LogisticRegression
for more information.
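As a quick sanity check on the sign convention, the gradient can be approximated numerically. This is a minimal sketch under stated assumptions, not LingPipe code: `PriorGradientCheck` and `numericGradient` are hypothetical names, the log density is assumed to be a natural log, and the example density is a zero-mean Gaussian with unit variance, for which −∂ log p(β)/∂β works out to β.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper (not part of LingPipe) for checking a prior's gradient numerically. */
final class PriorGradientCheck {

    private PriorGradientCheck() { }

    /**
     * Approximates gradient(beta) = -d log p(beta) / d beta with a central
     * finite difference, where logDensity returns the natural log of p.
     */
    static double numericGradient(DoubleUnaryOperator logDensity, double beta) {
        final double h = 1e-5;
        double forward = logDensity.applyAsDouble(beta + h);
        double backward = logDensity.applyAsDouble(beta - h);
        return -(forward - backward) / (2.0 * h);
    }

    public static void main(String[] args) {
        // ln p(beta) for a zero-mean, unit-variance Gaussian; -d/d beta of this is beta.
        DoubleUnaryOperator logStdNormal =
            b -> -0.5 * Math.log(2.0 * Math.PI) - 0.5 * b * b;
        System.out.println(numericGradient(logStdNormal, 0.7)); // close to 0.7
    }
}
```

A positive gradient for a positive coefficient means that descending the error pulls the coefficient back toward the prior's zero mode.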
Priors also implement a log (base 2) probability density for a given parameter value in a given dimension. Because the dimensions are independent, the total log prior probability of a parameter vector is the sum of the per-dimension log probabilities.
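The per-dimension decomposition of the log prior can be sketched as follows. This is illustrative only: the interface `DimensionwisePrior` is hypothetical, a plain `double[]` stands in for LingPipe's `Vector`, and the demo prior is a zero-mean, unit-variance Gaussian in every dimension whose natural-log density is divided by ln 2 to give a base-2 value.

```java
/**
 * Hypothetical per-dimension prior (not the LingPipe API); a double[]
 * stands in for LingPipe's Vector.
 */
interface DimensionwisePrior {

    /** Log (base 2) prior density of beta in the given dimension. */
    double log2Prior(double beta, int dimension);

    /** Total log (base 2) prior: the sum over independent dimensions. */
    default double log2Prior(double[] beta) {
        double total = 0.0;
        for (int i = 0; i < beta.length; ++i) {
            total += log2Prior(beta[i], i);
        }
        return total;
    }
}

final class DimensionwisePriorDemo {
    public static void main(String[] args) {
        // Zero-mean, unit-variance Gaussian in every dimension, converted from ln to log2.
        DimensionwisePrior unitGaussian = (b, dim) ->
            (-0.5 * Math.log(2.0 * Math.PI) - 0.5 * b * b) / Math.log(2.0);
        double[] coefficients = {0.0, 1.0, -2.0};
        System.out.println(unitGaussian.log2Prior(coefficients));
    }
}
```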
Priors affect gradient descent fitting of regression through
their contribution to the gradient of the error function with
respect to the parameter vector. The contribution of the prior to
the error function is the negative log probability of the parameter
vector(s) with respect to the prior distribution. The gradient of
the error function is the collection of partial derivatives of the
error function with respect to the components of the parameter
vector. The regression prior abstract base class specifies this
contribution through the abstract method gradient(double,int), which
returns the value of the gradient of the error function for a
specified dimension given the parameter value in that dimension.
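To make the prior's role in fitting concrete, here is a minimal sketch of a single full-batch gradient-descent step on the combined error (data error plus prior error). It is not the LingPipe optimizer, which uses stochastic updates whose learning-rate schedule and scaling are not described here; `PriorGradient`, `PenalizedGradientStep` and `step` are hypothetical names, and `dataGradient` is assumed to have been computed elsewhere.

```java
/** Hypothetical functional view of a prior's gradient (mirrors gradient(double,int)). */
@FunctionalInterface
interface PriorGradient {
    double gradient(double betaForDimension, int dimension);
}

final class PenalizedGradientStep {

    private PenalizedGradientStep() { }

    /**
     * One full-batch gradient-descent step on
     * error(beta) = dataError(beta) + priorError(beta),
     * adding the prior's contribution dimension by dimension.
     */
    static double[] step(double[] beta, double[] dataGradient,
                         PriorGradient prior, double learningRate) {
        if (beta.length != dataGradient.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] next = new double[beta.length];
        for (int i = 0; i < beta.length; ++i) {
            double g = dataGradient[i] + prior.gradient(beta[i], i);
            next[i] = beta[i] - learningRate * g;
        }
        return next;
    }

    public static void main(String[] args) {
        double[] beta = {1.0, -2.0};
        double[] noDataSignal = {0.0, 0.0};
        PriorGradient shrinkToZero = (b, i) -> b; // illustrative prior pulling each coefficient toward zero
        PriorGradient uniform = (b, i) -> 0.0;    // uniform prior: no contribution
        System.out.println(java.util.Arrays.toString(step(beta, noDataSignal, shrinkToZero, 0.5))); // [0.5, -1.0]
        System.out.println(java.util.Arrays.toString(step(beta, noDataSignal, uniform, 0.5)));      // [1.0, -2.0]
    }
}
```

With no signal from the data, the shrinking prior pulls each coefficient toward zero, while the uniform prior leaves the coefficients unchanged.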
This class implements static factory methods to construct non-informative, Gaussian, Laplace and Cauchy priors. The Gaussian and Laplace priors may specify a different variance for each dimension, but assume all the prior means are zero. The priors also assume the dimensions are independent, so the full covariance matrix is diagonal (that is, there is zero covariance between different dimensions).
Using a non-informative prior for regression results in standard maximum likelihood estimation.
The non-informative prior assumes a uniform distribution over parameter vectors:
$$p(\beta_i, i) = 1.0$$
and thus contributes nothing to the gradient:
$$\mathrm{gradient}(\beta_i, i) = 0.0$$
A non-informative prior is constructed using the static method noninformative().
The Gaussian prior assumes a Gaussian (also known as normal) density over parameter vectors, which results in L2-regularized regression, also known as ridge regression. Specifically, the prior allows a variance to be specified per dimension, but assumes the dimensions are independent, so that all off-diagonal covariances are zero.
The Gaussian density is defined by:
$$p(\beta_i, i) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{\beta_i^2}{2\sigma_i^2}\right)$$
The Gaussian prior leads to the following contribution to the
gradient for a dimension i with parameter $\beta_i$ and variance
$\sigma_i^2$:
$$\mathrm{gradient}(\beta_i, i) = \frac{\beta_i}{\sigma_i^2}$$
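A minimal sketch of the Gaussian prior's two per-dimension quantities follows, assuming a plain array of variances. The class `GaussianPriorSketch` is hypothetical and is not LingPipe's implementation; `gradient` is computed as the derivative of the negative natural-log of the density above, $\beta_i/\sigma_i^2$, and `log2Prior` converts the natural log to base 2 by dividing by ln 2.

```java
/** Hypothetical sketch of a zero-mean Gaussian prior; not LingPipe's implementation. */
final class GaussianPriorSketch {

    private static final double LN_2 = Math.log(2.0);

    private final double[] variances; // sigma_i^2 for each dimension i

    GaussianPriorSketch(double[] variances) {
        for (double v : variances) {
            if (!(v >= 0.0)) { // also rejects NaN
                throw new IllegalArgumentException("variance must be non-negative: " + v);
            }
        }
        this.variances = variances.clone();
    }

    /** -d/d beta of ln p(beta, i) = beta / sigma_i^2 (0 for a finite beta and infinite variance). */
    double gradient(double beta, int i) {
        return beta / variances[i];
    }

    /** log2 p(beta, i) = [-0.5 ln(2 pi sigma_i^2) - beta^2 / (2 sigma_i^2)] / ln 2. */
    double log2Prior(double beta, int i) {
        double v = variances[i];
        return (-0.5 * Math.log(2.0 * Math.PI * v) - beta * beta / (2.0 * v)) / LN_2;
    }

    public static void main(String[] args) {
        GaussianPriorSketch prior = new GaussianPriorSketch(new double[] {4.0, 1.0});
        System.out.println(prior.gradient(2.0, 0)); // 0.5
        System.out.println(prior.log2Prior(0.0, 1));
    }
}
```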
Gaussian priors are constructed using one of the static factory
methods, gaussian(double[]) or gaussian(double,boolean).
The Laplace prior assumes a Laplace density over parameter vectors, which results in L1-regularized regression, also known as the lasso. The Laplace distribution is also called a double-exponential distribution because it looks like an exponential distribution for positive values and like the reflection of that exponential distribution around zero (or, more generally, around its mean parameter) for negative values.
A Laplace prior allows a variance to be specified per dimension, but like the Gaussian prior, assumes means are zero and that the dimensions are independent in that all off-diagonal covariances are zero.
The Laplace density is defined by:
$$p(\beta_i, i) = \frac{\sqrt{2}}{2\sigma_i} \exp\left(-\frac{\sqrt{2}\,|\beta_i|}{\sigma_i}\right)$$
The Laplace prior leads to the following contribution to the
gradient for a dimension i with parameter $\beta_i$, mean zero and variance
$\sigma_i^2$:
$$\mathrm{gradient}(\beta_i, i) = \frac{\sqrt{2}\,\mathrm{signum}(\beta_i)}{\sigma_i}$$
where the signum function is defined by Math.signum(double).
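The Laplace quantities can be sketched the same way. `LaplacePriorSketch` is hypothetical, not LingPipe code; it takes the variances $\sigma_i^2$ and recovers $\sigma_i$ as their square root. The gradient is the derivative of the negative natural-log of the density above, and at $\beta_i = 0$, where the density has a kink, `Math.signum` returns zero.

```java
/** Hypothetical sketch of a zero-mean Laplace prior; not LingPipe's implementation. */
final class LaplacePriorSketch {

    private final double[] variances; // sigma_i^2 for each dimension i

    LaplacePriorSketch(double[] variances) {
        this.variances = variances.clone();
    }

    /**
     * -d/d beta of ln p(beta, i) = sqrt(2) * signum(beta) / sigma_i,
     * written here as signum(beta) * sqrt(2 / sigma_i^2).
     * Math.signum(0.0) is 0.0, so beta = 0 gets a zero subgradient.
     */
    double gradient(double beta, int i) {
        return Math.signum(beta) * Math.sqrt(2.0 / variances[i]);
    }

    /** log2 p(beta, i) = [ln(sqrt(2) / (2 sigma_i)) - sqrt(2) |beta| / sigma_i] / ln 2. */
    double log2Prior(double beta, int i) {
        double sigma = Math.sqrt(variances[i]);
        double ln = Math.log(Math.sqrt(2.0) / (2.0 * sigma))
                    - Math.sqrt(2.0) * Math.abs(beta) / sigma;
        return ln / Math.log(2.0);
    }

    public static void main(String[] args) {
        LaplacePriorSketch prior = new LaplacePriorSketch(new double[] {2.0});
        System.out.println(prior.gradient(3.0, 0));  // 1.0
        System.out.println(prior.gradient(-0.1, 0)); // -1.0
    }
}
```

Unlike the Gaussian gradient, the magnitude of this contribution, $\sqrt{2}/\sigma_i$, does not depend on $|\beta_i|$.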
Laplace priors are constructed using one of the static factory
methods, laplace(double[]) or laplace(double,boolean).
The Cauchy prior assumes a Cauchy density (also known as a Lorentz density) over parameter vectors. The Cauchy density is a Student-t density with one degree of freedom. The Cauchy prior allows a scale to be specified for each dimension. The mean and variance of a Cauchy distribution are undefined because their integrals diverge. The Cauchy distribution is symmetric, and for regression priors we assume a mode of zero.
The Cauchy density is defined by:
$$p(\beta_i, i) = \frac{1}{\pi} \cdot \frac{\lambda_i}{\beta_i^2 + \lambda_i^2}$$
The Cauchy prior leads to the following contribution to the
gradient for dimension i with parameter $\beta_i$ and scale
$\lambda_i$ (squared scale $\lambda_i^2$):
$$\mathrm{gradient}(\beta_i, i) = \frac{2\beta_i}{\beta_i^2 + \lambda_i^2}$$
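A corresponding sketch for the Cauchy prior, parameterized by squared scales $\lambda_i^2$ to match the factory methods, is below. `CauchyPriorSketch` is hypothetical and is not LingPipe's implementation.

```java
/** Hypothetical sketch of a zero-mode Cauchy prior; not LingPipe's implementation. */
final class CauchyPriorSketch {

    private final double[] squaredScales; // lambda_i^2 for each dimension i

    CauchyPriorSketch(double[] squaredScales) {
        this.squaredScales = squaredScales.clone();
    }

    /** -d/d beta of ln p(beta, i) = 2 beta / (beta^2 + lambda_i^2). */
    double gradient(double beta, int i) {
        return 2.0 * beta / (beta * beta + squaredScales[i]);
    }

    /** log2 p(beta, i) = log2[(1 / pi) * lambda_i / (beta^2 + lambda_i^2)]. */
    double log2Prior(double beta, int i) {
        double s2 = squaredScales[i];
        double p = Math.sqrt(s2) / (Math.PI * (beta * beta + s2));
        return Math.log(p) / Math.log(2.0);
    }

    public static void main(String[] args) {
        CauchyPriorSketch prior = new CauchyPriorSketch(new double[] {4.0});
        System.out.println(prior.gradient(2.0, 0)); // 0.5, the peak value 1/lambda at beta = lambda = 2
    }
}
```

This gradient is bounded: its magnitude peaks at $1/\lambda_i$ when $|\beta_i| = \lambda_i$ and decays toward zero for large coefficients, so for large coefficients the Cauchy prior's gradient is much weaker than the Gaussian's (which grows linearly) or the Laplace's (which stays constant).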
Cauchy priors are constructed using one of the static factory
methods cauchy(double[]) or cauchy(double,boolean).
By convention, input dimension zero (0) may be
reserved for the intercept and set to value 1.0 in all input
vectors. For regularized regression, the regularization is
typically not applied to the intercept term. To match this
convention, the factory methods allow a boolean parameter
indicating whether the intercept parameter has a
non-informative/uniform prior. If the intercept flag indicates it
is non-informative, then dimension 0 will have an infinite
prior variance or scale, and hence a zero gradient. The result is
that the intercept will be fit by maximum likelihood.
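The intercept convention amounts to building a per-dimension variance array with an infinite entry at dimension 0. The sketch below is hypothetical: `InterceptConventionSketch` and `priorVariances` are illustrative names, and because LingPipe's gaussian(double, boolean) takes no dimension count, the `numDimensions` parameter is an assumption made here to materialize an explicit array.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating the non-informative-intercept convention. */
final class InterceptConventionSketch {

    private InterceptConventionSketch() { }

    /**
     * Per-dimension prior variances: priorVariance everywhere, except that
     * dimension 0 (the intercept) gets an infinite variance when requested.
     * numDimensions is an assumption of this sketch, not a LingPipe parameter.
     */
    static double[] priorVariances(double priorVariance, int numDimensions,
                                   boolean noninformativeIntercept) {
        if (!(priorVariance >= 0.0)) { // also rejects NaN
            throw new IllegalArgumentException("prior variance must be non-negative");
        }
        double[] variances = new double[numDimensions];
        Arrays.fill(variances, priorVariance);
        if (noninformativeIntercept && numDimensions > 0) {
            variances[0] = Double.POSITIVE_INFINITY;
        }
        return variances;
    }

    public static void main(String[] args) {
        double[] variances = priorVariances(1.0, 3, true);
        double[] beta = {5.0, 5.0, 5.0};
        for (int i = 0; i < beta.length; ++i) {
            // The Gaussian gradient divides by the variance, so it is exactly 0.0 for the intercept.
            System.out.println(i + ": " + beta[i] / variances[i]);
        }
    }
}
```

Dividing a finite coefficient by Double.POSITIVE_INFINITY yields zero in Java floating point, so the intercept receives no pull from the prior.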
All of the regression priors may be serialized.
For full details on the Gaussian and Laplace distributions, see:
For explanations of how the priors are used with logistic regression, see the following two textbooks:
| Method Summary | |
|---|---|
| static RegressionPrior | cauchy(double[] priorSquaredScales): Returns the Cauchy prior for the specified squared scales. |
| static RegressionPrior | cauchy(double priorSquaredScale, boolean noninformativeIntercept): Returns the Cauchy prior with the specified prior squared scale and indication of whether the intercept is given a noninformative prior. |
| static RegressionPrior | gaussian(double[] priorVariances): Returns the Gaussian prior with the specified prior variances for the dimensions. |
| static RegressionPrior | gaussian(double priorVariance, boolean noninformativeIntercept): Returns the Gaussian prior with the specified prior variance and indication of whether the intercept is given a noninformative prior. |
| abstract double | gradient(double betaForDimension, int dimension): Returns the contribution to the gradient of the error function of the specified parameter value for the specified dimension. |
| static RegressionPrior | laplace(double[] priorVariances): Returns the Laplace prior with the specified prior variances for the dimensions. |
| static RegressionPrior | laplace(double priorVariance, boolean noninformativeIntercept): Returns the Laplace prior with the specified prior variance and indication of whether the intercept dimension is given a noninformative prior. |
| abstract double | log2Prior(double betaForDimension, int dimension): Returns the log (base 2) of the prior density evaluated at the specified coefficient value for the specified dimension. |
| double | log2Prior(Vector beta): Returns the log (base 2) prior density for a specified coefficient vector. |
| double | log2Prior(Vector[] betas): Returns the log (base 2) prior density for the specified array of coefficient vectors. |
| static RegressionPrior | noninformative(): Returns the noninformative or uniform prior to use for maximum likelihood regression fitting. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Method Detail |
|---|
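public abstract double gradient(double betaForDimension,
int dimension)
Returns the contribution to the gradient of the error function of the specified parameter value for the specified dimension.
Parameters:
betaForDimension - Parameter value for the specified dimension.
dimension - The dimension.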
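public abstract double log2Prior(double betaForDimension,
int dimension)
Returns the log (base 2) of the prior density evaluated at the specified coefficient value for the specified dimension.
Parameters:
betaForDimension - Parameter value for the specified dimension.
dimension - The dimension.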
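public double log2Prior(Vector beta)
Returns the log (base 2) prior density for a specified coefficient vector.
Parameters:
beta - Parameter vector.
Throws:
IllegalArgumentException - If the specified parameter vector does not match the dimensionality of the prior (if specified).
public double log2Prior(Vector[] betas)
Returns the log (base 2) prior density for the specified array of coefficient vectors.
Parameters:
betas - The parameter vectors.
Throws:
IllegalArgumentException - If any of the specified parameter vectors does not match the dimensionality of the prior (if specified).
public static RegressionPrior noninformative()
Returns the noninformative or uniform prior to use for maximum likelihood regression fitting.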
public static RegressionPrior gaussian(double priorVariance,
boolean noninformativeIntercept)
Returns the Gaussian prior with the specified prior variance and indication of whether the intercept is given a noninformative prior.
If the noninformative-intercept flag is set to
true, the prior variance for dimension zero
(0) is set to Double.POSITIVE_INFINITY.
See the class documentation above for more information on Gaussian priors.
Parameters:
priorVariance - Variance of the Gaussian prior for each dimension.
noninformativeIntercept - Flag indicating if intercept is given a noninformative (uniform) prior.
Throws:
IllegalArgumentException - If the prior variance is not a non-negative number.
public static RegressionPrior gaussian(double[] priorVariances)
Returns the Gaussian prior with the specified prior variances for the dimensions.
See the class documentation above for more information on Gaussian priors.
Parameters:
priorVariances - Array of prior variances for dimensions.
Throws:
IllegalArgumentException - If any of the variances is not a non-negative number.
public static RegressionPrior laplace(double priorVariance,
boolean noninformativeIntercept)
Returns the Laplace prior with the specified prior variance and indication of whether the intercept dimension is given a noninformative prior.
If the noninformative-intercept flag is set to
true, the prior variance for dimension zero
(0) is set to Double.POSITIVE_INFINITY.
See the class documentation above for more information on Laplace priors.
Parameters:
priorVariance - Variance of the Laplace prior for each dimension.
noninformativeIntercept - Flag indicating if intercept is given a noninformative (uniform) prior.
Throws:
IllegalArgumentException - If the variance is not a non-negative number.
public static RegressionPrior laplace(double[] priorVariances)
Returns the Laplace prior with the specified prior variances for the dimensions.
See the class documentation above for more information on Laplace priors.
Parameters:
priorVariances - Array of prior variances for dimensions.
Throws:
IllegalArgumentException - If any of the variances is not a non-negative number.
public static RegressionPrior cauchy(double priorSquaredScale,
boolean noninformativeIntercept)
Returns the Cauchy prior with the specified prior squared scale and indication of whether the intercept is given a noninformative prior.
See the class documentation above for more information on Cauchy priors.
Parameters:
priorSquaredScale - The square of the prior scale parameter.
noninformativeIntercept - Flag indicating if intercept is given a noninformative (uniform) prior.
Throws:
IllegalArgumentException - If the scale is not a non-negative number.
public static RegressionPrior cauchy(double[] priorSquaredScales)
Returns the Cauchy prior for the specified squared scales.
See the class documentation above for more information on Cauchy priors.
Parameters:
priorSquaredScales - Prior squared scale parameters.
Throws:
IllegalArgumentException - If any of the prior squared scales is not a non-negative number.