

java.lang.Object com.aliasi.stats.RegressionPrior
public abstract class RegressionPrior
A RegressionPrior
instance represents a prior
distribution on parameters for linear or logistic regression.
It has methods to return the log probabilities of input
parameters and compute the gradient of the log probability
for estimation.
Instances of this class are used as parameters in the LogisticRegression
class to control the regularization or lack
thereof used by the stochastic gradient descent optimizers. The
priors typically assume a zero mode (maximal value) for each
dimension, but allow variances (or scales) to vary by input
dimension. The method shiftMeans(double[],RegressionPrior)
may be used to shift the means (and hence modes) of priors.
The behavior of a prior under stochastic gradient fitting is
determined by its gradient: the partial derivative of the error
function for the prior (the negative log probability) with respect
to a coefficient β_{i}.
gradient(β,i) = - ∂ log p(β) / ∂ β_{i}
See the class documentation for LogisticRegression for more information.
Priors also implement a log (base 2) probability density for the prior for a given parameter in a given dimension. The total log prior probability is defined as the sum of the log probabilities for the dimensions,
log p(β) = Σ_{i} log p(β_{i})
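The sum over dimensions can be sketched in a few lines. This is a minimal illustration, not the library's code: the interface Log2PriorSketch and the class TotalLogPriorSketch are hypothetical names introduced here, and the standard normal prior in main is only an example.

```java
/** Hypothetical stand-in for a per-dimension prior; not the LingPipe class. */
interface Log2PriorSketch {
    /** Log (base 2) density of a coefficient value in the given dimension. */
    double log2Prior(double betaForDimension, int dimension);
}

final class TotalLogPriorSketch {
    static final double LN_2 = Math.log(2.0);

    /** Converts a natural logarithm to a base-2 logarithm. */
    static double log2(double naturalLog) {
        return naturalLog / LN_2;
    }

    /** Sums per-dimension log densities: log p(beta) = sum_i log p(beta_i). */
    static double log2Prior(Log2PriorSketch prior, double[] beta) {
        double sum = 0.0;
        for (int i = 0; i < beta.length; ++i)
            sum += prior.log2Prior(beta[i], i);
        return sum;
    }

    public static void main(String[] args) {
        // Example prior (an assumption): standard normal in every dimension.
        Log2PriorSketch stdNormal =
            (b, i) -> log2(-0.5 * Math.log(2.0 * Math.PI) - 0.5 * b * b);
        System.out.println(log2Prior(stdNormal, new double[] { 0.0, 1.0, -2.0 }));
    }
}
```

Because the dimensions are treated as independent, the total is a plain sum and no covariance terms appear.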
Priors affect gradient descent fitting of regression through
their contribution to the gradient of the error function with
respect to the parameter vector. The contribution of the prior to
the error function is the negative log probability of the parameter
vector(s) with respect to the prior distribution. The gradient of
the error function is the collection of partial derivatives of the
error function with respect to the components of the parameter
vector. The regression prior abstract base class is defined in
terms of a single method gradient(double,int), which
specifies the value of the gradient of the error function for a
specified dimension with a specified value in that dimension.
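To make the prior's contribution to the error gradient concrete, here is a hedged sketch of one plain gradient-descent step. The class PriorGradientStepSketch, its nested interface, and the learningRate parameter are assumptions made for illustration; the actual optimizer in LogisticRegression may scale or schedule these terms differently.

```java
final class PriorGradientStepSketch {

    /** Hypothetical callback: the prior's gradient for a value in a dimension. */
    interface PriorGradient {
        double gradient(double betaForDimension, int dimension);
    }

    /**
     * Takes one descent step on error = likelihood error + prior error.
     * likelihoodGradient[i] is assumed to hold the partial derivative of
     * the negative log likelihood with respect to beta[i].
     */
    static void step(double[] beta, double[] likelihoodGradient,
                     PriorGradient prior, double learningRate) {
        for (int i = 0; i < beta.length; ++i) {
            double total = likelihoodGradient[i] + prior.gradient(beta[i], i);
            beta[i] -= learningRate * total;
        }
    }

    public static void main(String[] args) {
        double[] beta = { 1.0, -2.0 };
        // Zero likelihood gradient, so only the prior moves the coefficients.
        step(beta, new double[] { 0.0, 0.0 }, (b, i) -> b, 0.5);
        System.out.println(beta[0] + " " + beta[1]);
    }
}
```

With a zero prior gradient the step depends on the likelihood alone, which is the maximum likelihood case.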
This class implements static factory methods to construct noninformative, Gaussian and Laplace priors. The Gaussian and Laplace priors may specify a different variance for each dimension, but assume all the prior means (which are equivalent to the modes) are zero. The priors also assume the dimensions are independent, so that the full covariance matrix is diagonal (that is, there is zero covariance between different dimensions).
Using a noninformative prior for regression results in standard maximum likelihood estimation.
The noninformative prior assumes an improper uniform distribution over parameter vectors:
p(β_{i}) = Uniform(β_{i}) = constant
and thus the log probability is constant
log p(β_{i}) = log constant
and therefore contributes nothing to the gradient:
gradient(β,i) = 0.0
A noninformative prior is constructed using the static method noninformative().
The Gaussian prior assumes a Gaussian (also known as normal) density over parameter vectors, which results in L_{2}-regularized regression, also known as ridge regression. Specifically, the prior allows a variance to be specified per dimension, but assumes dimensions are independent in that all off-diagonal covariances are zero. The Gaussian prior has a single mode that is the same as its mean.
The Gaussian density with variance σ_{i}^{2} is defined by:
p(β_{i}) = 1.0/sqrt(2 * π * σ_{i}^{2}) * exp(-β_{i}^{2}/(2 * σ_{i}^{2}))
which on a log scale is
log p(β_{i}) = log (1.0/sqrt(2 * π * σ_{i}^{2})) - β_{i}^{2}/(2 * σ_{i}^{2})
The Gaussian prior leads to the following contribution to the
gradient for a dimension i with parameter β_{i} and variance σ_{i}^{2}:
gradient(β,i) = β_{i}/σ_{i}^{2}
As usual, the lower the variance, the steeper the gradient, and the stronger the effect on the (maximum) a posteriori estimate.
Gaussian priors are constructed using one of the static factory
methods, gaussian(double[]) or gaussian(double,boolean).
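The Gaussian formulas can be checked numerically. The sketch below is a standalone illustration under the definitions given above: GaussianPriorSketch is a hypothetical class, it uses natural logs rather than base 2, and it compares the stated gradient with a central finite difference of the negative log density.

```java
final class GaussianPriorSketch {

    /** Natural-log density of a zero-mean Gaussian with the given variance. */
    static double logDensity(double beta, double variance) {
        return -0.5 * Math.log(2.0 * Math.PI * variance)
            - beta * beta / (2.0 * variance);
    }

    /** Gradient of the negative log density: beta / variance. */
    static double gradient(double beta, double variance) {
        return beta / variance;
    }

    public static void main(String[] args) {
        double beta = 1.5;
        double variance = 4.0;
        double h = 1e-6;
        // Central difference of the negative log density.
        double numeric = -(logDensity(beta + h, variance)
                           - logDensity(beta - h, variance)) / (2.0 * h);
        System.out.println("analytic = " + gradient(beta, variance));
        System.out.println("numeric  = " + numeric);
    }
}
```

The two printed values should agree closely, which confirms that the gradient is taken of the negative log density.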
The Laplace prior assumes a Laplace density over parameter vectors, which results in L_{1}-regularized regression, also known as the lasso. The Laplace prior is also called a double-exponential distribution because it looks like an exponential distribution for positive values and the reflection of this exponential distribution around zero (or more generally, around its mean parameter). The Laplace prior has the mode in the same location as the mean.
A Laplace prior allows a variance to be specified per dimension, but like the Gaussian prior, assumes means are zero and that the dimensions are independent in that all off-diagonal covariances are zero.
The Laplace density is defined by:
p(β_{i}) = (sqrt(2)/(2 * σ_{i})) * exp(-sqrt(2) * abs(β_{i}) / σ_{i})
which on the log scale is
log p(β_{i}) = log (sqrt(2)/(2 * σ_{i})) - sqrt(2) * abs(β_{i}) / σ_{i}
The Laplace prior leads to the following contribution to the
gradient for a dimension i with parameter β_{i}, mean zero and variance σ_{i}^{2}:
gradient(β,i) = sqrt(2) * signum(β_{i}) / σ_{i}
where the derivative of the absolute value function is the signum
function, as defined by Math.signum(double).
signum(x) = x > 0 ? 1 : (x < 0 ? -1 : 0)
Laplace priors are constructed using one of the static factory
methods, laplace(double[]) or laplace(double,boolean).
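A small sketch of the Laplace formulas, parameterized by variance as in the text. LaplacePriorSketch is a hypothetical class written for illustration, using natural logs; it is not the library's implementation.

```java
final class LaplacePriorSketch {

    static final double SQRT_2 = Math.sqrt(2.0);

    /** Natural-log density of a zero-mean Laplace with the given variance. */
    static double logDensity(double beta, double variance) {
        double sigma = Math.sqrt(variance);
        return Math.log(SQRT_2 / (2.0 * sigma)) - SQRT_2 * Math.abs(beta) / sigma;
    }

    /** Gradient of the negative log density: sqrt(2) * signum(beta) / sigma. */
    static double gradient(double beta, double variance) {
        return SQRT_2 * Math.signum(beta) / Math.sqrt(variance);
    }

    public static void main(String[] args) {
        double variance = 2.0;
        // The magnitude is the same for any nonzero beta; only the sign changes.
        System.out.println(gradient(0.1, variance));
        System.out.println(gradient(10.0, variance));
        System.out.println(gradient(-3.0, variance));
        System.out.println(gradient(0.0, variance));
    }
}
```

Unlike the Gaussian case, the gradient magnitude does not shrink as the coefficient approaches zero, which is why L_{1} regularization tends to drive coefficients all the way to zero.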
The Cauchy prior assumes a Cauchy density (also known as a Lorentz density) over parameter vectors. The Cauchy density allows a scale to be specified for each dimension. The mean and variance are undefined as their integrals diverge. The Cauchy distribution is symmetric and, for regression priors, we assume a mode of zero for the base distribution. The Cauchy prior has a single mode, located at its median.
The Cauchy density with scale of 1 is a Student-t density with one degree of freedom.
The Cauchy density is defined by:
p(β_{i},i) = (1 / π) * (λ_{i} / (β_{i}^{2} + λ_{i}^{2}))
which on a log scale is
log p(β_{i},i) = log (1 / π) + log (λ_{i}) - log (β_{i}^{2} + λ_{i}^{2})
The Cauchy prior leads to the following contribution to the
gradient for dimension i with parameter β_{i} and squared scale λ_{i}^{2}:
gradient(β_{i}, i) = 2 β_{i} / (β_{i}^{2} + λ_{i}^{2})
Cauchy priors are constructed using one of the static factory
methods cauchy(double[]) or cauchy(double,boolean).
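The Cauchy formulas can be sketched the same way. CauchyPriorSketch is a hypothetical class; it takes the squared scale as its argument, mirroring the factory method parameter names, and uses natural logs.

```java
final class CauchyPriorSketch {

    /** Natural-log density of a zero-mode Cauchy with the given squared scale. */
    static double logDensity(double beta, double squaredScale) {
        // log(lambda) is computed as 0.5 * log(lambda^2).
        return -Math.log(Math.PI)
            + 0.5 * Math.log(squaredScale)
            - Math.log(beta * beta + squaredScale);
    }

    /** Gradient of the negative log density: 2 beta / (beta^2 + lambda^2). */
    static double gradient(double beta, double squaredScale) {
        return 2.0 * beta / (beta * beta + squaredScale);
    }

    public static void main(String[] args) {
        double squaredScale = 1.0;
        // The pull toward zero weakens for large coefficients.
        for (double beta : new double[] { 0.5, 1.0, 10.0, 100.0 })
            System.out.println(beta + " -> " + gradient(beta, squaredScale));
    }
}
```

The gradient peaks where the coefficient equals the scale and then decays, so large coefficients are penalized less than under Gaussian or Laplace priors.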
For use in gradient-based algorithms, the gradients of two
different priors may be interpolated. A special case is the
elastic net, discussed in the next section. Given two priors
p1 and p2, and an interpolation ratio α between 0 and 1,
the interpolated prior is defined by
log p(β_{i}) = α * log p1(β_{i}) + (1 - α) * log p2(β_{i}) - Z
where Z is the normalization constant not depending on β
that normalizes the density,
p(β,i) = exp(log p(β_{i})) = exp(α * log p1(β_{i})) * exp((1 - α) * log p2(β_{i})) / exp(Z) = p1(β,i)^{α} * p2(β,i)^{(1 - α)} / exp(Z)
The gradient, being a derivative, will be the weighted sum of the
underlying gradients gradient1 and gradient2,
gradient(β,i) = α * gradient1(β,i) + (1 - α) * gradient2(β,i)
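The weighted sum of gradients could be written as a small combinator. The names InterpolatedGradientSketch and PriorGradient are hypothetical and exist only for this sketch; the range check mirrors the documented requirement that the ratio lie between 0 and 1.

```java
final class InterpolatedGradientSketch {

    /** Hypothetical callback: a prior's gradient for a value in a dimension. */
    interface PriorGradient {
        double gradient(double betaForDimension, int dimension);
    }

    /** Returns alpha * g1 + (1 - alpha) * g2 as a new gradient function. */
    static PriorGradient interpolate(double alpha, PriorGradient g1, PriorGradient g2) {
        if (Double.isNaN(alpha) || alpha < 0.0 || alpha > 1.0)
            throw new IllegalArgumentException(
                "alpha must be between 0 and 1 inclusive, found " + alpha);
        return (beta, i) ->
            alpha * g1.gradient(beta, i) + (1.0 - alpha) * g2.gradient(beta, i);
    }

    public static void main(String[] args) {
        PriorGradient gaussian = (beta, i) -> beta;              // variance 1
        PriorGradient laplace = (beta, i) -> Math.signum(beta);  // unit slope
        PriorGradient mixed = interpolate(0.25, laplace, gaussian);
        System.out.println(mixed.gradient(2.0, 0));
    }
}
```

The normalization constant Z never appears because it does not depend on β and so vanishes under differentiation.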
The elastic net prior with Laplace weight α and scale λ is defined by
log p(β,i) = α * log Laplace(β_{i} | 1/sqrt(λ)) + (1 - α) * log Gaussian(β_{i} | sqrt(2)/λ)
where Laplace(β_{i} | 1/sqrt(λ)) is the density of the (zero-mean)
Laplace distribution with variance 1/sqrt(λ), and
Gaussian(β_{i} | sqrt(2)/λ) is the (zero-mean) Gaussian density
function with variance sqrt(2)/λ.
Thus the gradient is an interpolation of the gradients of the
Laplace with variance σ^{2} = 1/sqrt(λ) and
Gaussian with variance σ^{2} = sqrt(2)/λ,
leading to a simple gradient form,
gradient(β,i) = α * λ * signum(β_{i}) + (1 - α) * λ * β_{i}
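The simple gradient form stated above can be written directly. This sketch implements only that closed form; ElasticNetGradientSketch is a hypothetical class, and whether the library evaluates exactly this expression internally is not confirmed here.

```java
final class ElasticNetGradientSketch {

    /**
     * Elastic net gradient in the stated simple form:
     * alpha * lambda * signum(beta) + (1 - alpha) * lambda * beta.
     */
    static double gradient(double beta, double alpha, double lambda) {
        return alpha * lambda * Math.signum(beta)
            + (1.0 - alpha) * lambda * beta;
    }

    public static void main(String[] args) {
        double lambda = 3.0;
        // alpha = 1 gives the pure L1 term, alpha = 0 the pure L2 term.
        System.out.println(gradient(2.0, 1.0, lambda));
        System.out.println(gradient(2.0, 0.0, lambda));
        System.out.println(gradient(2.0, 0.5, lambda));
    }
}
```

Setting the weight to 1 or 0 recovers the L_{1} and L_{2} penalties respectively, with intermediate weights blending the two.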
The basic elastic net prior has zero means and modes in all dimensions, but may be shifted like other priors.
Priors with non-zero means or modes typically arise in hierarchical or multilevel regression models, or in models in which informative priors are available on a dimension-by-dimension basis.
Through the method shiftMeans(double[],RegressionPrior)
it is possible to shift the means of a prior by the specified
amount. This allows any prior to be used with non-zero means.
Suppose p2 is the density and gradient2 the gradient of the
specified prior, and shifts is the specified array of doubles
giving the mean shifts. Probabilities and gradients are computed
by shifting back,
p(β) = p2(β - shifts)
and
gradient(β,i) = gradient2(β - shifts,i)
Dimension by dimension, the value is computed by subtracting the shift from the value and plugging it into the underlying prior.
For example, to specify a Gaussian prior with means mus and variances vars, use
double[] mus = ...;
double[] vars = ...;
RegressionPrior prior = shiftMeans(mus,gaussian(vars));
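The shifting-back computation can be sketched as a wrapper around an underlying gradient. ShiftedPriorSketch, its nested PriorGradient interface, and its gaussian and shiftMeans helpers are hypothetical stand-ins written for this illustration; they are not the library methods of the same names.

```java
final class ShiftedPriorSketch {

    /** Hypothetical callback: a prior's gradient for a value in a dimension. */
    interface PriorGradient {
        double gradient(double betaForDimension, int dimension);
    }

    /** Zero-mean Gaussian gradient, beta / variance, per dimension. */
    static PriorGradient gaussian(double[] variances) {
        double[] copy = variances.clone();
        return (beta, i) -> beta / copy[i];
    }

    /** Evaluates the underlying gradient at beta minus the shift. */
    static PriorGradient shiftMeans(double[] shifts, PriorGradient prior) {
        double[] copy = shifts.clone();
        return (beta, i) -> prior.gradient(beta - copy[i], i);
    }

    public static void main(String[] args) {
        double[] mus = { 2.0, -1.0 };
        double[] vars = { 1.0, 4.0 };
        PriorGradient prior = shiftMeans(mus, gaussian(vars));
        // The gradient vanishes at the shifted mean of each dimension.
        System.out.println(prior.gradient(2.0, 0));
        System.out.println(prior.gradient(3.0, 1));
    }
}
```

The arrays are copied defensively so that later changes by the caller do not alter the prior.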
By convention, input dimension zero (0) may be
reserved for the intercept and set to value 1.0 in all input
vectors. For regularized regression, the regularization is
typically not applied to the intercept term. To match this
convention, the factory methods allow a boolean parameter
indicating whether the intercept parameter has a
noninformative/uniform prior. If the intercept flag indicates it
is noninformative, then dimension 0 will have an infinite
prior variance or scale, and hence a zero gradient. The result is
that the intercept will be fit by maximum likelihood.
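The effect of an infinite variance on the gradient follows from floating-point arithmetic. In the sketch below, NoninformativeInterceptSketch and its numDimensions argument are assumptions made for illustration (the real factory methods take no dimension count); it shows a Gaussian gradient becoming exactly zero in dimension 0.

```java
import java.util.Arrays;

final class NoninformativeInterceptSketch {

    /** Builds per-dimension variances, optionally infinite for dimension 0. */
    static double[] variances(int numDimensions, double priorVariance,
                              boolean noninformativeIntercept) {
        double[] v = new double[numDimensions];
        Arrays.fill(v, priorVariance);
        if (noninformativeIntercept && numDimensions > 0)
            v[0] = Double.POSITIVE_INFINITY;
        return v;
    }

    /** Gaussian gradient: beta / variance. */
    static double gaussianGradient(double beta, double variance) {
        return beta / variance;
    }

    public static void main(String[] args) {
        double[] v = variances(3, 2.0, true);
        // A finite value divided by infinity is 0.0 in IEEE 754 arithmetic.
        System.out.println(gaussianGradient(5.0, v[0]));
        System.out.println(gaussianGradient(5.0, v[1]));
    }
}
```

Because the intercept's prior gradient is zero, only the likelihood term moves it during fitting.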
All of the regression priors may be serialized.
For full details on the Gaussian, Cauchy, and Laplace distributions, see:
For explanations of how the priors are used with regression including logistic regression, see the following three textbooks:
For details of the elastic net prior, see
Method Summary  

static RegressionPrior 
cauchy(double[] priorSquaredScales)
Returns the Cauchy prior for the specified squared scales. 
static RegressionPrior 
cauchy(double priorSquaredScale,
boolean noninformativeIntercept)
Returns the Cauchy prior with the specified prior squared scales for the dimensions. 
static RegressionPrior 
elasticNet(double laplaceWeight,
double scale,
boolean noninformativeIntercept)
Returns the elastic net prior with the specified weight on the Laplace prior, the specified scale parameter for the elastic net and a noninformative prior on the intercept (dimension 0) if the specified flag is set. 
static RegressionPrior 
gaussian(double[] priorVariances)
Returns the Gaussian prior with the specified prior variances for each dimension. 
static RegressionPrior 
gaussian(double priorVariance,
boolean noninformativeIntercept)
Returns the Gaussian prior with the specified prior variance and indication of whether the intercept is given a noninformative prior. 
abstract double 
gradient(double betaForDimension,
int dimension)
Returns the contribution to the gradient of the error function of the specified parameter value for the specified dimension. 
boolean 
isUniform()
Returns true if this prior is the uniform distribution. 
static RegressionPrior 
laplace(double[] priorVariances)
Returns the Laplace prior with the specified prior variances for the dimensions. 
static RegressionPrior 
laplace(double priorVariance,
boolean noninformativeIntercept)
Returns the Laplace prior with the specified prior variance and indication of whether the intercept dimension is given a noninformative prior. 
abstract double 
log2Prior(double betaForDimension,
int dimension)
Returns the log (base 2) of the prior density evaluated at the specified coefficient value for the specified dimension (up to an additive constant). 
double 
log2Prior(Vector beta)
Returns the log (base 2) prior density for a specified coefficient vector (up to an additive constant). 
double 
log2Prior(Vector[] betas)
Returns the log (base 2) prior density for the specified array of coefficient vectors (up to an additive constant). 
static RegressionPrior 
logInterpolated(double alpha,
RegressionPrior prior1,
RegressionPrior prior2)
Returns the prior that interpolates its log probability between the specified priors with the weight going to the first prior. 
double 
mode(int dimension)
Returns the mode of the prior. 
static RegressionPrior 
noninformative()
Returns the noninformative or uniform prior to use for maximum likelihood regression fitting. 
static RegressionPrior 
shiftMeans(double[] shifts,
RegressionPrior prior)
Returns the prior that shifts the means of the specified prior by the specified values. 
Methods inherited from class java.lang.Object 

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Method Detail 

public boolean isUniform()
Returns true if this prior is the uniform distribution.
Uniform priors reduce to maximum likelihood calculations.
Returns:
true if this prior is the uniform distribution.

public double mode(int dimension)
Returns the mode of the prior.
Parameters:
dimension - Dimension position in vector.

public abstract double gradient(double betaForDimension, int dimension)
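Returns the contribution to the gradient of the error function of the specified parameter value for the specified dimension.
Parameters:
betaForDimension - Parameter value for the specified dimension.
dimension - The dimension.

public abstract double log2Prior(double betaForDimension, int dimension)
Returns the log (base 2) of the prior density evaluated at the specified coefficient value for the specified dimension (up to an additive constant).
Parameters:
betaForDimension - Parameter value for the specified dimension.
dimension - The dimension.

public double log2Prior(Vector beta)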
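Returns the log (base 2) prior density for a specified coefficient vector (up to an additive constant).
Parameters:
beta - Parameter vector.
Throws:
IllegalArgumentException - If the specified parameter vector does not match the dimensionality of the prior (if specified).

public double log2Prior(Vector[] betas)
Returns the log (base 2) prior density for the specified array of coefficient vectors (up to an additive constant).
Parameters:
betas - The parameter vectors.
Throws:
IllegalArgumentException - If any of the specified parameter vectors does not match the dimensionality of the prior (if specified).

public static RegressionPrior noninformative()
Returns the noninformative or uniform prior to use for maximum likelihood regression fitting.

public static RegressionPrior gaussian(double priorVariance, boolean noninformativeIntercept)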
Returns the Gaussian prior with the specified prior variance and indication of whether the intercept is given a noninformative prior.
If the noninformative-intercept flag is set to true, the prior variance for dimension zero (0) is set to Double.POSITIVE_INFINITY.
See the class documentation above for more information on Gaussian priors.
Parameters:
priorVariance - Variance of the Gaussian prior for each dimension.
noninformativeIntercept - Flag indicating if intercept is given a noninformative (uniform) prior.
Throws:
IllegalArgumentException - If the prior variance is not a nonnegative number.

public static RegressionPrior gaussian(double[] priorVariances)
Returns the Gaussian prior with the specified prior variances for each dimension.
See the class documentation above for more information on Gaussian priors.
Parameters:
priorVariances - Array of prior variances for dimensions.
Throws:
IllegalArgumentException - If any of the variances are not nonnegative numbers.

public static RegressionPrior laplace(double priorVariance, boolean noninformativeIntercept)
Returns the Laplace prior with the specified prior variance and indication of whether the intercept dimension is given a noninformative prior.
If the noninformative-intercept flag is set to true, the prior variance for dimension zero (0) is set to Double.POSITIVE_INFINITY.
See the class documentation above for more information on Laplace priors.
Parameters:
priorVariance - Variance of the Laplace prior for each dimension.
noninformativeIntercept - Flag indicating if intercept is given a noninformative (uniform) prior.
Throws:
IllegalArgumentException - If the variance is not a nonnegative number.

public static RegressionPrior laplace(double[] priorVariances)
Returns the Laplace prior with the specified prior variances for the dimensions.
See the class documentation above for more information on Laplace priors.
Parameters:
priorVariances - Array of prior variances for dimensions.
Throws:
IllegalArgumentException - If any of the variances is not a nonnegative number.

public static RegressionPrior cauchy(double priorSquaredScale, boolean noninformativeIntercept)
Returns the Cauchy prior with the specified prior squared scales for the dimensions.
See the class documentation above for more information on Cauchy priors.
Parameters:
priorSquaredScale - The square of the prior scale parameter.
noninformativeIntercept - Flag indicating if intercept is given a noninformative (uniform) prior.
Throws:
IllegalArgumentException - If the scale is not a nonnegative number.

public static RegressionPrior cauchy(double[] priorSquaredScales)
Returns the Cauchy prior for the specified squared scales.
See the class documentation above for more information on Cauchy priors.
Parameters:
priorSquaredScales - Prior squared scale parameters.
Throws:
IllegalArgumentException - If any of the prior squared scales is not a nonnegative number.

public static RegressionPrior logInterpolated(double alpha, RegressionPrior prior1, RegressionPrior prior2)
Returns the prior that interpolates its log probability between the specified priors with the weight going to the first prior.
See the class documentation above for more information on interpolated priors.
Parameters:
alpha - Weight of first prior.
prior1 - First prior for interpolation.
prior2 - Second prior for interpolation.
Throws:
IllegalArgumentException - If the interpolation ratio is not a number between 0 and 1 inclusive.

public static RegressionPrior elasticNet(double laplaceWeight, double scale, boolean noninformativeIntercept)
Returns the elastic net prior with the specified weight on the Laplace prior, the specified scale parameter for the elastic net and a noninformative prior on the intercept (dimension 0) if the specified flag is set.
See the class documentation above for more information on elastic net priors.
This is a convenience method for
logInterpolated(laplaceWeight, laplace(1/sqrt(scale),noninformativeIntercept), gaussian(sqrt(2)/scale,noninformativeIntercept))
Parameters:
laplaceWeight - Weight on the Laplace prior.
scale - Scale parameter for the elastic net.
noninformativeIntercept - A flag indicating whether or not the intercept (dimension 0) should have a noninformative prior.
Throws:
IllegalArgumentException - If the interpolation parameter is not between 0 and 1 inclusive, or if the scale is not positive and finite.

public static RegressionPrior shiftMeans(double[] shifts, RegressionPrior prior)
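Returns the prior that shifts the means of the specified prior by the specified values.
See the class documentation above for more information.
Parameters:
shifts - Mean shifts indexed by dimension.
prior - Prior to apply to shifted values.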

