com.aliasi.stats

Class LogisticRegression

    • Constructor Detail

      • LogisticRegression

        public LogisticRegression(Vector[] weightVectors)
        Construct a multinomial logistic regression model with the specified weight vectors. With k-1 weight vectors, the result is a multinomial classifier with k outcomes.

        The weight vectors are stored rather than copied, so changes to them will affect this class.

        See the class definition above for more information on logistic regression.

        Parameters:
        weightVectors - Weight vectors defining this regression model.
        Throws:
        IllegalArgumentException - If the array of weight vectors does not have at least one element or if there are two weight vectors with different numbers of dimensions.
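        The stored-rather-than-copied behavior can be illustrated with a minimal sketch. StoredWeightsSketch is a hypothetical stand-in that uses plain double[] arrays, not the LogisticRegression class or the Vector interface; it only shows why mutating an array after construction changes later results.

```java
// Hypothetical stand-in; not part of com.aliasi.stats.
final class StoredWeightsSketch {
    private final double[] weights; // the reference is stored, not copied

    StoredWeightsSketch(double[] weights) {
        this.weights = weights;
    }

    // Dot product of the stored weights with the input.
    double score(double[] x) {
        if (x.length != weights.length) {
            throw new IllegalArgumentException("Dimensionality mismatch.");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i) {
            sum += weights[i] * x[i];
        }
        return sum;
    }
}
```

        In this sketch, a caller who needs isolation from later changes would pass weights.clone() instead of the array itself.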
      • LogisticRegression

        public LogisticRegression(Vector weightVector)
        Construct a binomial logistic regression model with the specified parameter vector. See the class definition above for more information on logistic regression.

        The weight vector is stored rather than copied, so changes to it will affect this class.

        Parameters:
        weightVector - The weights of features defining this model.
    • Method Detail

      • numInputDimensions

        public int numInputDimensions()
        Returns the dimensionality of inputs for this logistic regression model.
        Returns:
        The number of dimensions for this model.
      • numOutcomes

        public int numOutcomes()
        Returns the number of outcomes for this logistic regression model.
        Returns:
        The number of outcomes for this model.
      • weightVectors

        public Vector[] weightVectors()
        Returns an array of views of the weight vectors used for this regression model. The returned weight vectors are immutable views of the underlying vectors used by this model, so they will change if the vectors making up this model change.
        Returns:
        An array of views of the weight vectors for this model.
      • classify

        public double[] classify(Vector x)
        Returns an array of conditional probabilities indexed by outcomes for the specified input vector. The resulting array has a value for index i that is equal to the probability of the outcome i for the specified input. The sum of the returned values will be 1.0 (modulo arithmetic precision).

        See the class definition above for more information on how the conditional probabilities are computed.

        Parameters:
        x - The input vector.
        Returns:
        The array of conditional probabilities of outcomes.
        Throws:
        IllegalArgumentException - If the specified vector is not the same dimensionality as this logistic regression instance.
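        The computation itself is described in the class definition, which is not reproduced here. The following is a minimal sketch of one common parameterization, assumed rather than taken from this page: the k-1 weight vectors score outcomes 0 through k-2, the last outcome has an implicit score of 0, and the scores are normalized with a softmax. SoftmaxSketch and its plain double[] arrays are hypothetical stand-ins for the model and the Vector interface.

```java
// Sketch only: plain arrays stand in for the Vector interface.
final class SoftmaxSketch {
    // weightVectors holds k-1 vectors; outcome k-1 is ASSUMED to have score 0.
    static double[] classify(double[][] weightVectors, double[] x) {
        int k = weightVectors.length + 1;
        double[] ysHat = new double[k]; // ysHat[k-1] stays at the implicit score 0
        double max = 0.0;               // largest score seen, including the implicit 0
        for (int c = 0; c < k - 1; ++c) {
            if (weightVectors[c].length != x.length) {
                throw new IllegalArgumentException("Dimensionality mismatch.");
            }
            double score = 0.0;
            for (int i = 0; i < x.length; ++i) {
                score += weightVectors[c][i] * x[i];
            }
            ysHat[c] = score;
            max = Math.max(max, score);
        }
        double z = 0.0;
        for (int c = 0; c < k; ++c) {
            ysHat[c] = Math.exp(ysHat[c] - max); // subtracting max avoids overflow
            z += ysHat[c];
        }
        for (int c = 0; c < k; ++c) {
            ysHat[c] /= z; // normalize so the values sum to 1.0
        }
        return ysHat;
    }
}
```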
      • classify

        public void classify(Vector x,
                             double[] ysHat)
        Fills the specified array with the conditional probabilities indexed by outcomes for the specified input vector.

        After the call, the array has a value at index i that is equal to the probability of the outcome i for the specified input. The sum of the values written will be 1.0 (modulo arithmetic precision).

        See the class definition above for more information on how the conditional probabilities are computed.

        Parameters:
        x - The input vector.
        ysHat - Array into which conditional probabilities are written.
        Throws:
        IllegalArgumentException - If the specified vector is not the same dimensionality as this logistic regression instance.
      • compileTo

        public void compileTo(ObjectOutput out)
                       throws IOException
        Compiles this model to the specified object output. The compiled model, when read back in, will remain an instance of this class, LogisticRegression.

        Compilation does the same thing as serialization.

        Specified by:
        compileTo in interface Compilable
        Parameters:
        out - Object output to which this model is compiled.
        Throws:
        IOException - If there is an underlying I/O error during serialization.
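        Because compilation does the same thing as serialization, a write-then-read round trip behaves like ordinary Java object serialization. The sketch below uses only java.io classes; TinyModel is a hypothetical serializable stand-in, not the LogisticRegression class, and the in-memory byte stream is just a convenient substitute for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class RoundTripSketch {
    // Hypothetical model; stands in for any Serializable object.
    static final class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        TinyModel(double[] weights) {
            this.weights = weights.clone();
        }
    }

    // Writes the model to an in-memory buffer and reads it back.
    static TinyModel roundTrip(TinyModel model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TinyModel) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```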
      • estimate

        public static LogisticRegression estimate(Vector[] xs,
                                                  int[] cs,
                                                  RegressionPrior prior,
                                                  AnnealingSchedule annealingSchedule,
                                                  Reporter reporter,
                                                  double minImprovement,
                                                  int minEpochs,
                                                  int maxEpochs)
        Estimate a logistic regression model from the specified input data using the specified prior, annealing schedule, minimum improvement per epoch, minimum and maximum number of estimation epochs, and reporter. The block size defaults to the number of examples divided by 50 (or 1 if the division results in 0).

        See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

        Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

        Parameters:
        xs - Input vectors indexed by training case.
        cs - Output categories indexed by training case.
        prior - The prior to be used for regression.
        annealingSchedule - Class to compute learning rate for each epoch.
        minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
        minEpochs - Minimum number of epochs.
        maxEpochs - Maximum number of epochs.
        reporter - Reporter to which progress reports are written, or null if no progress reports are needed.
        Throws:
        IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of output categories isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.
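        The default block size rule stated above can be written as a one-line computation. This is a sketch of the stated rule using integer division; the class and method names are hypothetical and do not appear in the library.

```java
final class BlockSizeSketch {
    // Number of examples divided by 50, or 1 if the integer division yields 0.
    static int defaultBlockSize(int numExamples) {
        return Math.max(1, numExamples / 50);
    }
}
```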
      • estimate

        public static LogisticRegression estimate(Vector[] xs,
                                                  Vector[] cs,
                                                  RegressionPrior prior,
                                                  AnnealingSchedule annealingSchedule,
                                                  Reporter reporter,
                                                  double minImprovement,
                                                  int minEpochs,
                                                  int maxEpochs)
        Estimate a logistic regression model from the specified input data using the specified prior, annealing schedule, minimum improvement per epoch, minimum and maximum number of estimation epochs, and reporter. The block size defaults to the number of examples divided by 50 (or 1 if the division results in 0).

        See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

        Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

        Parameters:
        xs - Input vectors indexed by training case.
        cs - Output vectors representing probabilistic category assignments indexed by training case.
        prior - The prior to be used for regression.
        annealingSchedule - Class to compute learning rate for each epoch.
        minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
        minEpochs - Minimum number of epochs.
        maxEpochs - Maximum number of epochs.
        reporter - Reporter to which progress reports are written, or null if no progress reports are needed.
        Throws:
        IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of output vectors isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.
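        How the probabilistic assignments in cs relate to the integer categories of the other signature is not spelled out here. One plausible reading, assumed in this sketch, is that a hard category label corresponds to a distribution that places all probability on that category. The helper below is hypothetical and uses double[] rows in place of Vector instances.

```java
final class OneHotSketch {
    // Converts hard labels into one distribution per training case.
    static double[][] toDistributions(int[] cs, int numOutcomes) {
        double[][] result = new double[cs.length][numOutcomes];
        for (int i = 0; i < cs.length; ++i) {
            if (cs[i] < 0 || cs[i] >= numOutcomes) {
                throw new IllegalArgumentException("Category out of range: " + cs[i]);
            }
            result[i][cs[i]] = 1.0; // all probability mass on the labeled category
        }
        return result;
    }
}
```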
      • estimate

        public static LogisticRegression estimate(Vector[] xs,
                                                  int[] cs,
                                                  RegressionPrior prior,
                                                  int blockSize,
                                                  LogisticRegression hotStart,
                                                  AnnealingSchedule annealingSchedule,
                                                  double minImprovement,
                                                  int rollingAverageSize,
                                                  int minEpochs,
                                                  int maxEpochs,
                                                  ObjectHandler<LogisticRegression> handler,
                                                  Reporter reporter)
        Estimate a logistic regression model from the specified input data using the specified prior, block size, hot-start model, annealing schedule, minimum improvement per epoch, rolling average size, minimum and maximum number of estimation epochs, intermediate-result handler, and reporter.

        See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

        Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

        Parameters:
        xs - Input vectors indexed by training case.
        cs - Output categories indexed by training case.
        prior - The prior to be used for regression.
        blockSize - Number of examples whose gradient is computed before updating coefficients.
        hotStart - Logistic regression from which to retrieve initial weights or null to use zero vectors.
        annealingSchedule - Class to compute learning rate for each epoch.
        minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
        rollingAverageSize - Number of epochs over which improvement is averaged when testing for convergence.
        minEpochs - Minimum number of epochs.
        maxEpochs - Maximum number of epochs.
        handler - Handler for intermediate regression results.
        reporter - Reporter to which progress reports are written, or null if no progress reports are needed.
        Throws:
        IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of output categories isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.
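        The roles of blockSize and hotStart can be sketched with a simplified single epoch of block-wise gradient ascent for a binary model. This is an illustration under stated assumptions, not the library's algorithm: it omits the prior, uses a fixed learning rate instead of an annealing schedule, treats the weight vector as scoring outcome 0, and uses plain arrays. All names are hypothetical.

```java
final class BlockSgdSketch {
    // Runs one pass over the data, updating weights once per block.
    static double[] epoch(double[][] xs, int[] cs, double[] hotStart,
                          int blockSize, double learningRate) {
        int dims = xs[0].length;
        // Initial weights come from the hot start, or are zero if it is null.
        double[] w = (hotStart == null) ? new double[dims] : hotStart.clone();
        for (int start = 0; start < xs.length; start += blockSize) {
            int end = Math.min(xs.length, start + blockSize);
            double[] gradient = new double[dims];
            // Accumulate the gradient over the whole block before updating.
            for (int i = start; i < end; ++i) {
                double score = 0.0;
                for (int d = 0; d < dims; ++d) {
                    score += w[d] * xs[i][d];
                }
                double p0 = 1.0 / (1.0 + Math.exp(-score)); // P(outcome 0 | x), assumed
                double error = (cs[i] == 0 ? 1.0 : 0.0) - p0;
                for (int d = 0; d < dims; ++d) {
                    gradient[d] += error * xs[i][d];
                }
            }
            for (int d = 0; d < dims; ++d) {
                w[d] += learningRate * gradient[d];
            }
        }
        return w;
    }
}
```

        With a block size equal to the number of examples, this sketch performs one batch update per epoch; with a block size of 1 it updates after every example, so later examples see weights already moved by earlier ones.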
      • estimate

        public static LogisticRegression estimate(Vector[] xs,
                                                  Vector[] cs,
                                                  RegressionPrior prior,
                                                  int blockSize,
                                                  LogisticRegression hotStart,
                                                  AnnealingSchedule annealingSchedule,
                                                  double minImprovement,
                                                  int rollingAverageSize,
                                                  int minEpochs,
                                                  int maxEpochs,
                                                  ObjectHandler<LogisticRegression> handler,
                                                  Reporter reporter)
        Estimate a logistic regression model from the specified input data using the specified prior, block size, hot-start model, annealing schedule, minimum improvement per epoch, rolling average size, minimum and maximum number of estimation epochs, intermediate-result handler, and reporter.

        See the class documentation above for more information on logistic regression and the stochastic gradient descent algorithm used to implement this method.

        Reporting: Reports at the debug level provide epoch-by-epoch feedback. Reports at the info level indicate inputs and major milestones in the algorithm. Reports at the fatal level are for thrown exceptions.

        Parameters:
        xs - Input vectors indexed by training case.
        cs - Output vectors representing probabilistic category assignments indexed by training case.
        prior - The prior to be used for regression.
        blockSize - Number of examples whose gradient is computed before updating coefficients.
        hotStart - Logistic regression from which to retrieve initial weights or null to use zero vectors.
        annealingSchedule - Class to compute learning rate for each epoch.
        minImprovement - The minimum relative improvement in log likelihood for the corpus to continue to another epoch.
        rollingAverageSize - Number of epochs over which improvement is averaged when testing for convergence.
        minEpochs - Minimum number of epochs.
        maxEpochs - Maximum number of epochs.
        handler - Handler for intermediate regression results.
        reporter - Reporter to which progress reports are written, or null if no progress reports are needed.
        Throws:
        IllegalArgumentException - If the set of input vectors does not contain at least one instance, if the number of output vectors isn't the same as the number of input vectors, if two input vectors have different dimensions, or if the prior has a different number of dimensions than the instances.
      • log2Likelihood

        public static double log2Likelihood(Vector[] inputs,
                                            int[] cats,
                                            LogisticRegression regression)
        Returns the log (base 2) likelihood of the specified inputs with the specified categories using the specified regression model.
        Parameters:
        inputs - Input vectors.
        cats - Categories for input vectors.
        regression - Model to use for computing likelihood.
        Returns:
        The log (base 2) likelihood of the inputs with the specified categories under the model.
        Throws:
        IllegalArgumentException - If the inputs and categories are not the same length.
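        As a sketch of what this quantity means, the log (base 2) likelihood of a data set can be computed as the sum, over training cases, of the base-2 log of the probability the model assigns to the correct category. The helper below is hypothetical: it takes precomputed probability arrays (one per input, such as those a classify call would produce) rather than a model and input vectors.

```java
final class Log2LikelihoodSketch {
    // probs[i][c] is the model's probability of category c for input i.
    static double log2Likelihood(double[][] probs, int[] cats) {
        if (probs.length != cats.length) {
            throw new IllegalArgumentException(
                "Inputs and categories must be the same length.");
        }
        double sum = 0.0;
        for (int i = 0; i < cats.length; ++i) {
            // Change of base: log2(p) = ln(p) / ln(2).
            sum += Math.log(probs[i][cats[i]]) / Math.log(2.0);
        }
        return sum;
    }
}
```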