com.aliasi.lm
Class UniformProcessLM

java.lang.Object
  extended by com.aliasi.lm.UniformProcessLM
All Implemented Interfaces:
LanguageModel, LanguageModel.Dynamic, LanguageModel.Process, Compilable

public class UniformProcessLM
extends Object
implements LanguageModel.Dynamic, LanguageModel.Process

A UniformProcessLM implements a uniform process language model with a specified number of outcomes, each assigned the same probability. The formula for computing sequence likelihood estimates is:

log2Estimate(cSeq) = - cSeq.length() * log2(numOutcomes)
Each character contributes log2(1/numOutcomes) to the estimate. As a process model, it adds no end-of-stream term, so the estimate depends only on the length of the sequence.
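
The estimate can be illustrated without the LingPipe classes. The following is a minimal sketch, assuming a process model in which each character has probability 1/numOutcomes (as stated for the int constructor) and no end-of-stream term is added. UniformProcessSketch and its methods are hypothetical names, not part of com.aliasi.lm.

```java
// Hypothetical stand-in for illustration; not the LingPipe implementation.
final class UniformProcessSketch {

    // log2(x) via change of base; java.lang.Math has no log2 method.
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Assumes each character is drawn uniformly from numOutcomes outcomes,
    // so a sequence of n characters has log2 probability -n * log2(numOutcomes).
    static double log2Estimate(CharSequence cSeq, int numOutcomes) {
        if (numOutcomes < 1) {
            throw new IllegalArgumentException("numOutcomes must be positive");
        }
        return -cSeq.length() * log2(numOutcomes);
    }

    public static void main(String[] args) {
        // With 256 outcomes, each character costs 8 bits.
        System.out.println(log2Estimate("abc", 256));
        // The empty sequence has probability 1, so its log2 estimate is zero.
        System.out.println(log2Estimate("", 256));
    }
}
```

Under these assumptions the characters themselves never matter: any two sequences of the same length receive the same estimate.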

Since:
LingPipe2.0
Version:
3.8.1
Author:
Bob Carpenter

Nested Class Summary
 
Nested classes/interfaces inherited from interface com.aliasi.lm.LanguageModel
LanguageModel.Conditional, LanguageModel.Dynamic, LanguageModel.Process, LanguageModel.Sequence, LanguageModel.Tokenized
 
Constructor Summary
UniformProcessLM()
          Construct a uniform process language model with a number of outcomes equal to the total number of characters.
UniformProcessLM(double crossEntropyRate)
          Construct a uniform process language model with the specified character cross-entropy rate.
UniformProcessLM(int numOutcomes)
          Construct a uniform process language model with the specified number of outcomes.
 
Method Summary
 void compileTo(ObjectOutput objOut)
          Writes a compiled version of this model to the specified object output.
 double log2Estimate(char[] cs, int start, int end)
          Returns an estimate of the log (base 2) probability of the specified character slice.
 double log2Estimate(CharSequence cSeq)
          Returns an estimate of the log (base 2) probability of the specified character sequence.
 int numOutcomes()
          Returns the number of outcomes for this uniform model.
 void train(char[] cs, int start, int end)
          Ignores the training data.
 void train(char[] cs, int start, int end, int count)
          Ignores the training data.
 void train(CharSequence cSeq)
          Ignores the training data.
 void train(CharSequence cSeq, int count)
          Ignores the training data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UniformProcessLM

public UniformProcessLM()
Construct a uniform process language model with a number of outcomes equal to the total number of characters.


UniformProcessLM

public UniformProcessLM(int numOutcomes)
Construct a uniform process language model with the specified number of outcomes. The per-character conditional estimate is 1/numOutcomes.

Parameters:
numOutcomes - The number of outcomes for this language model.

UniformProcessLM

public UniformProcessLM(double crossEntropyRate)
Construct a uniform process language model with the specified character cross-entropy rate. Recall that the cross-entropy rate is the negative average log (base 2) probability per character, so that:
log2 P(cs) = - crossEntropyRate * cs.length()
The number of outcomes is set by raising two to the power of the cross-entropy rate and rounding down:
numOutcomes = (int) 2.0^crossEntropyRate

Parameters:
crossEntropyRate - Character cross-entropy rate of the uniform model.
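
To make the rounding concrete, here is a small sketch of the conversion from a cross-entropy rate to a number of outcomes, assuming the truncating cast given in the formula for this constructor. The class and method names are hypothetical and not part of LingPipe. Because the cast truncates, the rate implied by the resulting number of outcomes, log2(numOutcomes), may be slightly lower than the rate requested.

```java
// Hypothetical illustration of the rate-to-outcomes conversion; not LingPipe code.
final class CrossEntropySketch {

    // Two raised to the cross-entropy rate, truncated toward zero by the int cast.
    static int numOutcomes(double crossEntropyRate) {
        return (int) Math.pow(2.0, crossEntropyRate);
    }

    // Cross-entropy rate implied by a uniform model over numOutcomes characters.
    static double impliedRate(int numOutcomes) {
        return Math.log(numOutcomes) / Math.log(2.0);
    }

    public static void main(String[] args) {
        for (double rate : new double[] {1.0, 3.0, 3.5, 8.0}) {
            int n = numOutcomes(rate);
            System.out.println(rate + " bits/char -> " + n
                + " outcomes, implied rate " + impliedRate(n));
        }
    }
}
```

For example, a rate of 3.5 gives 2^3.5, which is about 11.31, truncated to 11 outcomes.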
Method Detail

numOutcomes

public int numOutcomes()
Returns the number of outcomes for this uniform model.

Returns:
The number of outcomes for this uniform model.

compileTo

public void compileTo(ObjectOutput objOut)
               throws IOException
Writes a compiled version of this model to the specified object output. The object read back in will also be an instance of UniformProcessLM.

Specified by:
compileTo in interface Compilable
Parameters:
objOut - Object output to which this model is written.
Throws:
IOException - If there is an I/O error during the write.

train

public void train(char[] cs,
                  int start,
                  int end)
Ignores the training data.

Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cs - Ignored.
start - Ignored.
end - Ignored.

train

public void train(char[] cs,
                  int start,
                  int end,
                  int count)
Ignores the training data.

Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cs - Ignored.
start - Ignored.
end - Ignored.
count - Ignored.

train

public void train(CharSequence cSeq)
Ignores the training data.

Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cSeq - Ignored.

train

public void train(CharSequence cSeq,
                  int count)
Ignores the training data.

Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cSeq - Ignored.
count - Ignored.

log2Estimate

public double log2Estimate(char[] cs,
                           int start,
                           int end)
Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character slice.

Specified by:
log2Estimate in interface LanguageModel
Parameters:
cs - Underlying array of characters.
start - Index of first character in slice.
end - One plus index of last character in slice.
Returns:
Log (base 2) estimate of the likelihood of the specified character slice.

log2Estimate

public double log2Estimate(CharSequence cSeq)
Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character sequence.

Specified by:
log2Estimate in interface LanguageModel
Parameters:
cSeq - Character sequence to estimate.
Returns:
Log (base 2) estimate of the likelihood of the specified character sequence.
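
As an illustration of the slice convention, where end is one plus the index of the last character, the following sketch computes a uniform estimate from the slice length alone. It assumes a per-character probability of 1/numOutcomes. SliceEstimateSketch is a hypothetical stand-in, not the LingPipe implementation, and its bounds check is only an assumption about reasonable argument validation.

```java
// Hypothetical stand-in for illustration; not the LingPipe implementation.
final class SliceEstimateSketch {

    static double log2Estimate(char[] cs, int start, int end, int numOutcomes) {
        // Assumed validation: the slice must lie within the array.
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException(
                "start=" + start + " end=" + end + " length=" + cs.length);
        }
        // end is exclusive, so the slice holds end - start characters.
        int length = end - start;
        double log2PerChar = -Math.log(numOutcomes) / Math.log(2.0);
        return length * log2PerChar;
    }

    public static void main(String[] args) {
        char[] cs = "uniform".toCharArray();
        // Slices of equal length get equal estimates, whatever the characters.
        System.out.println(log2Estimate(cs, 0, 3, 16));
        System.out.println(log2Estimate(cs, 4, 7, 16));
    }
}
```

Both calls cover three characters, so they yield the same value under this sketch.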