

java.lang.Object
  com.aliasi.lm.UniformBoundaryLM
public class UniformBoundaryLM
A UniformBoundaryLM
implements a uniform sequence
language model with a specified number of outcomes and the same
probability assigned to the end-of-stream marker. Each character and
the end-of-sequence marker is estimated with probability
1/(numOutcomes+1), so the formula for computing sequence likelihood
estimates is:
log2Estimate(cSeq)
= -(cSeq.length()+1) * log_{2}(numOutcomes+1)
Adding one to the number of outcomes makes the end-of-sequence
marker just as likely as any other character. Adding one to the
sequence length adds the log likelihood of the end-of-sequence
marker itself.
This model is defined as dynamic for convenience. Calls to the training methods have no effect.
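As a worked illustration of the formula above, the following standalone sketch recomputes the estimate without depending on LingPipe. The class name UniformBoundarySketch and its static helper are hypothetical; they only mirror the documented formula and are not the library's source.

```java
public class UniformBoundarySketch {

    // Hypothetical helper mirroring the documented formula:
    //   -(cSeq.length() + 1) * log2(numOutcomes + 1)
    // The extra +1 on the length accounts for the end-of-sequence marker.
    public static double log2Estimate(CharSequence cSeq, int numOutcomes) {
        double log2PerChar = -Math.log(numOutcomes + 1.0) / Math.log(2.0);
        return log2PerChar * (cSeq.length() + 1.0);
    }

    public static void main(String[] args) {
        // Three characters plus the boundary, each with probability 1/256.
        System.out.println(log2Estimate("abc", 255));
        // The empty sequence still pays for the boundary marker.
        System.out.println(log2Estimate("", 255));
    }
}
```

With 255 outcomes every symbol costs about 8 bits, so the three-character sequence costs about 32 bits and the empty sequence about 8 bits, all of it from the boundary marker.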
Nested Class Summary 

Nested classes/interfaces inherited from interface com.aliasi.lm.LanguageModel 

LanguageModel.Conditional, LanguageModel.Dynamic, LanguageModel.Process, LanguageModel.Sequence, LanguageModel.Tokenized 
Field Summary  

static UniformBoundaryLM 
ZERO_LM
A constant uniform boundary language model returning zero log estimates. 
Constructor Summary  

UniformBoundaryLM()
Construct a uniform boundary language model with the full set of characters. 

UniformBoundaryLM(double crossEntropyRate)
Create a constant uniform boundary LM with the specified character cross-entropy rate. 

UniformBoundaryLM(int numOutcomes)
Construct a uniform boundary language model with the specified number of outcomes. 
Method Summary  

void 
compileTo(ObjectOutput objOut)
Writes a compiled version of this model to the specified object output. 
double 
log2Estimate(char[] cs,
int start,
int end)
Returns an estimate of the log (base 2) probability of the specified character slice. 
double 
log2Estimate(CharSequence cSeq)
Returns an estimate of the log (base 2) probability of the specified character sequence. 
int 
numOutcomes()
Returns the number of outcomes for this uniform model. 
void 
train(char[] cs,
int start,
int end)
Ignores the training data. 
void 
train(char[] cs,
int start,
int end,
int count)
Ignores the training data. 
void 
train(CharSequence cSeq)
Ignores the training data. 
void 
train(CharSequence cSeq,
int count)
Ignores the training data. 
Methods inherited from class java.lang.Object 

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Field Detail 

public static final UniformBoundaryLM ZERO_LM
A constant uniform boundary language model returning zero log estimates. This constant is particularly useful for removing the contribution of whitespace characters to token n-gram language models.
Constructor Detail 

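public UniformBoundaryLM()
Construct a uniform boundary language model with the full set of characters.

public UniformBoundaryLM(int numOutcomes)
Construct a uniform boundary language model with the specified number of outcomes. Each outcome, and the boundary marker, is estimated with probability 1/(numOutcomes+1).
Parameters:
numOutcomes - Number of outcomes.

public UniformBoundaryLM(double crossEntropyRate)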
Create a constant uniform boundary LM with the specified character cross-entropy rate. The log estimate of a character sequence is:
log_{2} P(cs)
= -crossEntropyRate * (cs.length() + 1)
The number of outcomes is set by rounding down two raised to the
cross-entropy rate and subtracting one for the boundary
character:
numOutcomes = (int) 2.0^{crossEntropyRate} - 1
If the above expression evaluates to less than zero, the
number of outcomes is rounded up to zero.
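The relationship between the cross-entropy rate, the number of outcomes, and the log estimate can be sketched in a few lines. This is a minimal standalone illustration, not LingPipe's code: the class CrossEntropySketch and both helpers are hypothetical names, and the argument check simply mirrors the documented IllegalArgumentException condition.

```java
public class CrossEntropySketch {

    // Hypothetical helper: outcomes implied by a cross-entropy rate, following
    //   numOutcomes = (int) 2.0^{crossEntropyRate} - 1, floored at zero.
    public static int numOutcomes(double crossEntropyRate) {
        if (Double.isNaN(crossEntropyRate)
                || Double.isInfinite(crossEntropyRate)
                || crossEntropyRate < 0.0) {
            throw new IllegalArgumentException(
                "Cross-entropy rate must be finite and non-negative.");
        }
        // The cast binds tighter than the subtraction: round down, then subtract.
        int n = (int) Math.pow(2.0, crossEntropyRate) - 1;
        return Math.max(0, n);
    }

    // Hypothetical helper: -crossEntropyRate * (cs.length() + 1).
    public static double log2Estimate(CharSequence cs, double crossEntropyRate) {
        return -crossEntropyRate * (cs.length() + 1);
    }

    public static void main(String[] args) {
        System.out.println(numOutcomes(8.0));
        System.out.println(log2Estimate("abc", 8.0));
    }
}
```

A rate of zero yields zero outcomes and a log estimate of zero for every sequence, which matches the behavior described for a model returning zero log estimates.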
Parameters:
crossEntropyRate - The cross-entropy rate of the model.
Throws:
IllegalArgumentException - If the cross-entropy rate is not both finite and non-negative.
Method Detail 

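public int numOutcomes()
Returns the number of outcomes for this uniform model.

public void compileTo(ObjectOutput objOut) throws IOException
Writes a compiled version of this model to the specified object output. The object read back in will be an instance of UniformBoundaryLM.
Specified by:
compileTo in interface Compilable
Parameters:
objOut - Object output to which this model is written.
Throws:
IOException - If there is an I/O error during the write.

public void train(char[] cs, int start, int end)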
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cs - Ignored.
start - Ignored.
end - Ignored.

public void train(char[] cs, int start, int end, int count)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cs - Ignored.
start - Ignored.
end - Ignored.
count - Ignored.

public void train(CharSequence cSeq)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cSeq - Ignored.

public void train(CharSequence cSeq, int count)
Ignores the training data.
Specified by:
train in interface LanguageModel.Dynamic
Parameters:
cSeq - Ignored.
count - Ignored.

public double log2Estimate(char[] cs, int start, int end)
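Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character slice.
Specified by:
log2Estimate in interface LanguageModel
Parameters:
cs - Underlying array of characters.
start - Index of first character in slice.
end - One plus index of last character in slice.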
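Because the model is uniform, the estimate for a slice depends only on its length, end - start, and not on the characters themselves. The sketch below illustrates this with a hypothetical standalone class, SliceEstimateSketch. The bounds check is an assumption made for the example; how the library itself handles invalid indices is not stated here.

```java
public class SliceEstimateSketch {

    // Hypothetical helper: applies the uniform boundary formula to the slice
    // cs[start..end), where end is one plus the index of the last character.
    public static double log2Estimate(char[] cs, int start, int end, int numOutcomes) {
        // Assumed argument check, not taken from the library.
        if (start < 0 || end > cs.length || start > end) {
            throw new IndexOutOfBoundsException(
                "Invalid slice: start=" + start + " end=" + end);
        }
        int length = end - start;
        double log2Outcomes = Math.log(numOutcomes + 1.0) / Math.log(2.0);
        return -(length + 1.0) * log2Outcomes;
    }

    public static void main(String[] args) {
        char[] cs = "hello".toCharArray();
        // The slice "ell" and the slice "hel" have the same length,
        // so they receive the same estimate.
        System.out.println(log2Estimate(cs, 1, 4, 3));
        System.out.println(log2Estimate(cs, 0, 3, 3));
    }
}
```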
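
public double log2Estimate(CharSequence cSeq)
Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character sequence.
Specified by:
log2Estimate in interface LanguageModel
Parameters:
cSeq - Character sequence to estimate.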

