com.aliasi.lm
Class CompiledTokenizedLM

java.lang.Object
  extended by com.aliasi.lm.CompiledTokenizedLM
All Implemented Interfaces:
LanguageModel, LanguageModel.Sequence, LanguageModel.Tokenized

public class CompiledTokenizedLM
extends Object
implements LanguageModel.Sequence, LanguageModel.Tokenized

A CompiledTokenizedLM implements a tokenized bounded sequence language model. Instances are read from streams of bytes created by compiling a TokenizedLM; see that class for more information.
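To illustrate roughly what a bounded token n-gram model computes, the following self-contained sketch scores a token slice with the chain rule. It truncates each token's context to the previous n-1 tokens and adds begin and end boundary markers, so that probabilities are defined over whole sequences. The class `ToyBoundedTokenLM`, its markers, and its probability table are hypothetical, and the sketch does not call LingPipe. It also omits the smoothing and unknown-token estimation that a model compiled from a `TokenizedLM` would apply.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not LingPipe's implementation): a bounded token
 * n-gram model stored as a table of conditional probabilities
 * P(token | context), with explicit begin/end boundary markers.
 * No smoothing, no unknown-token handling.
 */
class ToyBoundedTokenLM {
    static final String BOS = "<s>";   // hypothetical begin-of-sequence marker
    static final String EOS = "</s>";  // hypothetical end-of-sequence marker

    private final int order;  // n in n-gram; context holds at most n-1 tokens
    private final Map<String, Double> condProb = new HashMap<>();

    ToyBoundedTokenLM(int order) { this.order = order; }

    // Records P(token | context); context is the space-joined preceding tokens.
    void put(String context, String token, double p) {
        condProb.put(context + "|" + token, p);
    }

    // log2 P(tokens[start..end)) by the chain rule with bounded context.
    double tokenLog2Probability(String[] tokens, int start, int end) {
        // Wrap the slice in boundary markers: <s> t_start ... t_(end-1) </s>
        String[] seq = new String[end - start + 2];
        seq[0] = BOS;
        System.arraycopy(tokens, start, seq, 1, end - start);
        seq[seq.length - 1] = EOS;
        double log2p = 0.0;
        for (int i = 1; i < seq.length; i++) {
            int from = Math.max(0, i - (order - 1));  // keep only the last n-1 tokens
            String context = String.join(" ", Arrays.copyOfRange(seq, from, i));
            Double p = condProb.get(context + "|" + seq[i]);
            if (p == null) return Double.NEGATIVE_INFINITY;  // unseen event in the toy table
            log2p += Math.log(p) / Math.log(2.0);
        }
        return log2p;
    }

    public static void main(String[] args) {
        ToyBoundedTokenLM lm = new ToyBoundedTokenLM(2);  // bigram model
        lm.put(BOS, "the", 0.5);
        lm.put("the", "cat", 0.25);
        lm.put("cat", EOS, 0.5);
        String[] tokens = {"x", "the", "cat"};
        // Slice [1, 3) is "the cat": log2(0.5) + log2(0.25) + log2(0.5)
        System.out.println(lm.tokenLog2Probability(tokens, 1, 3));
    }
}
```

The end marker is what makes the model "bounded": the probability of stopping after a sequence is scored explicitly, so the scores of all finite sequences can sum to one rather than describing an unending process.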

Since:
LingPipe 2.0
Version:
3.8
Author:
Bob Carpenter

Nested Class Summary
 
Nested classes/interfaces inherited from interface com.aliasi.lm.LanguageModel
LanguageModel.Conditional, LanguageModel.Dynamic, LanguageModel.Process, LanguageModel.Sequence, LanguageModel.Tokenized
 
Method Summary
 double log2Estimate(char[] cs, int start, int end)
          Returns an estimate of the log (base 2) probability of the specified character slice.
 double log2Estimate(CharSequence cSeq)
          Returns an estimate of the log (base 2) probability of the specified character sequence.
 double tokenLog2Probability(String[] tokens, int start, int end)
          Returns the log (base 2) probability of the specified token slice in the underlying token n-gram distribution.
 double tokenProbability(String[] tokens, int start, int end)
          Returns the probability of the specified token slice in the token n-gram distribution.
 String toString()
          Returns a string-based representation of this compiled language model.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Method Detail

toString

public String toString()
Returns a string-based representation of this compiled language model.

Warning: For a large model, the output may be very long, and building it in a single string buffer may exhaust available memory.

Overrides:
toString in class Object
Returns:
A string-based representation of this language model.

log2Estimate

public double log2Estimate(CharSequence cSeq)
Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character sequence.

Specified by:
log2Estimate in interface LanguageModel
Parameters:
cSeq - Character sequence to estimate.
Returns:
Log (base 2) estimate of the likelihood of the specified character sequence.
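Because the estimate is in log base 2, a common way to compare models on the same text is to negate it and divide by the sequence length. This gives a cross-entropy rate in bits per character. So that the sketch below runs without LingPipe, it assumes a hypothetical `CharEstimator` interface with the same signature as `log2Estimate(CharSequence)` and a toy uniform model. With a real compiled model, you would call `log2Estimate` on it directly.

```java
/** Hypothetical stand-in with the same signature as log2Estimate(CharSequence). */
interface CharEstimator {
    double log2Estimate(CharSequence cSeq);
}

class CrossEntropyDemo {
    // Cross-entropy rate in bits per character: -log2 P(cSeq) / length.
    static double bitsPerChar(CharEstimator lm, CharSequence cSeq) {
        if (cSeq.length() == 0) throw new IllegalArgumentException("empty sequence");
        return -lm.log2Estimate(cSeq) / cSeq.length();
    }

    public static void main(String[] args) {
        // Toy model: each character drawn independently and uniformly from 26 letters.
        CharEstimator uniform = cs -> cs.length() * -(Math.log(26.0) / Math.log(2.0));
        System.out.println(bitsPerChar(uniform, "hello"));  // roughly 4.70 bits
        // Converting back to a probability; this underflows to 0.0 for long inputs.
        System.out.println(Math.pow(2.0, uniform.log2Estimate("hello")));
    }
}
```

Lower bits per character on held-out text indicates a better fit. Dividing by length makes scores for texts of different sizes comparable.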

log2Estimate

public double log2Estimate(char[] cs,
                           int start,
                           int end)
Description copied from interface: LanguageModel
Returns an estimate of the log (base 2) probability of the specified character slice.

Specified by:
log2Estimate in interface LanguageModel
Parameters:
cs - Underlying array of characters.
start - Index of first character in slice.
end - One plus index of last character in slice.
Returns:
Log (base 2) estimate of the likelihood of the specified character slice.
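The slice overload uses the usual half-open convention: `start` is inclusive and `end` is exclusive, so the slice has `end - start` characters. The sketch below uses a hypothetical `SliceEstimator` interface and a toy scoring rule rather than LingPipe's own code. It shows how the `CharSequence` form can be expressed in terms of the slice form, and how indices into a larger buffer select the same characters.

```java
/** Hypothetical interface mirroring the two log2Estimate overloads. */
interface SliceEstimator {
    double log2Estimate(char[] cs, int start, int end);

    // Whole-sequence form via the slice form: the slice is [0, length).
    default double log2Estimate(CharSequence cSeq) {
        char[] cs = cSeq.toString().toCharArray();
        return log2Estimate(cs, 0, cs.length);
    }
}

class SliceDemo {
    public static void main(String[] args) {
        // Toy model: -1 bit per character in the slice, with a bounds check.
        SliceEstimator toy = (cs, start, end) -> {
            if (start < 0 || end > cs.length || start > end)
                throw new IndexOutOfBoundsException("bad slice [" + start + ", " + end + ")");
            return -(double) (end - start);
        };
        char[] buf = "[hello]".toCharArray();
        // end is exclusive: indices 1 through 5 cover "hello".
        System.out.println(toy.log2Estimate(buf, 1, 6));  // -5.0
        System.out.println(toy.log2Estimate("hello"));    // -5.0
    }
}
```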

tokenLog2Probability

public double tokenLog2Probability(String[] tokens,
                                   int start,
                                   int end)
Description copied from interface: LanguageModel.Tokenized
Returns the log (base 2) probability of the specified token slice in the underlying token n-gram distribution. For unknown tokens, the result also includes an estimate of the actual token's characters.

Specified by:
tokenLog2Probability in interface LanguageModel.Tokenized
Parameters:
tokens - Underlying array of tokens.
start - Index of first token in slice.
end - Index of one past the last token in the slice.
Returns:
The log (base 2) probability of the token slice.
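Log space matters for token slices of realistic length. Multiplying many per-token probabilities in double precision quickly underflows to zero, while summing their base 2 logarithms stays well within range. The stand-alone sketch below uses a made-up per-token probability to show the difference. It does not call LingPipe.

```java
class UnderflowDemo {
    static double log2(double x) { return Math.log(x) / Math.log(2.0); }

    // Direct product of n equal per-token probabilities: underflows for large n.
    static double product(double pPerToken, int n) {
        double prod = 1.0;
        for (int i = 0; i < n; i++) prod *= pPerToken;
        return prod;
    }

    // Sum of n equal per-token log2 probabilities: stays finite.
    static double log2Sum(double pPerToken, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += log2(pPerToken);
        return sum;
    }

    public static void main(String[] args) {
        double p = 0.001;  // hypothetical per-token conditional probability
        System.out.println(product(p, 200));  // 0.0 (true value is about 1e-600)
        System.out.println(log2Sum(p, 200));  // about -1993.16
    }
}
```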

tokenProbability

public double tokenProbability(String[] tokens,
                               int start,
                               int end)
Description copied from interface: LanguageModel.Tokenized
Returns the probability of the specified token slice in the token n-gram distribution. For unknown tokens, the result also includes an estimate of the actual token's characters.

Specified by:
tokenProbability in interface LanguageModel.Tokenized
Parameters:
tokens - Underlying array of tokens.
start - Index of first token in slice.
end - Index of one past the last token in the slice.
Returns:
The probability of the token slice.
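The two token methods describe the same quantity on different scales, so one can be derived from the other as p = 2^log2p. The sketch below assumes a hypothetical `TokenLog2Scorer` interface (not part of LingPipe) and a toy scoring rule. It makes no claim about how `CompiledTokenizedLM` computes either value internally.

```java
/** Hypothetical interface; not part of LingPipe. */
interface TokenLog2Scorer {
    double tokenLog2Probability(String[] tokens, int start, int end);

    // Probability recovered from the log (base 2) probability: p = 2^log2p.
    // For long slices this can underflow to 0.0 even when the log value is finite.
    default double tokenProbability(String[] tokens, int start, int end) {
        return Math.pow(2.0, tokenLog2Probability(tokens, start, end));
    }
}

class TokenProbDemo {
    public static void main(String[] args) {
        // Toy scorer: each token in the slice [start, end) contributes -1 bit.
        TokenLog2Scorer toy = (tokens, start, end) -> -(double) (end - start);
        String[] tokens = {"the", "cat", "sat"};
        System.out.println(toy.tokenLog2Probability(tokens, 0, 3));  // -3.0
        System.out.println(toy.tokenProbability(tokens, 0, 3));      // 0.125
    }
}
```

For ranking or comparing slices, the log form is usually the safer choice. The linear form is convenient only when the slice is short enough for the result to stay representable.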