Package com.aliasi.lm

Classes for character- and token-based language models.


Interface Summary
CharSeqCounter A CharSeqCounter provides counts for sequences of characters.
IntSeqCounter An IntSeqCounter provides counts for sequences of integers.
LanguageModel A LanguageModel provides an estimate of the probability of a sequence of characters.
LanguageModel.Conditional A LanguageModel.Conditional is a language model that implements conditional estimates of characters given previous characters.
LanguageModel.Dynamic A LanguageModel.Dynamic accepts training events in the form of character slices or sequences.
LanguageModel.Process A LanguageModel.Process is normalized by length.
LanguageModel.Sequence A LanguageModel.Sequence is normalized over all character sequences.
LanguageModel.Tokenized A LanguageModel.Tokenized provides a means of estimating the probability of a sequence of tokens.
TrieReader The TrieReader interface provides a means to read a trie structure with counts.
TrieWriter The TrieWriter interface provides a means to write an arbitrary trie structure with positive node counts.
 

Class Summary
BitTrieReader A BitTrieReader provides a trie reader that wraps a bit-level input.
BitTrieWriter A BitTrieWriter provides a trie writer that wraps a bit-level output.
CharSeqMultiCounter A CharSeqMultiCounter combines the counts from a pair of character sequence counters.
CompiledNGramBoundaryLM A CompiledNGramBoundaryLM is constructed by reading the serialized form of an instance of NGramBoundaryLM.
CompiledNGramProcessLM A CompiledNGramProcessLM implements a conditional process language model.
CompiledTokenizedLM A CompiledTokenizedLM implements a tokenized bounded sequence language model.
MultiTrieReader A MultiTrieReader merges two trie readers, providing output that is the result of adding the counts from the two readers.
NGramBoundaryLM An NGramBoundaryLM provides a dynamic sequence language model for which training, estimation and pruning may be interleaved.
NGramProcessLM An NGramProcessLM provides a dynamic conditional process language model for which training, estimation, and pruning may be interleaved.
PruneTrieReader A PruneTrieReader filters a contained trie reader by removing all subtrees whose counts fall below a specified minimum.
ScaleTrieReader A ScaleTrieReader filters a contained trie reader by scaling all counts by a given multiple, removing all subtrees with zero root counts.
TokenizedLM A TokenizedLM provides a dynamic sequence language model which models token sequences with an n-gram model, and whitespace and unknown tokens with their own sequence language models.
TrieCharSeqCounter A TrieCharSeqCounter stores counts for substrings of strings.
TrieIntSeqCounter A TrieIntSeqCounter implements an integer sequence counter with a trie structure of counts.
UniformBoundaryLM A UniformBoundaryLM implements a uniform sequence language model with a specified number of outcomes and the same probability assigned to the end-of-stream marker.
UniformProcessLM A UniformProcessLM implements a uniform process language model with a specified number of outcomes.
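
The difference between the two uniform models listed above can be illustrated with a little arithmetic. The following is a minimal sketch, not the library's implementation: the class UniformLmSketch and its methods are hypothetical names, and it assumes that the boundary model reserves one extra outcome for the end-of-stream marker while the process model has no such marker.

```java
public class UniformLmSketch {

    // Base-2 logarithm helper (hypothetical, not part of com.aliasi.lm).
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Process model: each of the numOutcomes characters has probability
    // 1/numOutcomes, so estimates sum to 1 over sequences of a fixed length.
    static double processLog2Estimate(CharSequence cs, int numOutcomes) {
        return -cs.length() * log2(numOutcomes);
    }

    // Boundary (sequence) model, under the assumption stated above: the
    // end-of-stream marker shares the probability mass, so each of the
    // numOutcomes + 1 events has probability 1/(numOutcomes + 1) and a
    // sequence of n characters pays for n + 1 events.
    static double boundaryLog2Estimate(CharSequence cs, int numOutcomes) {
        return -(cs.length() + 1) * log2(numOutcomes + 1);
    }

    public static void main(String[] args) {
        System.out.println(processLog2Estimate("ab", 4));
        System.out.println(boundaryLog2Estimate("ab", 3));
    }
}
```

Under these assumptions the boundary estimates sum to 1 over all character sequences of any length, whereas the process estimates sum to 1 separately for each length.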
 

Package com.aliasi.lm Description

Classes for character- and token-based language models.
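
Character-based n-gram models of this kind rest on counts of character subsequences. As a rough illustration only, the sketch below counts every substring up to a maximum length using a hash map. The class NGramCountSketch and its methods are hypothetical and are not part of this package; the package's own counters are described as trie-based, which this sketch does not attempt to reproduce.

```java
import java.util.HashMap;
import java.util.Map;

public class NGramCountSketch {

    private final Map<String, Integer> counts = new HashMap<>();
    private final int maxNGram;

    public NGramCountSketch(int maxNGram) {
        this.maxNGram = maxNGram;
    }

    // Increment the count of every substring of length 1..maxNGram.
    public void train(CharSequence cs) {
        String s = cs.toString();
        for (int start = 0; start < s.length(); start++) {
            int limit = Math.min(s.length(), start + maxNGram);
            for (int end = start + 1; end <= limit; end++) {
                counts.merge(s.substring(start, end), 1, Integer::sum);
            }
        }
    }

    // Return the number of times the n-gram was seen, or 0 if unseen.
    public int count(String ngram) {
        return counts.getOrDefault(ngram, 0);
    }

    public static void main(String[] args) {
        NGramCountSketch counter = new NGramCountSketch(3);
        counter.train("abracadabra");
        System.out.println(counter.count("a"));    // prints 5
        System.out.println(counter.count("ab"));   // prints 2
        System.out.println(counter.count("abra")); // prints 0 (longer than maxNGram)
    }
}
```

A flat map is chosen here for brevity; a trie shares common prefixes between n-grams and so would typically use less memory for large training sets.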