com.aliasi.hmm
Class HmmDecoder

java.lang.Object
  extended by com.aliasi.hmm.HmmDecoder
All Implemented Interfaces:
MarginalTagger<String>, NBestTagger<String>, Tagger<String>

public class HmmDecoder
extends Object
implements Tagger<String>, NBestTagger<String>, MarginalTagger<String>

An HmmDecoder provides implementations of first-best, n-best and marginal taggers for hidden Markov models (HMMs). A decoder is constructed from a hidden Markov model.

First-best Output

HMM decoders implement the interface Tagger, which specifies a first-best tagging method tag(List). This method provides the most likely (first-best) path of HMM states (tags) given a sequence of string emissions (outputs). First-best decoding is implemented with the Viterbi dynamic programming algorithm.
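For example, assuming a trained HiddenMarkovModel has been deserialized (the file name below is hypothetical) and the input has already been tokenized, a first-best tagging might be produced roughly as in the following sketch:

 HiddenMarkovModel hmm
     = (HiddenMarkovModel)
       com.aliasi.util.AbstractExternalizable.readObject(new java.io.File("pos-hmm.model"));
 HmmDecoder decoder = new HmmDecoder(hmm);

 List<String> tokens = java.util.Arrays.asList("John", "ran", "home");
 Tagging<String> tagging = decoder.tag(tokens);
 for (int i = 0; i < tagging.size(); ++i)
     System.out.println(tagging.token(i) + "/" + tagging.tag(i));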

N-best Output

HMM decoders also implement the interface NBestTagger, which specifies the methods tagNBest(List,int) and tagNBestConditional(List,int). Both methods return an iterator over scored taggings. N-best decoding is implemented using the Viterbi algorithm in a forward pass and the A* algorithm in a backward pass, with the Viterbi estimates serving as exact completion estimates. The conditional variant further normalizes outputs to posterior conditional probabilities, and is somewhat more expensive to compute.
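Continuing the first-best sketch above, n-best decoding iterates scored taggings in decreasing order of score; the following is a sketch only:

 Iterator<ScoredTagging<String>> nBestIt = decoder.tagNBest(tokens, 5);
 while (nBestIt.hasNext()) {
     ScoredTagging<String> scored = nBestIt.next();
     // score() is the log (base 2) joint probability of the tags and tokens
     System.out.println(scored.score() + " : " + scored.tags());
 }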

Confidence and Lattice Output

HMM decoders also implement the MarginalTagger interface, which specifies a method tagMarginal(List) for providing marginal probability estimates for categories for a token given the input string. Marginal decoding is implemented using the standard forward-backward algorithm. The lattice is an instance of TagWordLattice, which itself implements TagLattice; see that class's documentation for information on how to retrieve cumulative (total) probabilities for input token sequences and posterior conditional probabilities (confidence scores) per token.
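As a sketch continuing the example above, a marginal tagging may be produced and, because the returned lattice is a TagWordLattice as noted above, cast to retrieve the cumulative (total) log probability via log2Total(); the per-token confidence accessors are described in the TagWordLattice class documentation:

 TagLattice<String> lattice = decoder.tagMarginal(tokens);
 TagWordLattice wordLattice = (TagWordLattice) lattice;
 double log2TotalProb = wordLattice.log2Total();  // log (base 2) P(tokens)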

Caching

The decoder is able to cache emission probabilities, the computation of which is often the bottleneck in decoding. Caches may be provided in the constructor for both ordinary (linear) probabilities and log probabilities of emissions. A cache is simply an instance of Map from strings to arrays of doubles.

The first-best and n-best decoders use only log probabilities, whereas the conditional n-best and lattice decoders use linear probabilities. Only the probabilities actually computed are cached, so a program that does only first-best processing caches only log estimates.

Any implementation of Map may be used as a cache, but particular attention must be paid to thread safety and scalability. A fully synchronized cache can be created with:

 Map<String,double[]> cache 
     = java.util.Collections.synchronizedMap(new HashMap<String,double[]>());
LingPipe's map implementation FastCache is designed specifically to be used as a cache in settings such as these.
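For instance, a decoder might be constructed with FastCache instances for both linear and log estimates roughly as follows; the constructor argument (a rough bound on the number of cached entries) is an assumption here, so consult the FastCache documentation for its exact parameters:

 Map<String,double[]> emissionCache
     = new com.aliasi.util.FastCache<String,double[]>(100000);  // assumed size parameter
 Map<String,double[]> emissionLog2Cache
     = new com.aliasi.util.FastCache<String,double[]>(100000);
 HmmDecoder cachedDecoder = new HmmDecoder(hmm, emissionCache, emissionLog2Cache);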

It is often easy (e.g. on English newswire) to get high token coverage (e.g. 97%) with a rather modestly sized cache (e.g. 100K tokens). Other corpora and languages may vary, and we encourage experimenting with the trade-off between speed and memory when sizing caches. Note that run times speed up as more and more estimates are returned from the cache rather than being computed directly.

Synchronization and Thread Safety

This class does not perform any underlying synchronization. If the hidden Markov model is not thread safe, access to it must be synchronized; the same holds for the caches. Note that FastCache, while not synchronized, is thread safe. Similarly, the compiled form of an HMM trained with HmmCharLmEstimator is thread safe, and in fact allows safe concurrent access because it is immutable.

Beam and Pruning

For first-best and n-best decoding, a beam may be used to prune unlikely hypotheses. The beam is set during construction or through the method setLog2EmissionBeam(double) (setting and access must be synchronized by the caller under a concurrent-read/exclusive-write discipline). The beam works token by token: as each token is considered, any tag whose log (base 2) emission likelihood falls more than the beam width below the best emission estimate is eliminated from further consideration.
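As a rough illustration continuing the sketch above (the numeric widths below are arbitrary, not recommendations), the beams might be configured as follows; with an emission beam of 8.0, any tag whose log (base 2) emission estimate trails the best tag's estimate for a token by more than 8.0 is pruned:

 decoder.setLog2EmissionBeam(8.0);  // prune per-token emission hypotheses
 decoder.setLog2Beam(12.0);         // prune full Viterbi hypotheses per lattice slice
 // Double.POSITIVE_INFINITY for either beam turns that pruning off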

Since:
LingPipe 2.1
Version:
3.9
Author:
Bob Carpenter

Constructor Summary
HmmDecoder(HiddenMarkovModel hmm)
          Construct an HMM decoder from the specified HMM.
HmmDecoder(HiddenMarkovModel hmm, Map<String,double[]> emissionCache, Map<String,double[]> emissionLog2Cache)
          Construct an HMM decoder from the specified HMM using the specified caches for linear and log probabilities.
HmmDecoder(HiddenMarkovModel hmm, Map<String,double[]> emissionCache, Map<String,double[]> emissionLog2Cache, double log2Beam, double log2EmissionBeam)
          Construct an HMM decoder from the specified HMM using the specified caches for linear and log probabilities, with the specified beam width for emission estimates.
 
Method Summary
 Map<String,double[]> emissionCache()
          Returns the mapping used to cache emission probabilities, or null if not caching.
 Map<String,double[]> emissionLog2Cache()
          Returns the mapping used to cache log (base 2) emission probabilities, or null if not caching.
 String[] firstBest(String[] emissions)
          Deprecated. Use tag(List) instead.
 HiddenMarkovModel getHmm()
          Returns the hidden Markov model underlying this decoder.
 TagWordLattice lattice(String[] emissions)
          Deprecated. Use tagMarginal(List) instead.
 Iterator<ScoredObject<String[]>> nBest(String[] emissions)
          Deprecated. Use tagNBest(List,int) instead.
 Iterator<ScoredObject<String[]>> nBest(String[] emissions, int maxN)
          Deprecated. Use tagNBest(List,int) instead.
 Iterator<ScoredObject<String[]>> nBestConditional(String[] emissions)
          Deprecated. Use tagNBestConditional(List,int) instead.
 void setEmissionCache(Map<String,double[]> cache)
          Sets the emission cache to the specified value.
 void setEmissionLog2Cache(Map<String,double[]> cache)
          Sets the log emission cache to the specified value.
 void setLog2Beam(double log2Beam)
          Sets the value of the log2 beam to the specified value.
 void setLog2EmissionBeam(double log2EmissionBeam)
          Sets the log (base 2) emission beam width.
 Tagging<String> tag(List<String> tokens)
          Return the tagging for the specified list of tokens.
 TagLattice<String> tagMarginal(List<String> tokens)
          Return the marginal tagging for the specified list of input tokens.
 Iterator<ScoredTagging<String>> tagNBest(List<String> tokens, int maxResults)
          Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n.
 Iterator<ScoredTagging<String>> tagNBestConditional(List<String> tokens, int maxResults)
          Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n, with scores normalized to conditional probabilities.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HmmDecoder

public HmmDecoder(HiddenMarkovModel hmm)
Construct an HMM decoder from the specified HMM. No caching is applied to estimates, and the beams are set to positive infinity, turning off pruning. This constructor is appropriate for dynamic models with changing probability estimates.

Parameters:
hmm - Model to use as basis of decoding.

HmmDecoder

public HmmDecoder(HiddenMarkovModel hmm,
                  Map<String,double[]> emissionCache,
                  Map<String,double[]> emissionLog2Cache)
Construct an HMM decoder from the specified HMM using the specified caches for linear and log probabilities. The beams are set to positive infinity, turning off pruning. Either or both of the caches may be null, in which case the corresponding values will not be cached.

Parameters:
hmm - Model to use for decoding.
emissionCache - Map to use for emission caching.
emissionLog2Cache - Map to use for log emission caching.

HmmDecoder

public HmmDecoder(HiddenMarkovModel hmm,
                  Map<String,double[]> emissionCache,
                  Map<String,double[]> emissionLog2Cache,
                  double log2Beam,
                  double log2EmissionBeam)
Construct an HMM decoder from the specified HMM using the specified caches for linear and log probabilities, with the specified beam width for emission estimates. Either or both of the caches may be null, in which case the corresponding values will not be cached.

Parameters:
hmm - Model to use for decoding.
emissionCache - Map to use for emission caching.
emissionLog2Cache - Map to use for log emission caching.
log2Beam - The log (base 2) beam for pruning full hypotheses.
log2EmissionBeam - The log (base 2) beam for pruning emission hypotheses.
Throws:
IllegalArgumentException - If either beam is not a non-negative number.
Method Detail

getHmm

public HiddenMarkovModel getHmm()
Returns the hidden Markov model underlying this decoder. The returned value is the actual HMM used by this decoder, so changes to it will affect this decoder.

Returns:
The HMM for this decoder.

emissionCache

public Map<String,double[]> emissionCache()
Returns the mapping used to cache emission probabilities, or null if not caching. This is the actual mapping, so changes to it will affect this decoder.

Returns:
The emission probability cache.

emissionLog2Cache

public Map<String,double[]> emissionLog2Cache()
Returns the mapping used to cache log (base 2) emission probabilities, or null if not caching. This is the actual mapping, so changes to it will affect this decoder.

Returns:
The emission probability cache.

setEmissionCache

public void setEmissionCache(Map<String,double[]> cache)
Sets the emission cache to the specified value.

Warning: This method should not be executed concurrently with any calls to decoding, as it may produce an inconsistent result. The typical application will be to set a cache before using a decoder.

Parameters:
cache - Cache for linear emission estimates.

setLog2EmissionBeam

public void setLog2EmissionBeam(double log2EmissionBeam)
Sets the log (base 2) emission beam width. Any tag whose log (base 2) emission probability falls more than the beam width below the best hypothesis is discarded. Setting the beam width to zero prunes to the category with the best emission; setting it to positive infinity effectively turns off the beam.

Parameters:
log2EmissionBeam - Width of beam.
Throws:
IllegalArgumentException - If the beam width is negative.

setLog2Beam

public void setLog2Beam(double log2Beam)
Sets the log (base 2) beam to the specified value. This beam controls pruning based on full Viterbi values for a given lattice slice. See the class documentation above for more details.

Parameters:
log2Beam - The log (base 2) Viterbi beam.
Throws:
IllegalArgumentException - If the beam width is negative.

setEmissionLog2Cache

public void setEmissionLog2Cache(Map<String,double[]> cache)
Sets the log emission cache to the specified value.

Warning: This method should not be executed concurrently with any calls to decoding, as it may produce an inconsistent result. The typical application will be to set a cache before using a decoder.

Parameters:
cache - Cache for log (base 2) emission estimates.

lattice

@Deprecated
public TagWordLattice lattice(String[] emissions)
Deprecated. Use tagMarginal(List) instead.

Return a complete tag-word lattice for the specified array of string emissions. Lattices provide forward and backward values.

Implementation Note: This method is implemented by the standard forward-backward dynamic programming algorithm. The estimates P_α(n,state) give the probability of a derivation starting at the beginning and arriving at the specified state after consuming the specified number of tokens.

P_α(0,state) = P_start(state) * P_emit(emissions[0]|state)
P_α(n,state)
    = Σ_sourceState P_α(n-1,sourceState)
                    * P_transit(state|sourceState)
                    * P_emit(emissions[n]|state)
Note that the forward probabilities up to a token position include the emission probability for that token.

The backward values are defined dually as the probability of a derivation ending at a specified state being continued to the end of the analysis.

P_β(|emissions|-1,state) = P_end(state)
P_β(n,state)
    = Σ_targetState P_β(n+1,targetState)
                    * P_transit(targetState|state)
                    * P_emit(emissions[n+1]|targetState)
Note that token emission probabilities for a given state are not included in the backward score; they are computed with target states in a way that matches the forward algorithm's definition. This asymmetry is so that the forward-backward estimates P_γ(n,state) correspond to the probability of a derivation being in a given state at a given position given the input:
P_γ(n,state) = P_α(n,state) * P_β(n,state)
These values include the token emission probabilities, but may be normalized in the usual Bayesian fashion by dividing by the marginal P(emissions):
P(n,state|emissions) = P_γ(n,state) / P(emissions)
where the marginal is just the sum of all forward-backward values for any token position i:
P(emissions) = Σ_state P_γ(i,state)
Most of the computation is carried out by the constructor TagWordLattice.TagWordLattice(String[],SymbolTable,double[],double[],double[][][]) with the following start, end and transition arrays:
start(state) = P_start(state) * P_emit(emissions[0]|state)
end(state) = P_end(state)
transition(i,sourceState,targetState)
    = P_transit(targetState|sourceState)
      * P_emit(emissions[i]|targetState)

Parameters:
emissions - Array of strings emitted.
Returns:
Full tag-word lattice for specified emissions.

firstBest

@Deprecated
public String[] firstBest(String[] emissions)
Deprecated. Use tag(List) instead.

Returns the array of states most likely to have emitted the specified array of strings.

Implementation Note: This method is implemented with the Viterbi algorithm. The Viterbi algorithm uses dynamic programming (memoization) to compute the maximum probability of arriving in a given state after consuming the inputs emissions[0],...,emissions[n-1]:

P_best(0,state)
    = P_start(state)
      * P_emit(emissions[0]|state)
P_best(n,state)
    = MAX_prevState P_best(n-1,prevState)
                    * P_transit(state|prevState)
                    * P_emit(emissions[n]|state)
P_best(last,state) *= P_end(state)
Note that the initial condition uses the start probability rather than a transition probability times the previous best probability. The notation in the last line indicates that the final position has the probability of the state being the last state multiplied in. As usual, the implementation uses logarithms and addition rather than multiplication.

As usual, the algorithm employs an array of backpointers from a state at a given input to the last state along the best path. This is computed by simply recording the state maximizing the above equation:

backPtr(0,state) = null
backPtr(n,state)
    = ARGMAX_prevState P_best(n-1,prevState)
                       * P_transit(state|prevState)
                       * P_emit(emissions[n]|state)
By tracing the array of backpointers from the best final state, the best path can be recovered.

Parameters:
emissions - Array of strings emitted.
Returns:
Array of states most likely to have emitted the specified strings.

nBest

@Deprecated
public Iterator<ScoredObject<String[]>> nBest(String[] emissions)
Deprecated. Use tagNBest(List,int) instead.

Returns a best-first iterator of ScoredObject instances consisting of arrays of tags and log (base 2) joint likelihoods of the tags and emissions with respect to the underlying HMM. Only analyses with non-zero probability estimates are returned.

Implementation Note: This method is implemented by doing a Viterbi search to provide exact A* bounds for a backwards n-best pass using the A* algorithm. Thus it will be slower than just computing the first best result using firstBest(String[]). The iterator stores the entire Viterbi lattice as well as a priority queue of partial states ordered by the A* condition.

Parameters:
emissions - String outputs whose tag sequences are returned.
Returns:
Iterator over scored tag sequences in decreasing order of likelihood.

nBest

@Deprecated
public Iterator<ScoredObject<String[]>> nBest(String[] emissions,
                                              int maxN)
Deprecated. Use tagNBest(List,int) instead.

Returns a best-first iterator of ScoredObject instances consisting of arrays of tags and log (base 2) joint likelihoods of the tags and emissions with respect to the underlying HMM up to the specified maximum number of results.

Implementation Note: This method is implemented by doing a Viterbi search to provide exact A* bounds for a backwards n-best pass using the A* algorithm. Thus it will be slower than just computing the first best result using firstBest(String[]). The iterator stores the entire Viterbi lattice as well as a priority queue of partial states ordered by the A* condition.

Parameters:
emissions - String outputs whose tag sequences are returned.
maxN - Maximum number of tag sequences returned.
Returns:
Iterator over scored tag sequences in decreasing order of likelihood.

nBestConditional

@Deprecated
public Iterator<ScoredObject<String[]>> nBestConditional(String[] emissions)
Deprecated. Use tagNBestConditional(List,int) instead.

Returns a best-first iterator of scored objects consisting of arrays of tags and log (base 2) conditional likelihoods of the tags given the specified emissions with respect to the underlying HMM. Only analyses with non-zero probability estimates are returned. For this method, the conditional probabilities corresponding to the iterated estimates sum to 1.0, plus or minus rounding errors.

Conditional estimates of tags given emissions are derived by dividing the joint estimates by the marginal likelihood of the emissions, computed by summing over all joint estimates.

Implementation Note: The total log likelihood is returned by applying TagWordLattice.log2Total() to the result of decoding the input with lattice(String[]). The joint estimates are iterated using the iterator returned by nBest(String[]) and then modified by subtracting the emission marginal log likelihood from the joint emission/tags log likelihood. This method adds the cost of the full lattice computation to the joint n-best method. The space for the full lattice is used transiently when this method is called and may be garbage-collected even before the first element is returned by the iterator.

Parameters:
emissions - String outputs whose tag sequences are returned.
Returns:
Iterator over scored tag sequences in decreasing order of likelihood.

tag

public Tagging<String> tag(List<String> tokens)
Description copied from interface: Tagger
Return the tagging for the specified list of tokens.

Specified by:
tag in interface Tagger<String>
Parameters:
tokens - Input tokens to tag.
Returns:
Tagging for the specified input tokens.

tagNBest

public Iterator<ScoredTagging<String>> tagNBest(List<String> tokens,
                                                int maxResults)
Description copied from interface: NBestTagger
Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n.

Specified by:
tagNBest in interface NBestTagger<String>
Parameters:
tokens - Input tokens to tag.
maxResults - Maximum number of results to return.
Returns:
Iterator over the n-best scored taggings for the specified tokens.

tagNBestConditional

public Iterator<ScoredTagging<String>> tagNBestConditional(List<String> tokens,
                                                           int maxResults)
Description copied from interface: NBestTagger
Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n, with scores normalized to conditional probabilities.

Optional operation.

Specified by:
tagNBestConditional in interface NBestTagger<String>
Parameters:
tokens - Input tokens to tag.
maxResults - Maximum number of results to return.
Returns:
Iterator over the n-best scored taggings for the specified tokens.

tagMarginal

public TagLattice<String> tagMarginal(List<String> tokens)
Description copied from interface: MarginalTagger
Return the marginal tagging for the specified list of input tokens.

Specified by:
tagMarginal in interface MarginalTagger<String>
Parameters:
tokens - Input tokens to tag.
Returns:
The lattice of tags for the specified tokens.