com.aliasi.tag
Class TagLattice<E>

java.lang.Object
  extended by com.aliasi.tag.TagLattice<E>
Type Parameters:
E - Type of tokens in the lattice.
Direct Known Subclasses:
ForwardBackwardTagLattice, TagWordLattice

public abstract class TagLattice<E>
extends Object

The abstract base class TagLattice provides an interface for the output of a marginal tagger, based on forward, backward, and transition log potentials. The allowance of general potentials makes this interface suitable for the output of either hidden Markov models (HMM) or linear chain conditional random fields (CRF).

Marginal Tag Probabilities

The forward, backward, and normalizing terms are used to define the marginal probability of a single tag at a specified position given the inputs:

 p(tag[n]=k | toks[0],...,toks[N-1])
    = fwd(n,k) * bk(n,k) / Z
To avoid problems with overflow, natural log values are used. On the log scale,
 log p(tag[n]=k | toks[0],...,toks[N-1])
    = log fwd(n,k) + log bk(n,k) - log Z
Warning: No checks are made to ensure that values supplied to the constructor form a coherent probability estimate.

Marginal Tag Sequence Probabilities

This allows us to compute the probability of a sequence of tags, for instance for a phrase, by:

 p(tag[n]=k0, tag[n+1]=k1, ..., tag[n+m]=km | toks)
    = fwd(n,k0) * bk(n+m,km) * Πi < m trans(n+i,tags[n+i],tags[n+i+1]) / Z
On the log scale, this becomes:
 log p(tag[n]=k0, tag[n+1]=k1, ..., tag[n+m]=km | toks)
    = log fwd(n,k0) + log bk(n+m,km) * Σi < m log trans(n+i,tags[n+i],tags[n+i+1]) - log Z
The log of this value returned by logProbability(int,int[]), where the first argument is n and the second argument {k0,k1,...,km}.

Linear Probabilities

Linear probabilities (or potentials) are computed by exponentiating the log probabilities returned by the various methods.

Transition Potentials

The transitions are defined by:

 trans(n,k1,k2) = φ(tag[n]=k2|tag[n-1]=k1)
where φ is the transition potential from the tag k1 at position n-1 to tag k2 at position n. The log of this value is returned by logTransition(int,int,int), where the first argument is n and the second two k1 and k2.

Forward Potentials

The forward values are defined by:

 log fwd(0,k) = init(k)
where init(k) is the start potential, defined on an application-specific basis. The recursive step defines the forward values for subsequent positions 0 < n < N& and tags k
 fwd(n,k) = Σk' fwd(n-1,k') * trans(n-1,k,k')
This is typically computed for all n and k in n*k time using the forward algorithm.

The log of this value is returned by logForward(int,int) where the first argument is n and the second k.

Backward Potentials

The backward potentials are similar, but condition on rather than predict the label. This simplifies the basis of the recursion to

 bk(N-1,k) = 1.0
where N is the length of the input (number of tokens). The recursive step for 0 <= n < N-1 is
 bk(n,k) = Σk' trans(n,k,k') * bk(n+1,k')
The log of this value is returned by logBackward(int,int) where the first argument is n and the second k.

Normalizer

The normalizer is the sum over all paths, and acts to normalize sequences to probabilities. It may be computed by:

 Z = Σk fwd(N-1,k)
The log of this value is returned by logZ().

Probabilistic Lattices

In some settings, such as hidden Markov models (HMMs) the forward, backward, transition and normalizer may be interpreted as probabilities.

 trans(n,k1,k2) = p(tag[n]=k2|tag[n-1]=k1)
 fwd(n,k) = p(label[n]=k, toks[0], ..., toks[n])
 bk(n,k) = p(toks[n+1],...,toks[N-1] | label[n]=k)
 Z = p(toks[0],...,toks[N-1])

Since:
LingPipe3.9
Version:
3.9
Author:
Bob Carpenter

Constructor Summary
TagLattice()
          Construct an empty tag lattice.
 
Method Summary
abstract  double logBackward(int token, int tag)
          Returns the log of the backward probability to the specified token and tag.
abstract  double logForward(int token, int tag)
          Return the log of the forward probability of the specified tag at the specified position.
abstract  double logProbability(int n, int tag)
          Convenience method returning the log of the conditional probability that the specified token has the specified tag, given the complete list of input tokens.
abstract  double logProbability(int nFrom, int[] tags)
          Return the log conditional probability that the tokens starting with the specified token position have the specified tags given the complete sequence of input tokens.
abstract  double logProbability(int nTo, int tagFrom, int tagTo)
          Convenience method returning the log of the conditional probability that the specified two tokens have the specified tag given the complete list of input tokens.
abstract  double logTransition(int tokenFrom, int tagFrom, int tagTo)
          Returns the log of the transition probability from the specified input token position with the specified previous tag to the specified target tag.
abstract  double logZ()
          Return the log of the normalizing constant for the lattice.
abstract  int numTags()
          Return the number of tags in this tag lattice.
abstract  int numTokens()
          Returns the length of this tag lattice as measured by number of tokens.
abstract  String tag(int id)
          Return the tag with the specified symbol identifier.
abstract  List<String> tagList()
          Returns an unmodifiable view of the list of tags used in this lattice, indexed by identifier.
abstract  SymbolTable tagSymbolTable()
          Returns a symbol table which converts tags to identifiers and vice-versa.
abstract  E token(int n)
          Return the token at the specified position in the input.
 ConditionalClassification tokenClassification(int tokenIndex)
          Returns the classification of the token at the specified position in this tag lattice.
abstract  List<E> tokenList()
          Return an unmodifiable view of the underlying tokens for this tag lattice.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TagLattice

public TagLattice()
Construct an empty tag lattice.

Method Detail

tokenList

public abstract List<E> tokenList()
Return an unmodifiable view of the underlying tokens for this tag lattice.

Returns:
The tokens for this lattice.

tagList

public abstract List<String> tagList()
Returns an unmodifiable view of the list of tags used in this lattice, indexed by identifier.

Returns:
The symbol table for tags.

tag

public abstract String tag(int id)
Return the tag with the specified symbol identifier.

Parameters:
id - Identifer for tag.
Returns:
Tag with specified ID.
Throws:
IndexOutOfBoundsException - If the specified identifier is not in range for the list of tags.

numTags

public abstract int numTags()
Return the number of tags in this tag lattice.

Returns:
Number of tags for this tag lattice.

token

public abstract E token(int n)
Return the token at the specified position in the input.

Parameters:
n - Input position.
Returns:
Token at position.
Throws:
IndexOutOfBoundsException - If the specified index is not in range for the list of tokens.

numTokens

public abstract int numTokens()
Returns the length of this tag lattice as measured by number of tokens.

Returns:
Number of tokens in this lattice.

tagSymbolTable

public abstract SymbolTable tagSymbolTable()
Returns a symbol table which converts tags to identifiers and vice-versa.

A new symbol table is constructed for each call, so it should be saved and reused if possible. Changing the returned symbol table will not affect this lattice.

Returns:
Symbol table for tags in this lattice.

logProbability

public abstract double logProbability(int n,
                                      int tag)
Convenience method returning the log of the conditional probability that the specified token has the specified tag, given the complete list of input tokens.

This method returns results defined by

 logProbability(n,tag)
     == logProbability(n,new int[] { tag })

Parameters:
n - Position of input token.
tag - Identifier of tag.
Returns:
The log probability the token has the tag.
Throws:
ArrayIndexOutOfBoundsException - If the token or tag identifiers are not in range.

logProbability

public abstract double logProbability(int nTo,
                                      int tagFrom,
                                      int tagTo)
Convenience method returning the log of the conditional probability that the specified two tokens have the specified tag given the complete list of input tokens.

This method returns results defined by

 logProbability(nTo,tagFrom,tagTo) 
     == logProbability(n-1,new int[] { tagFrom, tagTo })

Parameters:
nTo - Position of second token.
tagFrom - First Tag from which transition is made.
tagTo - Second Tag to which transition is made.
Returns:
Log probability of the tags at the specified position.

logProbability

public abstract double logProbability(int nFrom,
                                      int[] tags)
Return the log conditional probability that the tokens starting with the specified token position have the specified tags given the complete sequence of input tokens.

Parameters:
nFrom - Starting position of sequence.
tags - Tag identifiers for sequence.
Returns:
Log probability that sequence starting at the specified position has the specified tags.
Throws:
IllegalArgumentException - If the token is out of range or the token plus the length of the tag sequence is out of range of tokens, or if any of the tags is not a known identifier.

logForward

public abstract double logForward(int token,
                                  int tag)
Return the log of the forward probability of the specified tag at the specified position. The forward probability is the sum of the joint probabilities of all sequences from the initial token to the specified token ending with the specified tag.

Parameters:
token - Token position.
tag - Tag identifier.
Returns:
Log forward probability specified token has specified tag.
Throws:
ArrayIndexOutOfBoundsException - If the token or tag index are out of bounds for this lattice's tokens or tags.

logBackward

public abstract double logBackward(int token,
                                   int tag)
Returns the log of the backward probability to the specified token and tag. The backward probability is the sum of the joint probabilities of all sequences starting from the specified token and specified tag and going to the end of the list of tokens.

Parameters:
token - Input token position.
tag - Tag identifier.
Throws:
ArrayIndexOutOfBoundsException - If the token or tag index are out of bounds for this lattice's tokens or tags.

logTransition

public abstract double logTransition(int tokenFrom,
                                     int tagFrom,
                                     int tagTo)
Returns the log of the transition probability from the specified input token position with the specified previous tag to the specified target tag.

Parameters:
tokenFrom - Token position from which the transition is made.
tagFrom - Identifier for the previous tag from which the transition is made.
tagTo - Tag identifier for the target tag to which the the transition is made.
Returns:
Log probability of the transition.
Throws:
ArrayIndexOutOfBoundsException - If the token index or either of the tag indexes are out of bounds for this lattice's tokens or tags.

logZ

public abstract double logZ()
Return the log of the normalizing constant for the lattice. Its value is the log of the marginal probability of the input tokens. By the additive law of probability, this is equivalent to the sum of the probabilities of all possible analyses for the input sequence

Returns:
The normalizing constant.

tokenClassification

public ConditionalClassification tokenClassification(int tokenIndex)
Returns the classification of the token at the specified position in this tag lattice. Tag probabilities are conditional probabilities of a tag given the entire sequence of input tokens. The tags are represented in their string form.

Parameters:
tokenIndex - Position of token to classify.
Returns:
Classification of token in terms of tags.