com.aliasi.crf
Class ChainCrf<E>

java.lang.Object
  extended by com.aliasi.crf.ChainCrf<E>
Type Parameters:
E - Type of tokens in the tagging.
All Implemented Interfaces:
MarginalTagger<E>, NBestTagger<E>, Tagger<E>, Serializable

public class ChainCrf<E>
extends Object
implements Tagger<E>, NBestTagger<E>, MarginalTagger<E>, Serializable

The ChainCrf<E> class implements linear chain conditional random field decoding and estimation for input token sequences of type E.

The cliques of the graphical model are typically called nodes and edges in a chain CRF. A node consists of a token position, the rest of the input tokens, whereas an edge adds the previous tag. A CRF feature extractor converts nodes and edges to feature maps, which are then converted to vectors for use in CRFs.

Components of Linear Chain Conditional Random Fields

A linear-chain conditional random field (CRF) assigns tags to each element of an input sequence of items, which we call tokens. The inputs may be strings, or any kind of structured objects.

First, we suppose a fixed, finite set of K tags tags[0],...,tags[K-1] which may be assigned to tokens as part of a tagging. The coeffcient matrix β consists of K coefficient vectors, β[0],...,β[K-1], one for each tag.

Second, we assume a pair of feature extractors, one for nodes in the graphical model and one for edges. The node feature extractor φ0 maps sequences xs of input tokens and a position n in that sequence (0 <= n < xs.length) into a feature map φ0(xs,n).

The edge feature extractor φ1 maps input token sequences xs, a position n in that sequence, and previous tag tags[k] to a feature map φ1(xs,n,tags[k]).

The combined features extracted for input token sequence xs, position n in that sequence, and tag tags[k] are: defined by:

φ(xs,0,tags[k]) = φ0(xs,0)
φ(xs,n,tags[k]) = φ0(xs,n) + φ1(xs,n,tags[k]) for n > 0
The special case for position 0 is because there is no previous tag. Note that boundary features (begin-of-sequence, end-of-sequence) may be defined in φ0 because it knows the position and length of input.

Tag Sequence Probability

The probability model assumes a fixed sequence of input tokens xs[0],...,xs[N-1] of length N. The model assigns a conditional probability to sequences of tag indices ys[0],...,ys[N-1] (where 0 <= ys[n] < tags.length), by:

 p(ys|ws) ∝ expn < N φ(xs,n,tags[ys[n-1]])⋅β[ys[n]])
where the sum is over the dot product of the feature vector φ(xs,n,tags[ys[n-1]]) extracted for the input sequence of tokens xs at position n and previous tag tags[ys[n-1]] with index ys[n-1], and the coefficient for output tag tags[ys[n]], namely β[tags[ys[n]].

Note on Factored Coefficients

Note that this presentation of CRFs is a bit non-standard. In particular, LingPipe has factored what is usually a single large coefficient vector into one vector per output tag. This somewhat restricts the kinds of features that may be extracted so that each feature depends on a specific output tag. The reason this is done is for efficiency. Note that the node feature extractor φ0(xs,n) could be rolled into the more general edge feature extractor φ1(xs,n,t); the node feature extractor is only for efficiency, allowing features shared for all previous tags t to be extracted only once.

Legal Tags: Structural Zeros

The more verbose constructor for chain CRFs takes three array arguments which indicate which tags are legal in which positions. The shorter convenience constructor implicitly assumes any tag may occur anywhere. The legal start tag array indicates if a tag may start a tagging. The legal end tag array indicates if a tag may be the last tag in a tagging. The legal transition array indicates for a pair of tags if one may follow the other.

These are typically used when chunkers are encoded as taggers, and some sequences of tags are structurally illegal. These are called structural zeros in statistical modeling terms.

During estimation, these may be computed by specifying

First-Best Output: Viterbi

First-best output is calculated very efficiently in linear time in the size of the inputs using the Viterbi algorithm, a form of dynamic programming applicable to sequence tagging. Note that the normalizing constant does not need to be computed. This functionality is avaiable through the method tag(List) of the Tagger interface.

N-Best Sequence Output: Viterbi Forward, A* Backward

N-best output may be calculated in one linear forward pass and an efficient backward pass. The forward pass computes and stores the full Viterbi lattice consisting of the best analysis and score up to a given token for a given tag. The backward pass uses the A* algorithm with the Viterbi scores as exact completion estimates. This allows n-best analyses to be returned in order, with at most a constant amount of work per result.

N-best functionality is available through the method tagNBest(List,int) of the NBestTagger interface. This returns an iterator over scored tagging instances. The scores are the unnormalized products defined above.

N-best functionality with scores normalized to conditional probabilities may be computed using tagNBestConditional(List,int). This requires computing the normalizing constant using the forward-backward algorithm described in the next section.

Marginal Tag Probabiilty Output: Forward-Backward Algorithm

The forward-backward algorithm may be applied to conditional random fields. The generalized undirected graphical model form of this algorithm is known as the product-sum algorithm. The result is returned as an instance of TagLattice.

The forward-backward algorithm efficiently computes unnormalized forward, backward, and normalizing probabilities. As defined in the TagLattice class documentation, these may be used to define the probability of a tag or tag sequence staring at a specified position given the input tokens.

The unnormalized total probability Z may be used with n-best output to convert n-best sequence scores into conditional probabilities of the tagging given the input tokens.

Stochastic Gradient Descent Training

Training is carried out for conditional random fields the same way as for logistic regression classifiers (see LogisticRegressionClassifier and LogisticRegression. It's mathematically identical to think of CRFs as multi-way classifiers where the categories are sequences of tags. Under that analogy, training for conditional random fields exactly mirrors that for logistic regression. The only difference for conditional random fields is that the expectations at each step are computed using the forward-backward algorithm and efficiently applied over the lattice.

See the logistic regression classes for a description of the prior (also see RegressionPrior), and the learning and annealing rate (also see AnnealingSchedule).

Reporting is exactly the same for conditional random fields as for logistic regression, with each epoch reporting learning rate (lr), log likelihood (ll), log prior probability (lp), and the combined objective being optimized consisting of the sum of log likelihood and log prior probability (llp), and the best combined objective so far (llp*).

Unlike in logistic regression, the prior is applied according to a blocking strategy rather than through lazy updates. The block size is a parameter that determines how often the prior gradient is applied. In effect, it treats updates as blocks. The weight of the prior is adjusted accordingly. After all of the instances in an epoch, the prior is applied one more time to ensure that the coefficients are in the right state after each epoch.

Significant speedups can be achieved by setting the feature caching flag in the estimator method to true. This may require a large amount of memory, because it stores the entire lattice of vectors for each training instance.

Serialization

A chain CRF may be serialized if its parts are serializable: the node and edge feature extractor, the feature symbol table, and the vector coefficients.

The CRF read back in will be identical to the one that was serialized to the extent that the parts are equivalent: feature extractors, symbol table, and coefficients.

For CRFs estimated using the gradient descent estimator method, the symbol table and vector coefficients are serializable in such a way as to reproduce equivalent objects to the ones being serialized.

Thread Safety

A CRF is thread safe to the extent that its symbol table and feature extractors are thread safe. Calling the tagging methods concurrently will result in concurrent calls to feature extractors and symbol tables. The standard symbol table implementations are all thread safe, and almost all well-implemented feature extractors will be thread safe.

Since:
LingPipe3.9
Version:
3.9.2
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
ChainCrf(String[] tags, boolean[] legalTagStarts, boolean[] legalTagEnds, boolean[][] legalTagTransitions, Vector[] coefficients, SymbolTable featureSymbolTable, ChainCrfFeatureExtractor<E> featureExtractor, boolean addInterceptFeature)
          Construct a conditional random field from the specified tags, feature vector coefficients, symbol table for feature, feature extractors and flag indicating whether to add intercepts or not.
ChainCrf(String[] tags, Vector[] coefficients, SymbolTable featureSymbolTable, ChainCrfFeatureExtractor<E> featureExtractor, boolean addInterceptFeature)
          Construct a conditional random field from the specified tags, feature vector coefficients, symbol table for feature, feature extractors and flag indicating whether to add intercepts or not.
 
Method Summary
 boolean addInterceptFeature()
          Returns true if this CRF adds an intercept feature with value 1.0 at index 0 to all feature vectors.
 Vector[] coefficients()
          Return the coefficient vectors for this CRF.
static
<F> ChainCrf<F>
estimate(Corpus<ObjectHandler<Tagging<F>>> corpus, ChainCrfFeatureExtractor<F> featureExtractor, boolean addInterceptFeature, int minFeatureCount, boolean cacheFeatureVectors, boolean allowUnseenTransitions, RegressionPrior prior, int priorBlockSize, AnnealingSchedule annealingSchedule, double minImprovement, int minEpochs, int maxEpochs, Reporter reporter)
          Return the CRF estimated using stochastic gradient descent with the specified prior from the specified corpus of taggings of type F pruned to the specified minimum feature count, using the specified feature extractor, automatically adding an intercept feature if the flag is true, allow unseen tag transitions as specified, using the specified training parameters for annealing, measuring convergence, and reporting the incremental results to the specified reporter.
 ChainCrfFeatureExtractor<E> featureExtractor()
          Return the feature extractor for this CRF.
 SymbolTable featureSymbolTable()
          Returns an unmodifiable view of the symbol table for features for this CRF.
 String tag(int k)
          Returns the tag for the specified tag index.
 Tagging<E> tag(List<E> tokens)
          Return the tagging for the specified list of tokens.
 TagLattice<E> tagMarginal(List<E> tokens)
          Return the marginal tagging for the specified list of input tokens.
 Iterator<ScoredTagging<E>> tagNBest(List<E> tokens, int maxResults)
          Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n.
 Iterator<ScoredTagging<E>> tagNBestConditional(List<E> tokens, int maxResults)
          Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n, with scores normalized to conditional probabilities.
 List<String> tags()
          Returns an unmodifiable view of the array of tags underlying this CRF.
 String toString()
          Return a string-based representation of this chain CRF.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ChainCrf

public ChainCrf(String[] tags,
                Vector[] coefficients,
                SymbolTable featureSymbolTable,
                ChainCrfFeatureExtractor<E> featureExtractor,
                boolean addInterceptFeature)
Construct a conditional random field from the specified tags, feature vector coefficients, symbol table for feature, feature extractors and flag indicating whether to add intercepts or not.

Parameters:
tags - Array of output tags.
coefficients - Array of coefficient vectors parallel to tags.
featureSymbolTable - Symbol table for feature extraction to vectors.
featureExtractor - CRF feature extractor.
addInterceptFeature - true if an intercept feature at position 0 with value 1 is added to all feature vectors.
Throws:
IllegalArgumentException - If the tag and coefficient vector arrays are not non-empty and the same length, or if the coefficient vectors are not all of the same number of dimensions.

ChainCrf

public ChainCrf(String[] tags,
                boolean[] legalTagStarts,
                boolean[] legalTagEnds,
                boolean[][] legalTagTransitions,
                Vector[] coefficients,
                SymbolTable featureSymbolTable,
                ChainCrfFeatureExtractor<E> featureExtractor,
                boolean addInterceptFeature)
Construct a conditional random field from the specified tags, feature vector coefficients, symbol table for feature, feature extractors and flag indicating whether to add intercepts or not.

Parameters:
tags - Array of output tags
legalTagStarts - Array of flags indicating if tag may be first tag for a tagging.
legalTagEnds - Array of flags indicating if tag may be last tag for a tagging.
legalTagTransitions - Two dimensional array of flags indicating if the first tag may be followed by the second tag.
coefficients - Array of coefficient vectors parallel to tags.
featureSymbolTable - Symbol table for feature extraction to vectors.
featureExtractor - CRF feature extractor.
addInterceptFeature - true if an intercept feature at position 0 with value 1 is added to all feature vectors.
Throws:
IllegalArgumentException - If the tag and coefficient vector arrays are not non-empty and the same length, or if the coefficient vectors are not all of the same number of dimensions.
Method Detail

tags

public List<String> tags()
Returns an unmodifiable view of the array of tags underlying this CRF.

The array of coefficient vectors is parallel to the array of tags returned by tags()k, so the coefficient vector coefficients()[n] is for output tag tags()[n].

Returns:
View of the output tags.

tag

public String tag(int k)
Returns the tag for the specified tag index. This uses the underlying tags, so that tag(k) == tags()[k].

Parameters:
k - Position of tag.
Returns:
Tag for the specified position.
Throws:
ArrayIndexOutOfBoundsException - If the specified index is out of bounds for the tag array (k < 0 or k >= tags().length).

coefficients

public Vector[] coefficients()
Return the coefficient vectors for this CRF.

The array of coefficient vectors is parallel to the array of tags returned by tags()k, so the coefficient vector coefficients()[n] is for output tag tags()[n].

Returns:
The coefficient vectors.

featureSymbolTable

public SymbolTable featureSymbolTable()
Returns an unmodifiable view of the symbol table for features for this CRF.

Returns:
A view of the symbol table for features.

featureExtractor

public ChainCrfFeatureExtractor<E> featureExtractor()
Return the feature extractor for this CRF.

Returns:
The feature extractor.

addInterceptFeature

public boolean addInterceptFeature()
Returns true if this CRF adds an intercept feature with value 1.0 at index 0 to all feature vectors.

Returns:
Whether this CRF adds an intercept feature.

tag

public Tagging<E> tag(List<E> tokens)
Description copied from interface: Tagger
Return the tagging for the specified list of tokens.

Specified by:
tag in interface Tagger<E>
Parameters:
tokens - Input tokens to tag.
Returns:
Tagging for the specified input tokens.

tagNBest

public Iterator<ScoredTagging<E>> tagNBest(List<E> tokens,
                                           int maxResults)
Description copied from interface: NBestTagger
Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n.

Specified by:
tagNBest in interface NBestTagger<E>
Parameters:
tokens - Input tokens to tag.
maxResults - Maximum number of results to return.
Returns:
Iterator over the n-best scored taggings for the specified tokens.

tagNBestConditional

public Iterator<ScoredTagging<E>> tagNBestConditional(List<E> tokens,
                                                      int maxResults)
Description copied from interface: NBestTagger
Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n, with scores normalized to conditional probabilities.

Optional operation.

Specified by:
tagNBestConditional in interface NBestTagger<E>
Parameters:
tokens - Input tokens to tag.
maxResults - Maximum number of results to return.
Returns:
Iterator over the n-best scored taggings for the specified tokens.

tagMarginal

public TagLattice<E> tagMarginal(List<E> tokens)
Description copied from interface: MarginalTagger
Return the marginal tagging for the specified list of input tokens.

Specified by:
tagMarginal in interface MarginalTagger<E>
Parameters:
tokens - Input tokens to tag.
Returns:
The lattice of tags for the specified tokens.

toString

public String toString()
Return a string-based representation of this chain CRF. All information returned in this string representation is available programatically.

Warning: The output is very verbose, including symbolic representations of all the coefficients.

Overrides:
toString in class Object
Returns:
A string-based representation of this chain CRF.

estimate

public static <F> ChainCrf<F> estimate(Corpus<ObjectHandler<Tagging<F>>> corpus,
                                       ChainCrfFeatureExtractor<F> featureExtractor,
                                       boolean addInterceptFeature,
                                       int minFeatureCount,
                                       boolean cacheFeatureVectors,
                                       boolean allowUnseenTransitions,
                                       RegressionPrior prior,
                                       int priorBlockSize,
                                       AnnealingSchedule annealingSchedule,
                                       double minImprovement,
                                       int minEpochs,
                                       int maxEpochs,
                                       Reporter reporter)
                            throws IOException
Return the CRF estimated using stochastic gradient descent with the specified prior from the specified corpus of taggings of type F pruned to the specified minimum feature count, using the specified feature extractor, automatically adding an intercept feature if the flag is true, allow unseen tag transitions as specified, using the specified training parameters for annealing, measuring convergence, and reporting the incremental results to the specified reporter.

Reporting at the info level provides parameter and epoch level. At the debug level, it reports epoch-by-epoch likelihoods.

Parameters:
corpus - Corpus from which to estimate.
featureExtractor - Feature extractor for the CRF.
addInterceptFeature - Set to true if an intercept feature with index 0 is automatically added to all feature vectors with value 1.0.
minFeatureCount - Minimum number of instances of a feature to keep it.
cacheFeatureVectors - Flag indicating whether or not to keep the computed feature vectors in memory.
allowUnseenTransitions - Flag indicating whether to allow tags to start a tagging, end a tagging, or follow another tag if there was not an example of that in the corpus.
prior - Prior for coefficients to use during estimation.
annealingSchedule - Schedule for annealing the learning rate during gradient descent.
minImprovement - Minimum relative improvement objective (log likelihood plus log prior) computed as a 10-epoch rolling average to signal convergence.
minEpochs - Minimum number of epochs for which to run gradient descent estimation.
maxEpochs - Maximum number of epochs for which to run gradient descent estimation.
reporter - Reporter to which results are written, or null for no reporting of intermediate results.
Throws:
IOException