com.aliasi.crf

Class ChainCrf<E>

    • Constructor Summary

      Constructors 
      Constructor and Description
      ChainCrf(String[] tags, boolean[] legalTagStarts, boolean[] legalTagEnds, boolean[][] legalTagTransitions, Vector[] coefficients, SymbolTable featureSymbolTable, ChainCrfFeatureExtractor<E> featureExtractor, boolean addInterceptFeature)
      Construct a conditional random field from the specified tags, tag-legality arrays, coefficient vectors, feature symbol table, feature extractor, and flag indicating whether to add an intercept feature.
      ChainCrf(String[] tags, Vector[] coefficients, SymbolTable featureSymbolTable, ChainCrfFeatureExtractor<E> featureExtractor, boolean addInterceptFeature)
      Construct a conditional random field from the specified tags, coefficient vectors, feature symbol table, feature extractor, and flag indicating whether to add an intercept feature.
    • Constructor Detail

      • ChainCrf

        public ChainCrf(String[] tags,
                        Vector[] coefficients,
                        SymbolTable featureSymbolTable,
                        ChainCrfFeatureExtractor<E> featureExtractor,
                        boolean addInterceptFeature)
        Construct a conditional random field from the specified tags, coefficient vectors, feature symbol table, feature extractor, and flag indicating whether to add an intercept feature.
        Parameters:
        tags - Array of output tags.
        coefficients - Array of coefficient vectors parallel to tags.
        featureSymbolTable - Symbol table for feature extraction to vectors.
        featureExtractor - CRF feature extractor.
        addInterceptFeature - true if an intercept feature at position 0 with value 1 is added to all feature vectors.
        Throws:
        IllegalArgumentException - If the tag and coefficient vector arrays are empty or differ in length, or if the coefficient vectors do not all have the same number of dimensions.
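The conditions under Throws can be read as a small precondition check. The sketch below is not the library's implementation: CrfArgChecks is a hypothetical helper, and double[][] stands in for Vector[] so the example needs nothing beyond the JDK.

```java
/** Hypothetical stand-in for the argument checks the constructor documents. */
final class CrfArgChecks {

    private CrfArgChecks() { }

    /**
     * Throws IllegalArgumentException unless the two arrays are non-empty,
     * have the same length, and every coefficient row has the same number
     * of dimensions. double[][] is an assumption standing in for Vector[].
     */
    static void validate(String[] tags, double[][] coefficients) {
        if (tags.length == 0 || coefficients.length == 0) {
            throw new IllegalArgumentException(
                "tags and coefficients must be non-empty");
        }
        if (tags.length != coefficients.length) {
            throw new IllegalArgumentException(
                "tags.length=" + tags.length
                + " != coefficients.length=" + coefficients.length);
        }
        int numDimensions = coefficients[0].length;
        for (int k = 1; k < coefficients.length; ++k) {
            if (coefficients[k].length != numDimensions) {
                throw new IllegalArgumentException(
                    "coefficient vector " + k + " has "
                    + coefficients[k].length + " dimensions, expected "
                    + numDimensions);
            }
        }
    }
}
```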
      • ChainCrf

        public ChainCrf(String[] tags,
                        boolean[] legalTagStarts,
                        boolean[] legalTagEnds,
                        boolean[][] legalTagTransitions,
                        Vector[] coefficients,
                        SymbolTable featureSymbolTable,
                        ChainCrfFeatureExtractor<E> featureExtractor,
                        boolean addInterceptFeature)
        Construct a conditional random field from the specified tags, tag-legality arrays, coefficient vectors, feature symbol table, feature extractor, and flag indicating whether to add an intercept feature.
        Parameters:
        tags - Array of output tags.
        legalTagStarts - Array of flags indicating if tag may be first tag for a tagging.
        legalTagEnds - Array of flags indicating if tag may be last tag for a tagging.
        legalTagTransitions - Two dimensional array of flags indicating if the first tag may be followed by the second tag.
        coefficients - Array of coefficient vectors parallel to tags.
        featureSymbolTable - Symbol table for feature extraction to vectors.
        featureExtractor - CRF feature extractor.
        addInterceptFeature - true if an intercept feature at position 0 with value 1 is added to all feature vectors.
        Throws:
        IllegalArgumentException - If the tag and coefficient vector arrays are empty or differ in length, or if the coefficient vectors do not all have the same number of dimensions.
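To illustrate how the three legality arrays constrain a tagging, here is a minimal sketch that checks a sequence of tag indexes against them. TagLegality and isLegal are hypothetical names, not part of the API, and treating the empty sequence as legal is an assumption of this example.

```java
/** Hypothetical helper showing how the legality arrays could be applied. */
final class TagLegality {

    private TagLegality() { }

    /**
     * Returns true if the sequence of tag indexes starts with a legal start
     * tag, ends with a legal end tag, and uses only legal transitions.
     * legalTransitions[j][k] says whether tag j may be followed by tag k.
     */
    static boolean isLegal(int[] tagIds,
                           boolean[] legalStarts,
                           boolean[] legalEnds,
                           boolean[][] legalTransitions) {
        if (tagIds.length == 0) {
            return true; // assumption: an empty tagging is vacuously legal
        }
        if (!legalStarts[tagIds[0]]) {
            return false;
        }
        for (int i = 1; i < tagIds.length; ++i) {
            if (!legalTransitions[tagIds[i - 1]][tagIds[i]]) {
                return false;
            }
        }
        return legalEnds[tagIds[tagIds.length - 1]];
    }
}
```

With BIO-style tags B, I, O at indexes 0, 1, 2, setting legalStarts[1] and legalTransitions[2][1] to false rules out taggings that begin with I or that move from O directly to I.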
    • Method Detail

      • tags

        public List<String> tags()
        Returns an unmodifiable view of the array of tags underlying this CRF.

        The array of coefficient vectors is parallel to the list of tags returned by tags(), so the coefficient vector coefficients()[n] is for output tag tags().get(n).

        Returns:
        View of the output tags.
      • tag

        public String tag(int k)
        Returns the tag for the specified tag index. This uses the underlying tags, so tag(k) is the same tag as tags().get(k).
        Parameters:
        k - Position of tag.
        Returns:
        Tag for the specified position.
        Throws:
        ArrayIndexOutOfBoundsException - If the specified index is out of bounds for the tag array (k < 0 or k >= tags().size()).
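The relationship between the two tag accessors can be sketched with plain JDK collections. TagViewSketch is a hypothetical class, and wrapping the array with Collections.unmodifiableList is one plausible way to get an unmodifiable view, not necessarily what the library does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder mirroring the tags() and tag(int) accessors. */
final class TagViewSketch {

    private final String[] tags;

    TagViewSketch(String[] tags) {
        this.tags = tags;
    }

    /** Read-only list view backed by the underlying array. */
    List<String> tags() {
        return Collections.unmodifiableList(Arrays.asList(tags));
    }

    /** Direct array access; an out-of-range k throws ArrayIndexOutOfBoundsException. */
    String tag(int k) {
        return tags[k];
    }
}
```

Attempting to modify the returned list, for example with tags().set(0, "X"), throws UnsupportedOperationException in this sketch.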
      • coefficients

        public Vector[] coefficients()
        Return the coefficient vectors for this CRF.

        The array of coefficient vectors is parallel to the list of tags returned by tags(), so the coefficient vector coefficients()[n] is for output tag tags().get(n).

        Returns:
        The coefficient vectors.
      • featureSymbolTable

        public SymbolTable featureSymbolTable()
        Returns an unmodifiable view of the symbol table for features for this CRF.
        Returns:
        A view of the symbol table for features.
      • featureExtractor

        public ChainCrfFeatureExtractor<E> featureExtractor()
        Return the feature extractor for this CRF.
        Returns:
        The feature extractor.
      • addInterceptFeature

        public boolean addInterceptFeature()
        Returns true if this CRF adds an intercept feature with value 1.0 at index 0 to all feature vectors.
        Returns:
        Whether this CRF adds an intercept feature.
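As an illustration of what adding an intercept feature means, the sketch below converts a map of named features to a sparse vector keyed by feature index, reserving index 0 for the intercept. InterceptSketch, the map-based symbol table, and the choice to drop features missing from the table are all assumptions of this example rather than documented behavior.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical conversion from named features to an indexed sparse vector. */
final class InterceptSketch {

    private InterceptSketch() { }

    /**
     * Maps feature names to indexes via symbolToId and, if addIntercept is
     * true, sets index 0 to 1.0. Features not in the symbol table are
     * dropped (an assumption of this sketch).
     */
    static Map<Integer, Double> toVector(Map<String, Double> features,
                                         Map<String, Integer> symbolToId,
                                         boolean addIntercept) {
        Map<Integer, Double> vector = new TreeMap<>();
        if (addIntercept) {
            vector.put(0, 1.0); // intercept: index 0, value 1.0
        }
        for (Map.Entry<String, Double> entry : features.entrySet()) {
            Integer id = symbolToId.get(entry.getKey());
            if (id != null) {
                vector.put(id, entry.getValue());
            }
        }
        return vector;
    }
}
```

For this to work, the symbol table must not assign index 0 to an ordinary feature when the intercept is enabled.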
      • tag

        public Tagging<E> tag(List<E> tokens)
        Description copied from interface: Tagger
        Return the tagging for the specified list of tokens.
        Specified by:
        tag in interface Tagger<E>
        Parameters:
        tokens - Input tokens to tag.
        Returns:
        Tagging for the specified input tokens.
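This page does not say how the first-best tagging is found. A common choice for chain models is Viterbi decoding, sketched below under that assumption. ViterbiSketch is a hypothetical class, and the nodeScores and edgeScores arrays stand in for log scores that a real CRF would derive from its features and coefficients.

```java
/** Hypothetical Viterbi decoder over precomputed log scores. */
final class ViterbiSketch {

    private ViterbiSketch() { }

    /**
     * nodeScores[i][k] is the log score of tag k at position i;
     * edgeScores[j][k] is the log score of tag j followed by tag k.
     * Returns the highest-scoring sequence of tag indexes.
     */
    static int[] decode(double[][] nodeScores, double[][] edgeScores) {
        int n = nodeScores.length;
        if (n == 0) {
            return new int[0];
        }
        int numTags = nodeScores[0].length;
        double[][] best = new double[n][numTags];
        int[][] backPointer = new int[n][numTags];
        best[0] = nodeScores[0].clone();
        for (int i = 1; i < n; ++i) {
            for (int k = 0; k < numTags; ++k) {
                int argMax = 0;
                double max = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < numTags; ++j) {
                    double score = best[i - 1][j] + edgeScores[j][k];
                    if (score > max) {
                        max = score;
                        argMax = j;
                    }
                }
                best[i][k] = max + nodeScores[i][k];
                backPointer[i][k] = argMax;
            }
        }
        // Pick the best final tag, then follow back pointers to position 0.
        int last = 0;
        for (int k = 1; k < numTags; ++k) {
            if (best[n - 1][k] > best[n - 1][last]) {
                last = k;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int i = n - 1; i > 0; --i) {
            path[i - 1] = backPointer[i][path[i]];
        }
        return path;
    }
}
```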
      • tagNBest

        public Iterator<ScoredTagging<E>> tagNBest(List<E> tokens,
                                                   int maxResults)
        Description copied from interface: NBestTagger
        Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n.
        Specified by:
        tagNBest in interface NBestTagger<E>
        Parameters:
        tokens - Input tokens to tag.
        maxResults - Maximum number of results to return.
        Returns:
        Iterator over the n-best scored taggings for the specified tokens.
      • tagNBestConditional

        public Iterator<ScoredTagging<E>> tagNBestConditional(List<E> tokens,
                                                              int maxResults)
        Description copied from interface: NBestTagger
        Return an iterator over the n-best scored taggings for the specified input tokens up to a specified maximum n, with scores normalized to conditional probabilities.

        Optional operation.

        Specified by:
        tagNBestConditional in interface NBestTagger<E>
        Parameters:
        tokens - Input tokens to tag.
        maxResults - Maximum number of results to return.
        Returns:
        Iterator over the n-best scored taggings for the specified tokens.
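To show what normalizing scores to conditional probabilities involves, the sketch below subtracts a log normalizer from each log score. It assumes the scores are unnormalized natural-log scores, and it computes the normalizer by brute force over an explicit array of every candidate's score, where a real CRF would obtain it from the forward algorithm over all taggings. ConditionalNormalizer is a hypothetical class.

```java
/** Hypothetical normalizer turning log scores into conditional log probabilities. */
final class ConditionalNormalizer {

    private ConditionalNormalizer() { }

    /** Numerically stable log(sum(exp(x))) over the array. */
    static double logSumExp(double[] xs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            max = Math.max(max, x);
        }
        if (max == Double.NEGATIVE_INFINITY) {
            return max; // empty input or all scores are -infinity
        }
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.exp(x - max);
        }
        return max + Math.log(sum);
    }

    /**
     * Subtracts the log normalizer from every score. The input must hold
     * the score of every possible tagging, not just the n best, for the
     * results to be true conditional log probabilities.
     */
    static double[] toConditionalLogProbs(double[] allScores) {
        double logZ = logSumExp(allScores);
        double[] logProbs = new double[allScores.length];
        for (int i = 0; i < allScores.length; ++i) {
            logProbs[i] = allScores[i] - logZ;
        }
        return logProbs;
    }
}
```

Normalization shifts every score by the same constant, so the ranking of taggings is unchanged.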
      • tagMarginal

        public TagLattice<E> tagMarginal(List<E> tokens)
        Description copied from interface: MarginalTagger
        Return the marginal tagging for the specified list of input tokens.
        Specified by:
        tagMarginal in interface MarginalTagger<E>
        Parameters:
        tokens - Input tokens to tag.
        Returns:
        The lattice of tags for the specified tokens.
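Per-position tag marginals for a chain model are typically computed with the forward-backward algorithm. The sketch below assumes that approach and works directly on non-negative potentials (exponentiated scores) for brevity; a production implementation would work in log space or rescale to avoid underflow on long inputs. MarginalSketch is a hypothetical class and does not model the TagLattice type.

```java
import java.util.Arrays;

/** Hypothetical forward-backward computation of per-position tag marginals. */
final class MarginalSketch {

    private MarginalSketch() { }

    /**
     * node[i][k] is the potential of tag k at position i; edge[j][k] is the
     * potential of tag j followed by tag k. Requires at least one position.
     * Returns p[i][k], the probability that position i has tag k.
     */
    static double[][] marginals(double[][] node, double[][] edge) {
        int n = node.length;
        int numTags = node[0].length;
        double[][] forward = new double[n][numTags];
        double[][] backward = new double[n][numTags];

        forward[0] = node[0].clone();
        for (int i = 1; i < n; ++i) {
            for (int k = 0; k < numTags; ++k) {
                double sum = 0.0;
                for (int j = 0; j < numTags; ++j) {
                    sum += forward[i - 1][j] * edge[j][k];
                }
                forward[i][k] = sum * node[i][k];
            }
        }

        Arrays.fill(backward[n - 1], 1.0);
        for (int i = n - 2; i >= 0; --i) {
            for (int j = 0; j < numTags; ++j) {
                double sum = 0.0;
                for (int k = 0; k < numTags; ++k) {
                    sum += edge[j][k] * node[i + 1][k] * backward[i + 1][k];
                }
                backward[i][j] = sum;
            }
        }

        // Normalizer: total potential of all tag sequences.
        double z = 0.0;
        for (int k = 0; k < numTags; ++k) {
            z += forward[n - 1][k];
        }

        double[][] p = new double[n][numTags];
        for (int i = 0; i < n; ++i) {
            for (int k = 0; k < numTags; ++k) {
                p[i][k] = forward[i][k] * backward[i][k] / z;
            }
        }
        return p;
    }
}
```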
      • toString

        public String toString()
        Return a string-based representation of this chain CRF. All information returned in this string representation is available programmatically.

        Warning: The output is very verbose, including symbolic representations of all the coefficients.

        Overrides:
        toString in class Object
        Returns:
        A string-based representation of this chain CRF.
      • estimate

        public static <F> ChainCrf<F> estimate(Corpus<ObjectHandler<Tagging<F>>> corpus,
                                               ChainCrfFeatureExtractor<F> featureExtractor,
                                               boolean addInterceptFeature,
                                               int minFeatureCount,
                                               boolean cacheFeatureVectors,
                                               boolean allowUnseenTransitions,
                                               RegressionPrior prior,
                                               int priorBlockSize,
                                               AnnealingSchedule annealingSchedule,
                                               double minImprovement,
                                               int minEpochs,
                                               int maxEpochs,
                                               Reporter reporter)
                                        throws IOException
        Return the CRF estimated by stochastic gradient descent from the specified corpus of taggings of type F. Features are extracted with the specified feature extractor and pruned to the specified minimum feature count, an intercept feature is added automatically if the flag is true, and unseen tag transitions are allowed or disallowed as specified. Estimation uses the specified prior, and the remaining training parameters control annealing and convergence, with incremental results written to the specified reporter.

        At the info level, the reporter provides parameter-level and epoch-level information. At the debug level, it reports epoch-by-epoch likelihoods.

        Parameters:
        corpus - Corpus from which to estimate.
        featureExtractor - Feature extractor for the CRF.
        addInterceptFeature - true if an intercept feature with index 0 and value 1.0 should be added automatically to all feature vectors.
        minFeatureCount - Minimum number of instances of a feature to keep it.
        cacheFeatureVectors - Flag indicating whether or not to keep the computed feature vectors in memory.
        allowUnseenTransitions - Flag indicating whether to allow tags to start a tagging, end a tagging, or follow another tag if there was not an example of that in the corpus.
        prior - Prior for coefficients to use during estimation.
        annealingSchedule - Schedule for annealing the learning rate during gradient descent.
        minImprovement - Minimum relative improvement in the objective (log likelihood plus log prior), computed as a 10-epoch rolling average, used to signal convergence.
        minEpochs - Minimum number of epochs for which to run gradient descent estimation.
        maxEpochs - Maximum number of epochs for which to run gradient descent estimation.
        reporter - Reporter to which results are written, or null for no reporting of intermediate results.
        Throws:
        IOException
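The interaction of minImprovement, minEpochs, and maxEpochs can be illustrated with a small stopping rule. This is a sketch under stated assumptions, not the library's algorithm: ConvergenceSketch is hypothetical, relative improvement is taken as |current - previous| / |previous|, and training stops once the average of the last 10 such values falls below minImprovement and at least minEpochs have run.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stopping rule based on a rolling average of improvements. */
final class ConvergenceSketch {

    /** Window size taken from the "10-epoch rolling average" in the parameter text. */
    static final int WINDOW = 10;

    private ConvergenceSketch() { }

    /**
     * Given the objective value observed after each epoch, returns how many
     * epochs would run before stopping. Never runs more than maxEpochs or
     * more epochs than there are recorded objective values.
     */
    static int epochsRun(double[] objectiveByEpoch,
                         double minImprovement,
                         int minEpochs,
                         int maxEpochs) {
        Deque<Double> window = new ArrayDeque<>();
        double windowSum = 0.0;
        int limit = Math.min(maxEpochs, objectiveByEpoch.length);
        for (int epoch = 1; epoch < limit; ++epoch) {
            double previous = objectiveByEpoch[epoch - 1];
            double current = objectiveByEpoch[epoch];
            // Assumed definition of relative improvement; guard against division by zero.
            double relative = Math.abs(current - previous)
                / Math.max(Math.abs(previous), 1e-12);
            window.addLast(relative);
            windowSum += relative;
            if (window.size() > WINDOW) {
                windowSum -= window.removeFirst();
            }
            int epochsSoFar = epoch + 1;
            boolean converged = window.size() == WINDOW
                && windowSum / WINDOW < minImprovement;
            if (converged && epochsSoFar >= minEpochs) {
                return epochsSoFar;
            }
        }
        return limit;
    }
}
```

In this sketch, minEpochs delays stopping even when the objective is already flat, and maxEpochs caps training when the objective keeps changing.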