java.lang.Object
  com.aliasi.classify.LMClassifier<L,MultivariateEstimator>
    com.aliasi.classify.DynamicLMClassifier<L>
L - the type of dynamic language model for this classifier
public class DynamicLMClassifier<L extends LanguageModel.Dynamic>
A DynamicLMClassifier is a language model classifier
that accepts training events of categorized character sequences.
Training is based on a multivariate estimator for the category
distribution and dynamic language models for the per-category
character sequence estimators. These models also form the basis of
the superclass's implementation of classification.
Because this class implements training and classification, it may be used in tag-a-little, learn-a-little supervised learning without retraining epochs. This makes it ideal for active learning applications, for instance.
At any point after adding training events, the classifier may be
compiled to an object output. The classifier read back in will be
a non-dynamic instance of LMClassifier. It will be based
on the compiled version of the multivariate estimator and the
compiled version of the dynamic language models for the categories.
Instances of this class allow concurrent read operations but
require writes to run exclusively. Reads in this context are
either calculating estimates or compiling; writes are training.
Extensions to LingPipe's classes may impose tighter restrictions.
For instance, a subclass of MultivariateEstimator
might be used that does not allow concurrent estimates; in that
case, its restrictions are passed on to this classifier. The same
goes for the language models and, in the case of token language
models, the tokenizer factories.
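The read/write discipline described above can be enforced on the caller's side with a standard read-write lock. The following is a minimal sketch using only `java.util.concurrent`; `GuardedCounts` is a hypothetical stand-in for a trainable model and is not a LingPipe class. It makes no claim about how LingPipe synchronizes internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a trainable model; not part of LingPipe.
final class GuardedCounts {
    private final Map<String, Integer> counts = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Write operation (training): holds the exclusive write lock.
    void train(String category, int count) {
        lock.writeLock().lock();
        try {
            counts.merge(category, count, Integer::sum);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read operation (estimation): many readers may hold the read lock at once.
    int count(String category) {
        lock.readLock().lock();
        try {
            return counts.getOrDefault(category, 0);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this pattern, calls that estimate or compile would take the read lock and calls that train would take the write lock.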
| Constructor Summary | |
|---|---|
| DynamicLMClassifier(String[] categories, L[] languageModels) | Construct a dynamic language model classifier over the specified categories with specified language models per category and an overall category estimator. |

| Method Summary | | |
|---|---|---|
| MultivariateEstimator | categoryEstimator() | Deprecated. As of 3.0, use general method LMClassifier.categoryDistribution(). |
| void | compileTo(ObjectOutput objOut) | Writes a compiled version of this classifier to the specified object output. |
| static DynamicLMClassifier<NGramBoundaryLM> | createNGramBoundary(String[] categories, int maxCharNGram) | Construct a dynamic classifier over the specified categories, using boundary character n-gram models of the specified order. |
| static DynamicLMClassifier<NGramProcessLM> | createNGramProcess(String[] categories, int maxCharNGram) | Construct a dynamic classifier over the specified categories, using process character n-gram models of the specified order. |
| static DynamicLMClassifier<TokenizedLM> | createTokenized(String[] categories, TokenizerFactory tokenizerFactory, int maxTokenNGram) | Construct a dynamic language model classifier over the specified categories using token n-gram language models of the specified order and the specified tokenizer factory for tokenization. |
| void | handle(CharSequence charSequence, Classification classification) | Deprecated. Use handle(Classified) instead. |
| void | handle(Classified<CharSequence> classified) | Provides a training instance for the specified character sequence using the best category from the specified classification. |
| L | lmForCategory(String category) | Deprecated. As of 3.0, use general LMClassifier.languageModel(String). |
| void | resetCategory(String category, L lm, int newCount) | Resets the specified category to the specified language model. |
| void | train(String category, char[] cs, int start, int end) | Deprecated. Use handle(Classified) instead. |
| void | train(String category, CharSequence sampleCSeq) | Deprecated. Use handle(Classified) instead. |
| void | train(String category, CharSequence sampleCSeq, int count) | Provide a training instance for the specified category consisting of the specified sample character sequence with the specified count. |

| Methods inherited from class com.aliasi.classify.LMClassifier |
|---|
| categories, categoryDistribution, classify, classifyJoint, languageModel |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|
public DynamicLMClassifier(String[] categories,
L[] languageModels)
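Construct a dynamic language model classifier over the specified categories with specified language models per category and an overall category estimator.
The multivariate estimator over categories is initialized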
with one count for each category. Technically, initializing
counts involves a uniform Dirichlet prior with
α=1, which is often called Laplace
smoothing.
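To make the effect of the initial counts concrete, the sketch below computes category probabilities under add-one smoothing: each category's estimate is (observed count + 1) divided by (total observed count + number of categories). `LaplaceCategoryPrior` is a hypothetical helper written for illustration only; it is not LingPipe code and is not claimed to match the internals of MultivariateEstimator.

```java
// Hypothetical helper illustrating add-one (Laplace) smoothing; not LingPipe code.
final class LaplaceCategoryPrior {
    // observed[i] is the number of training events seen so far for category i.
    static double[] estimate(long[] observed) {
        long total = 0;
        for (long c : observed) {
            total += c + 1; // every category starts with one count
        }
        double[] p = new double[observed.length];
        for (int i = 0; i < observed.length; i++) {
            p[i] = (observed[i] + 1) / (double) total;
        }
        return p;
    }
}
```

Before any training, two categories each receive probability 1/2. After three events for the first category and none for the second, the estimates are 4/5 and 1/5, so an unseen category never gets probability zero.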
categories - Categories used for classification.
languageModels - Dynamic language models for categories.
IllegalArgumentException - If there are not at least two
categories, or if the lengths of the category and language model
arrays are not the same.

| Method Detail |
|---|
@Deprecated
public void train(String category,
char[] cs,
int start,
int end)
Deprecated. Use handle(Classified) instead.
No modeling of the begin or end of the sequence is carried out. If such a behavior is desired, it should be reflected in the training instances supplied to this method.
The component models for this classifier may be accessed and
trained independently using LMClassifier.categoryDistribution() and
LMClassifier.languageModel(String).
category - Category of this training sequence.
cs - Characters used for training.
start - Index of first character to use for training.
end - Index of one past the last character to use for
training.
IllegalArgumentException - If the category is not known.
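The start and end parameters describe a half-open range: start is included and end is excluded. The sketch below shows which characters such a slice covers, using only the standard library; `CharSliceDemo` is a hypothetical name introduced for illustration.

```java
// Hypothetical helper showing the half-open [start, end) slice convention.
final class CharSliceDemo {
    // Returns the characters of cs from index start up to, but not including, end.
    static String slice(char[] cs, int start, int end) {
        return new String(cs, start, end - start);
    }
}
```

For the array holding "hello world", a start of 6 and an end of 11 select the five characters of "world".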
@Deprecated
public void train(String category,
CharSequence sampleCSeq)
Deprecated. Use handle(Classified) instead.
See train(String,char[],int,int).
category - Category of this training sequence.
sampleCSeq - Character sequence for training.
IllegalArgumentException - If the category is not known.
public void train(String category,
CharSequence sampleCSeq,
int count)
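Provide a training instance for the specified category consisting of the specified sample character sequence with the specified count.
See train(String,char[],int,int).
Counts of zero are ignored, whereas counts less than zero raise an exception.
category - Category of this training sequence.
sampleCSeq - Character sequence for training.
count - Number of training instances.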
IllegalArgumentException - If the category is not known
or if the count is negative.
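The argument contract above (unknown category or negative count rejected, zero count ignored) can be sketched as follows. `CountedTrainer` is a hypothetical stand-in, not LingPipe code, and the order in which the two checks run is an assumption; the documentation does not specify it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mirrors the documented argument checks; not LingPipe code.
final class CountedTrainer {
    private final Map<String, Long> categoryCounts = new HashMap<>();

    CountedTrainer(String... categories) {
        for (String c : categories) {
            categoryCounts.put(c, 0L);
        }
    }

    void train(String category, CharSequence sample, int count) {
        // Assumed check order: count first, then category.
        if (count < 0) {
            throw new IllegalArgumentException("Negative count: " + count);
        }
        if (!categoryCounts.containsKey(category)) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        if (count == 0) {
            return; // zero counts are ignored
        }
        categoryCounts.merge(category, (long) count, Long::sum);
        // A real classifier would also train the category's language model on the sample.
    }

    long count(String category) {
        return categoryCounts.getOrDefault(category, 0L);
    }
}
```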
@Deprecated
public void handle(CharSequence charSequence,
Classification classification)
Deprecated. Use handle(Classified) instead.
The specified object is cast to CharSequence,
and the result is passed along with the first-best category
to train(String,CharSequence).
handle in interface ClassificationHandler<CharSequence,Classification>
charSequence - Character sequence for training.
classification - Classification to use for training.
ClassCastException - If the specified object does not
implement CharSequence.
public void handle(Classified<CharSequence> classified)
Provides a training instance for the specified character sequence using the best category from the specified classification.
handle in interface ObjectHandler<Classified<CharSequence>>
classified - Classified character sequence to treat as
training data.
@Deprecated
public MultivariateEstimator categoryEstimator()
Deprecated. As of 3.0, use general method LMClassifier.categoryDistribution().
@Deprecated
public L lmForCategory(String category)
Deprecated. As of 3.0, use general LMClassifier.languageModel(String).
IllegalArgumentException - If the category is not known.
public void compileTo(ObjectOutput objOut)
throws IOException
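Writes a compiled version of this classifier to the specified object output. The object read back in will be an instance of LMClassifier.
compileTo in interface Compilable
objOut - Object output to which this classifier is
written.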
IOException - If there is an I/O exception writing to
the output stream.
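The compile-then-read-back pattern can be illustrated with plain Java serialization. In the sketch below, `DynamicModel` and `CompiledModel` are hypothetical stand-ins rather than LingPipe classes. The point is only that the object written by compileTo is a separate, non-trainable form, so what is read back is not the dynamic instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

final class CompileRoundTrip {
    // Hypothetical read-only form that is written to the object output.
    static final class CompiledModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final HashMap<String, Integer> counts;
        CompiledModel(Map<String, Integer> counts) {
            this.counts = new HashMap<>(counts); // snapshot of the current state
        }
    }

    // Hypothetical trainable form; it is not itself serialized.
    static final class DynamicModel {
        private final Map<String, Integer> counts = new HashMap<>();
        void train(String category) {
            counts.merge(category, 1, Integer::sum);
        }
        void compileTo(ObjectOutput out) throws IOException {
            out.writeObject(new CompiledModel(counts));
        }
    }

    // Compiles to an in-memory buffer and reads the result back in.
    static Object roundTrip(DynamicModel model) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                model.compileTo(out);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the compiled form is a snapshot, training the dynamic model after compiling does not change what was already written.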
public void resetCategory(String category,
L lm,
int newCount)
Resets the specified category to the specified language model.
category - Category to reset.
lm - New dynamic language model for category.
newCount - New count for category.
IllegalArgumentException - If the category is not known.
public static DynamicLMClassifier<NGramProcessLM> createNGramProcess(String[] categories,
int maxCharNGram)
Construct a dynamic classifier over the specified categories, using process character n-gram models of the specified order.
See the documentation for the constructor DynamicLMClassifier(String[], LanguageModel.Dynamic[]) for
information on the category multivariate estimate for priors.
categories - Categories used for classification.
maxCharNGram - Maximum length of character sequence
counted in model.
IllegalArgumentException - If there are not at least two
categories.
public static DynamicLMClassifier<NGramBoundaryLM> createNGramBoundary(String[] categories,
int maxCharNGram)
Construct a dynamic classifier over the specified categories, using boundary character n-gram models of the specified order.
See the documentation for the constructor DynamicLMClassifier(String[], LanguageModel.Dynamic[]) for
information on the category multivariate estimate for priors.
categories - Categories used for classification.
maxCharNGram - Maximum length of character sequence
counted in model.
IllegalArgumentException - If there are not at least two
categories.
public static DynamicLMClassifier<TokenizedLM> createTokenized(String[] categories,
TokenizerFactory tokenizerFactory,
int maxTokenNGram)
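Construct a dynamic language model classifier over the specified categories using token n-gram language models of the specified order and the specified tokenizer factory for tokenization.
The multivariate estimator over categories is initialized with one count for each category.
The unknown token and whitespace models are uniform sequence models.
categories - Categories used for classification.
tokenizerFactory - Tokenizer factory for tokenization.
maxTokenNGram - Maximum length of token n-grams used.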
IllegalArgumentException - If there are not at least two
categories.