|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectcom.aliasi.lm.TrieIntSeqCounter
public class TrieIntSeqCounter
An TrieIntSeqCounter implements an integer sequence
counter with a trie structure of counts.
Implementation Note: This trie-based integer sequence
counter is not as tight in memory as the character tries, but is
much more efficient for nodes with many daughters. It unfolds
1-daughter and 2-daughter nodes, and beyond that uses
balanced binary trees (via java.util.TreeMap)
| Constructor Summary | |
|---|---|
TrieIntSeqCounter(int maxLength)
Construct an integer sequence counter for subsequences up to the specified maximum length. |
|
| Method Summary | |
|---|---|
int |
count(int[] is,
int start,
int end)
Returns the count of the specified sequence of integers. |
long |
extensionCount(int[] is,
int start,
int end)
Returns the sum of the count of all sequences that extend the specified sequence by one integer. |
void |
handleNGrams(int nGram,
int minCount,
ObjectHandler<int[]> handler)
Supplies each n-gram of the specified length and with greater than or equal to the specified minimum count to the specified handler. |
void |
incrementSequence(int[] is,
int start,
int end,
int count)
Increments the count for the specified slice by the specified amount. |
void |
incrementSubsequences(int[] is,
int start,
int end)
Increments the count for all subsequences of the specified integer sequence up to the specified maximum length. |
void |
incrementSubsequences(int[] is,
int start,
int end,
int count)
Increments the count for all subsequences of the specified integer sequence up to the specified maximum length with the specified count. |
int[] |
integersFollowing(int[] is,
int start,
int end)
Returns an array of the integers that follow the specified integer array slice. |
int |
maxLength()
Returns the maximum length of subsequence of integers being counted. |
ObjectToCounterMap<int[]> |
nGramCounts(int nGram,
int minCount)
Returns a histogram of counts for n-grams of integers of the specified length, with a count of at least the specified minimum count. |
int |
numExtensions(int[] is,
int start,
int end)
Returns the number of one integer extensions of the specified with non-zero counts. |
int[] |
observedIntegers()
Returns an array of all integers that have non-zero counts in the model. |
void |
prune(int minCount)
Removes all counts for sequences that are less than the minimum count. |
void |
rescale(double countMultiplier)
Rescales all counts by multiplying them by the specified factor. |
String |
toString()
Return a string-based representation of this integer sequence counter. |
int |
trieSize()
Returns the size of this graph, measured in number of nodes in the trie structure. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public TrieIntSeqCounter(int maxLength)
maxLength - Maximum length of subsequences counted.
IllegalArgumentException - If the maximum length is
less than zero.| Method Detail |
|---|
public void prune(int minCount)
minCount - Minimum count to maintain a node.public void rescale(double countMultiplier)
int after being multipled by the scaling factor:
count=(int)(count*countMultiplier)
Unlike pruning, scaling has a cumulative effect and is not
idempotent. For instance, a count of four scaled by half once
will be two, and scaled by half twice will be one. Because of
rounding, it's not even guaranteed that rescaling twice,
rescale(0.5); rescale(0.5);, returns the same
result as rescaling with the combined factor,
rescale(0.25);.
Also unlike pruning, scaling, because of the integer rounding, may change the ratios between surviving counts. For instance, under scaling by 0.5, both 3 and 2 rescale to 1.
countMultiplier - Amount by which counts are scaled.public int maxLength()
public void incrementSubsequences(int[] is,
int start,
int end)
incrementSubsequences({1,3,17,8,122},1,4) with a
maximum length of 2 increments the bigram sequence
counts {3,17}, {17,8} and the unigram
sequence counts {3}, {17} and
{8}.
is - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.
IndexOutOfBoundsException - If the start and end minus one
indices do not fall within the range of the integer array.
public void incrementSubsequences(int[] is,
int start,
int end,
int count)
incrementSubsequences(is,start,end,n) is
equivalent to calling
incrementSubsequences(is,start,end) a total of
n times.
is - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.count -
IndexOutOfBoundsException - If the start and end minus one
indices do not fall within the range of the integer array.
IllegalArgumentException - If the count is less than zero.
public void incrementSequence(int[] is,
int start,
int end,
int count)
incrementSequence({1,2,3,4},1,3,15) results in the
sequence 2,3 having its count incremented by 15.
If the sequence provided is longer than the maximum sequence
counted, only its final counts are used. For example, if the
maximum length is 3, then calling
incrementSequence({1,2,3,4,5},0,5,12) is
equivalent to calling
incrementSequence({3,4,5},0,3,12).
is - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.count -
IndexOutOfBoundsException - If the start and end minus one
indices do not fall within the range of the integer array.
IllegalArgumentException - If the count is less than zero.
public ObjectToCounterMap<int[]> nGramCounts(int nGram,
int minCount)
nGram - Length of n-gram whose histrogram is returned.minCount - Minimum count of element in histogram.
IllegalArgumentException - If the n-gram length is less
than 1.public int trieSize()
public void handleNGrams(int nGram,
int minCount,
ObjectHandler<int[]> handler)
nGram - Length of n-grams to visit.minCount - Minimum count of visited n-gram.handler - Handler for visited n-grams.
public int count(int[] is,
int start,
int end)
IntSeqCounter
count in interface IntSeqCounteris - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.
public long extensionCount(int[] is,
int start,
int end)
IntSeqCounter
extensionCount in interface IntSeqCounteris - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.
public int numExtensions(int[] is,
int start,
int end)
IntSeqCounter
numExtensions in interface IntSeqCounteris - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.public int[] observedIntegers()
IntSeqCounter
observedIntegers in interface IntSeqCounter
public int[] integersFollowing(int[] is,
int start,
int end)
IntSeqCounter
integersFollowing in interface IntSeqCounteris - Underlying array of integers.start - Index of first integer in the slice.end - Index of one past the last integer in the slice.public String toString()
toString in class Object
|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||