com.aliasi.lm
Class CharSeqMultiCounter

java.lang.Object
  extended by com.aliasi.lm.CharSeqMultiCounter
All Implemented Interfaces:
CharSeqCounter

public class CharSeqMultiCounter
extends Object
implements CharSeqCounter

A CharSeqMultiCounter combines the counts from a pair of character sequence counters. Each method returns the value obtained by combining the results of the two underlying counters.

Multi-counters are particularly useful in situations where a large or constant background counter must be updated in several different ways simultaneously. For instance, a general 5-gram counter of a language trained on a large corpus might be combined with an 8-gram topic-specific model for use in a classifier.

More than two counters may be combined two at a time. The best strategy is to arrange them in a balanced tree of counters, as done by the constructor CharSeqMultiCounter(CharSeqCounter[]). For instance, with CharSeqCounter instances c1, c2, c3, and c4, the balanced construction of c1234 in:

 CharSeqCounter c12 = new CharSeqMultiCounter(c1,c2);
 CharSeqCounter c34 = new CharSeqMultiCounter(c3,c4);
 CharSeqCounter c1234 = new CharSeqMultiCounter(c12,c34);
 
is more efficient for many operations than the linear construction in:
 CharSeqCounter c12 = new CharSeqMultiCounter(c1,c2);
 CharSeqCounter c123 = new CharSeqMultiCounter(c12,c3);
 CharSeqCounter c1234 = new CharSeqMultiCounter(c123,c4);
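
The difference between the two constructions can be sketched without LingPipe on the classpath. Everything in the block below is hypothetical: Counter is a one-method stand-in for CharSeqCounter, Pair stands in for a two-counter multi-counter, and summing the two counts is an assumption of this sketch, not something the documentation above states. The recursive halving in balanced() is one way to get a balanced tree; the actual constructor may split the array differently.

```java
class BalancedCombineSketch {

    /** Hypothetical stand-in for CharSeqCounter, reduced to one method. */
    public interface Counter {
        long count(char[] cs, int start, int end);
    }

    /** Hypothetical pair node. Summing the two counts is an assumption. */
    public static final class Pair implements Counter {
        final Counter left;
        final Counter right;

        Pair(Counter left, Counter right) {
            this.left = left;
            this.right = right;
        }

        public long count(char[] cs, int start, int end) {
            return left.count(cs, start, end) + right.count(cs, start, end);
        }
    }

    /** Combines counters[from..to) by recursively halving the range. */
    static Counter balanced(Counter[] counters, int from, int to) {
        if (to - from < 1) {
            throw new IllegalArgumentException("Need at least one counter.");
        }
        if (to - from == 1) {
            return counters[from];
        }
        int mid = from + (to - from) / 2;
        return new Pair(balanced(counters, from, mid),
                        balanced(counters, mid, to));
    }

    /** Combines counters left to right, producing a maximally deep tree. */
    static Counter linear(Counter[] counters) {
        Counter acc = counters[0];
        for (int i = 1; i < counters.length; i++) {
            acc = new Pair(acc, counters[i]);
        }
        return acc;
    }

    /** Number of Pair nodes on the longest path from the root to a leaf. */
    static int depth(Counter c) {
        if (!(c instanceof Pair)) {
            return 0;
        }
        Pair p = (Pair) c;
        return 1 + Math.max(depth(p.left), depth(p.right));
    }

    /** Leaf counter that returns the same count for every slice. */
    static Counter constant(long n) {
        return (cs, start, end) -> n;
    }

    public static void main(String[] args) {
        Counter[] cs = { constant(1), constant(2), constant(3), constant(4) };
        Counter bal = balanced(cs, 0, cs.length);
        Counter lin = linear(cs);
        System.out.println(depth(bal));                    // 2
        System.out.println(depth(lin));                    // 3
        System.out.println(bal.count(new char[0], 0, 0));  // 10
        System.out.println(lin.count(new char[0], 0, 0));  // 10
    }
}
```

Both trees return the same counts; only the nesting depth differs. With four counters the balanced tree is two levels deep and the linear one three, and the gap grows with the number of counters.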
 

Implementation Note: The methods numCharactersFollowing(char[],int,int), charactersFollowing(char[],int,int), and observedCharacters() all call the contained counters' CharSeqCounter.charactersFollowing(char[],int,int) methods and then merge or count the results. All other methods only perform arithmetic on the results of the corresponding method calls on the contained counters.
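
As an illustration of the merge step described in the note, the following is a minimal sketch of taking the union of two follower arrays that are each already in ascending order without duplicates. The class and method names are hypothetical, and the two-pointer merge is a standard technique chosen for this sketch; the library's actual merge code is not shown in this documentation.

```java
import java.util.Arrays;

class MergeFollowersSketch {

    /**
     * Returns the union of two ascending, duplicate-free char arrays,
     * itself ascending and duplicate-free.
     */
    static char[] mergeSorted(char[] a, char[] b) {
        char[] out = new char[a.length + b.length];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {
                // Same character in both arrays: keep one copy.
                out[n++] = a[i++];
                j++;
            }
        }
        while (i < a.length) {
            out[n++] = a[i++];
        }
        while (j < b.length) {
            out[n++] = b[j++];
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        char[] merged = mergeSorted(new char[] {'a', 'c', 'e'},
                                    new char[] {'b', 'c', 'z'});
        System.out.println(new String(merged));  // abcez
        System.out.println(merged.length);       // 5
    }
}
```

Under this sketch, the merged array plays the role of the charactersFollowing result and its length plays the role of numCharactersFollowing.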

Since:
LingPipe2.0
Version:
2.0
Author:
Bob Carpenter

Constructor Summary
CharSeqMultiCounter(CharSeqCounter[] counters)
          Construct a character sequence counter from the specified array of counters.
CharSeqMultiCounter(CharSeqCounter counter1, CharSeqCounter counter2)
          Construct a multi-counter from the specified pair of counters.
 
Method Summary
 char[] charactersFollowing(char[] cs, int start, int end)
          Returns the array of characters that have been observed following the specified character slice in Unicode order.
 long count(char[] cs, int start, int end)
          Returns the count for the specified character sequence.
 long extensionCount(char[] cs, int start, int end)
          Returns the sum of the counts of all character sequences one character longer than the specified character slice.
 int numCharactersFollowing(char[] cs, int start, int end)
          Returns the number of characters that when appended to the end of the specified character slice produce an extended slice with a non-zero count.
 char[] observedCharacters()
          Returns an array consisting of the characters with non-zero count in Unicode order.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CharSeqMultiCounter

public CharSeqMultiCounter(CharSeqCounter[] counters)
Construct a character sequence counter from the specified array of counters. The component counters are combined pairwise into a balanced tree of multi-counters.

Parameters:
counters - Array of counters backing the multi-counter.
Throws:
IllegalArgumentException - If the array of counters has fewer than two elements.

CharSeqMultiCounter

public CharSeqMultiCounter(CharSeqCounter counter1,
                           CharSeqCounter counter2)
Construct a multi-counter from the specified pair of counters.

Parameters:
counter1 - First counter in multi-counter.
counter2 - Second counter in multi-counter.
Method Detail

count

public long count(char[] cs,
                  int start,
                  int end)
Description copied from interface: CharSeqCounter
Returns the count for the specified character sequence.

Specified by:
count in interface CharSeqCounter
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
end - Index of one past last character in slice.
Returns:
Count of character array slice in model.

extensionCount

public long extensionCount(char[] cs,
                           int start,
                           int end)
Description copied from interface: CharSeqCounter
Returns the sum of the counts of all character sequences one character longer than the specified character slice.

Specified by:
extensionCount in interface CharSeqCounter
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
end - Index of one past last character in slice.
Returns:
The sum of the counts of all character sequences one character longer than the specified character slice.
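
The slice convention (start inclusive, end exclusive) and the relation between count and extensionCount can be seen on a toy example. The sketch below does not use LingPipe: it counts occurrences of a slice in a fixed string, which is an assumption made purely for illustration, and the class name, the TEXT constant, and the counting rule are all hypothetical. A real counter's values depend on how it was trained.

```java
class SliceCountSketch {

    /** Toy "training data" for this sketch only. */
    static final String TEXT = "abracadabra";

    /** Occurrences in TEXT of the slice cs[start..end), end exclusive. */
    static long count(char[] cs, int start, int end) {
        String slice = new String(cs, start, end - start);
        long n = 0;
        for (int i = 0; i + slice.length() <= TEXT.length(); i++) {
            if (TEXT.startsWith(slice, i)) {
                n++;
            }
        }
        return n;
    }

    /**
     * Sum over all characters c of the count of the slice followed by c,
     * which equals the number of occurrences that have a next character.
     */
    static long extensionCount(char[] cs, int start, int end) {
        String slice = new String(cs, start, end - start);
        long n = 0;
        for (int i = 0; i + slice.length() < TEXT.length(); i++) {
            if (TEXT.startsWith(slice, i)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        char[] cs = "xxabxx".toCharArray();
        System.out.println(count(cs, 2, 4));           // slice "ab": 2
        System.out.println(count(cs, 2, 3));           // slice "a": 5
        System.out.println(extensionCount(cs, 2, 3));  // slice "a": 4
    }
}
```

In this toy the final "a" of the text has no following character, so the extension count for "a" is one less than its count.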

numCharactersFollowing

public int numCharactersFollowing(char[] cs,
                                  int start,
                                  int end)
Description copied from interface: CharSeqCounter
Returns the number of characters that when appended to the end of the specified character slice produce an extended slice with a non-zero count. In symbols:
numCharactersFollowing(cSlice)
  = | { c | count(cSlice.c) > 0 } |
where count(cSlice.c) represents the count of the character slice cSlice suffixed with the character c.

Specified by:
numCharactersFollowing in interface CharSeqCounter
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
end - One plus index of last character in slice.
Returns:
The number of characters following the specified character slice.

charactersFollowing

public char[] charactersFollowing(char[] cs,
                                  int start,
                                  int end)
Description copied from interface: CharSeqCounter
Returns the array of characters that have been observed following the specified character slice in Unicode order. The returned array will be in ascending Unicode numerical order. Note that Unicode order is not necessarily the same as any localized alpha-numeric sort order.

Specified by:
charactersFollowing in interface CharSeqCounter
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
end - One plus index of last character in slice.
Returns:
The array of characters following the specified character slice, in ascending Unicode order.

observedCharacters

public char[] observedCharacters()
Description copied from interface: CharSeqCounter
Returns an array consisting of the characters with non-zero count in Unicode order. The return value of this method will be equal to the return value of charactersFollowing(new char[0],0,0).

Specified by:
observedCharacters in interface CharSeqCounter
Returns:
Array of characters with non-zero counts.
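
The relationship among charactersFollowing, numCharactersFollowing, and observedCharacters can be sketched on a toy text. The block below is an illustration only: the class name is hypothetical, and treating a plain string as the counted data is an assumption of this sketch. It relies on TreeSet<Character> iterating in ascending char value, which gives the ascending numerical order described above.

```java
import java.util.TreeSet;

class FollowersSketch {

    /** Distinct characters that follow cs[start..end) in text, ascending. */
    static char[] charactersFollowing(String text, char[] cs, int start, int end) {
        String slice = new String(cs, start, end - start);
        TreeSet<Character> seen = new TreeSet<>();
        for (int i = 0; i + slice.length() < text.length(); i++) {
            if (text.startsWith(slice, i)) {
                seen.add(text.charAt(i + slice.length()));
            }
        }
        char[] out = new char[seen.size()];
        int n = 0;
        for (char c : seen) {
            out[n++] = c;
        }
        return out;
    }

    /** Size of the follower set, matching the set formula for this method. */
    static int numCharactersFollowing(String text, char[] cs, int start, int end) {
        return charactersFollowing(text, cs, start, end).length;
    }

    /** Followers of the empty slice, that is, every character in text. */
    static char[] observedCharacters(String text) {
        return charactersFollowing(text, new char[0], 0, 0);
    }

    public static void main(String[] args) {
        String text = "abracadabra";
        char[] a = { 'a' };
        System.out.println(new String(charactersFollowing(text, a, 0, 1)));  // bcd
        System.out.println(numCharactersFollowing(text, a, 0, 1));           // 3
        System.out.println(new String(observedCharacters(text)));            // abcdr
    }
}
```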