java.lang.Object
  com.aliasi.spell.TokenizedDistance
public abstract class TokenizedDistance
The TokenizedDistance class provides an underlying
implementation of string distance based on comparing sets of
tokens. It holds a tokenizer factory and provides convenience
methods for extracting tokens from the input.
The method tokenSet(CharSequence) provides the set of
tokens derived by tokenizing the specified character sequence. The
method termFrequencyVector(CharSequence) provides a
mapping from tokens extracted by a tokenizer to integer counts.
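The difference between the two views can be illustrated with a minimal sketch. It does not use the LingPipe classes: `TokenViewsSketch` and its whitespace-splitting `tokenize` helper are hypothetical stand-ins for a tokenizer factory, and a plain `Map<String, Integer>` stands in for `ObjectToCounterMap<String>`. Only the set-versus-count distinction is taken from the description above.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only; not the LingPipe implementation.
class TokenViewsSketch {

    // Hypothetical tokenizer: split on runs of whitespace (an assumption).
    static String[] tokenize(CharSequence cSeq) {
        String s = cSeq.toString().trim();
        return s.isEmpty() ? new String[0] : s.split("\\s+");
    }

    // Set view: each distinct token appears once, repeats are discarded.
    static Set<String> tokenSet(CharSequence cSeq) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : tokenize(cSeq)) {
            tokens.add(token);
        }
        return tokens;
    }

    // Count view: each distinct token maps to its number of occurrences.
    static Map<String, Integer> termFrequencyVector(CharSequence cSeq) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokenize(cSeq)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "to be or not to be";
        System.out.println(tokenSet(text));            // four distinct tokens
        System.out.println(termFrequencyVector(text)); // "to" and "be" map to 2
    }
}
```

With the input above, the set view holds four tokens while the count view records that `to` and `be` each occur twice.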
| Constructor Summary | |
|---|---|
| TokenizedDistance(TokenizerFactory tokenizerFactory) | Construct a tokenized distance from the specified tokenizer factory. |

| Method Summary | |
|---|---|
| ObjectToCounterMap<String> | termFrequencyVector(CharSequence cSeq) Return the mapping from terms to their counts derived from the specified character sequence using the tokenizer factory in this class. |
| TokenizerFactory | tokenizerFactory() Return the tokenizer factory for this tokenized distance. |
| Set<String> | tokenSet(char[] cs, int start, int length) Return the set of tokens produced by the specified character slice using the tokenizer for this distance measure. |
| Set<String> | tokenSet(CharSequence cSeq) Return the set of tokens produced by the specified character sequence using the tokenizer for this distance measure. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Methods inherited from interface com.aliasi.util.Distance |
|---|
| distance |

| Methods inherited from interface com.aliasi.util.Proximity |
|---|
| proximity |
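
Because the class is abstract, the inherited distance and proximity methods are presumably supplied by concrete subclasses. The sketch below shows one way a set-based measure could be built on top of a token set. The Jaccard coefficient is chosen purely for illustration; this page does not name any particular measure. `JaccardSketch`, its whitespace tokenizer, and the convention that two empty inputs have proximity 1.0 are all assumptions, and no LingPipe classes are used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in only; not a LingPipe class.
class JaccardSketch {

    // Hypothetical tokenizer: split on runs of whitespace (an assumption).
    static Set<String> tokenSet(CharSequence cSeq) {
        String s = cSeq.toString().trim();
        if (s.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(s.split("\\s+")));
    }

    // Proximity as |A intersect B| / |A union B| over the two token sets.
    static double proximity(CharSequence a, CharSequence b) {
        Set<String> tokensA = tokenSet(a);
        Set<String> tokensB = tokenSet(b);
        if (tokensA.isEmpty() && tokensB.isEmpty()) {
            return 1.0; // assumed convention for two empty inputs
        }
        Set<String> shared = new HashSet<>(tokensA);
        shared.retainAll(tokensB);
        int unionSize = tokensA.size() + tokensB.size() - shared.size();
        return (double) shared.size() / unionSize;
    }

    // Distance defined here as the complement of proximity.
    static double distance(CharSequence a, CharSequence b) {
        return 1.0 - proximity(a, b);
    }

    public static void main(String[] args) {
        // Shared tokens: b, c. Union: a, b, c, d. Proximity is 2/4.
        System.out.println(proximity("a b c", "b c d"));
        System.out.println(distance("a b c", "b c d"));
    }
}
```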
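
| Constructor Detail |
|---|

public TokenizedDistance(TokenizerFactory tokenizerFactory)
Construct a tokenized distance from the specified tokenizer factory.
Parameters:
tokenizerFactory - Tokenizer for this distance.

| Method Detail |
|---|

public TokenizerFactory tokenizerFactory()
Return the tokenizer factory for this tokenized distance.

public Set<String> tokenSet(CharSequence cSeq)
Return the set of tokens produced by the specified character sequence using the tokenizer for this distance measure.
Parameters:
cSeq - Character sequence to tokenize.

public Set<String> tokenSet(char[] cs, int start, int length)
Return the set of tokens produced by the specified character slice using the tokenizer for this distance measure.
Parameters:
cs - Underlying array of characters.
start - Index of first character in slice.
length - Length of slice.
Throws:
IndexOutOfBoundsException - If the start index is not within the underlying array, or if the start index plus the length minus one is not within the underlying array.

public ObjectToCounterMap<String> termFrequencyVector(CharSequence cSeq)
Return the mapping from terms to their counts derived from the specified character sequence using the tokenizer factory in this class.
Parameters:
cSeq - Character sequence to tokenize.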
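
The slice form of tokenSet and its bounds condition can be sketched as follows. This is a stand-in written against `java.util` only, not the library's code: `SliceSketch` and its whitespace tokenizer are hypothetical, and the check shown accepts a zero-length slice, which is an assumption that may differ from the exact condition documented above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in only; not the LingPipe implementation.
class SliceSketch {

    // Tokenize the slice cs[start .. start + length - 1].
    static Set<String> tokenSet(char[] cs, int start, int length) {
        // Reject slices that do not lie inside the array. Written as
        // start > cs.length - length to avoid overflow in start + length.
        if (start < 0 || length < 0 || start > cs.length - length) {
            throw new IndexOutOfBoundsException(
                "start=" + start + " length=" + length
                + " array length=" + cs.length);
        }
        String slice = new String(cs, start, length).trim();
        if (slice.isEmpty()) {
            return new HashSet<>();
        }
        // Hypothetical tokenizer: split on runs of whitespace (an assumption).
        return new HashSet<>(Arrays.asList(slice.split("\\s+")));
    }

    public static void main(String[] args) {
        char[] cs = "to be or".toCharArray();
        // Slice covers indices 3..7, the characters "be or".
        System.out.println(tokenSet(cs, 3, 5).size());
    }
}
```

Passing a start of 3 and a length of 6 for the same eight-character array runs one position past the end and raises IndexOutOfBoundsException in this sketch.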