java.lang.Object
  com.aliasi.tokenizer.NGramTokenizerFactory
public class NGramTokenizerFactory
An NGramTokenizerFactory creates n-gram tokenizers
of a specified minimum and maximum length.
An NGramTokenizer is a tokenizer that returns the
character n-grams of a specified sequence whose lengths fall between
a minimum and a maximum, inclusive. Whitespace takes the default behavior from
Tokenizer.nextWhitespace(), returning a string consisting of
a single space character.
For example, the result of
new NGramTokenizer("abcd".toCharArray(),0,4,2,3).tokenize()
is the string array:
{ "ab", "bc", "cd", "abc", "bcd" }
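The enumeration order in the example above (all n-grams of the shortest length first, left to right, then the next length) can be reproduced with a short standard-library sketch. This is not LingPipe's implementation; the class name NGramSketch and the helper ngrams are hypothetical names introduced only for illustration, and the argument order simply mirrors the constructor call shown in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NGramSketch {

    // Hypothetical helper, not part of LingPipe: collects the character
    // n-grams of cs[start .. start+length) for each n from minNGram to
    // maxNGram, shortest n-grams first, left to right within each length.
    public static String[] ngrams(char[] cs, int start, int length,
                                  int minNGram, int maxNGram) {
        List<String> result = new ArrayList<>();
        int end = start + length;
        for (int n = minNGram; n <= maxNGram; ++n) {
            // Slide a window of width n across the slice.
            for (int i = start; i + n <= end; ++i) {
                result.add(new String(cs, i, n));
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] tokens = ngrams("abcd".toCharArray(), 0, 4, 2, 3);
        System.out.println(Arrays.toString(tokens));
        // prints [ab, bc, cd, abc, bcd]
    }
}
```

A slice of k characters yields k - n + 1 n-grams for each length n that does not exceed k, so the four-character input above produces three bigrams and two trigrams.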
N-gram tokenizer factories are serializable.
| Constructor Summary | |
|---|---|
| NGramTokenizerFactory(int minNGram, int maxNGram) | Create an n-gram tokenizer factory with the specified minimum and maximum n-gram lengths. |

| Method Summary | |
|---|---|
| int | maxNGram() - Returns the maximum n-gram length returned by this tokenizer factory. |
| int | minNGram() - Returns the minimum n-gram length returned by this tokenizer factory. |
| Tokenizer | tokenizer(char[] cs, int start, int length) - Returns an n-gram tokenizer for the specified characters with the minimum and maximum n-gram lengths as specified in the constructor. |
| String | toString() - Returns a description of this n-gram tokenizer factory, including minimum and maximum token lengths. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Constructor Detail |
|---|
public NGramTokenizerFactory(int minNGram,
int maxNGram)
Parameters:
minNGram - Minimum n-gram length.
maxNGram - Maximum n-gram length.
Throws:
IllegalArgumentException - If the minimum is greater than the maximum or if the maximum is less than one.

| Method Detail |
|---|
public int minNGram()
public int maxNGram()
public Tokenizer tokenizer(char[] cs,
int start,
int length)
Specified by:
tokenizer in interface TokenizerFactory
Parameters:
cs - Underlying character array.
start - Index of first character in array to tokenize.
length - Number of characters to tokenize.

public String toString()
Overrides:
toString in class Object
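The constructor contract documented above (reject a maximum less than one, or a minimum greater than the maximum) together with the two accessors can be sketched with the standard library alone. The class name NGramBoundsSketch, the exception messages, and the toString format are assumptions made for illustration; they are not taken from LingPipe.

```java
public class NGramBoundsSketch {

    private final int minNGram;
    private final int maxNGram;

    // Mirrors the documented checks: the maximum must be at least one,
    // and the minimum must not exceed the maximum.
    public NGramBoundsSketch(int minNGram, int maxNGram) {
        if (maxNGram < 1) {
            throw new IllegalArgumentException(
                "Maximum n-gram length must be at least 1, found " + maxNGram);
        }
        if (minNGram > maxNGram) {
            throw new IllegalArgumentException(
                "Minimum n-gram length " + minNGram
                + " exceeds maximum " + maxNGram);
        }
        this.minNGram = minNGram;
        this.maxNGram = maxNGram;
    }

    public int minNGram() {
        return minNGram;
    }

    public int maxNGram() {
        return maxNGram;
    }

    // Assumed description format; the real class may word this differently.
    @Override
    public String toString() {
        return "NGramBoundsSketch(min=" + minNGram + ", max=" + maxNGram + ")";
    }

    public static void main(String[] args) {
        System.out.println(new NGramBoundsSketch(2, 3));
        try {
            new NGramBoundsSketch(3, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating in the constructor means every tokenizer later created from the bounds can assume they are consistent, so the per-call tokenizer method needs no further range checks on them.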