java.lang.Object
  com.aliasi.suffixarray.TokenSuffixArray
public class TokenSuffixArray
A TokenSuffixArray implements a suffix array of tokens.
See CharSuffixArray for a description of suffix arrays
and their applications.
If the maximum length is less than the length of the array, strings are truncated to be at most this length before comparison. The result isn't a standard, fully sorted suffix array, but can be faster to create and will suffice for many applications. The indexes will be sorted relative to the truncated strings, so they will be in order up to the specified length.
Thus if the tokenization corresponds to multiple documents, the boundary token should be used to separate them.
See CharSuffixArray for details and an example.
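The effect of a maximum suffix length on sort order can be illustrated with a minimal, self-contained sketch. It does not use LingPipe and is not the library's implementation: the class `TruncatedTokenSuffixSort` and its methods are hypothetical names, tokens are plain strings, and document-boundary tokens are ignored. It sorts token positions by comparing at most `maxLen` tokens of each suffix.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of com.aliasi.suffixarray.
class TruncatedTokenSuffixSort {

    // Compare the suffixes starting at token positions i and j,
    // looking at no more than maxLen tokens of each.
    static int compareSuffixes(String[] tokens, int i, int j, int maxLen) {
        for (int k = 0; k < maxLen; ++k) {
            boolean endI = i + k >= tokens.length;
            boolean endJ = j + k >= tokens.length;
            if (endI || endJ) {
                // The suffix that runs out of tokens first sorts first.
                return Boolean.compare(endJ, endI);
            }
            int c = tokens[i + k].compareTo(tokens[j + k]);
            if (c != 0) return c;
        }
        return 0; // Equal on the first maxLen tokens.
    }

    // Return token positions sorted by their (possibly truncated) suffixes.
    static int[] suffixArray(String[] tokens, int maxLen) {
        Integer[] idx = new Integer[tokens.length];
        for (int i = 0; i < idx.length; ++i) idx[i] = i;
        // Arrays.sort on objects is stable, so suffixes that tie after
        // truncation keep their original left-to-right order.
        Arrays.sort(idx, (a, b) -> compareSuffixes(tokens, a, b, maxLen));
        int[] result = new int[idx.length];
        for (int i = 0; i < idx.length; ++i) result[i] = idx[i];
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = "a b c a b a".split(" ");
        // Fully sorted: prints [5, 3, 0, 4, 1, 2]
        System.out.println(Arrays.toString(suffixArray(tokens, Integer.MAX_VALUE)));
        // Truncated to 2 tokens: prints [5, 0, 3, 4, 1, 2]
        System.out.println(Arrays.toString(suffixArray(tokens, 2)));
    }
}
```

With a limit of two tokens, positions 0 and 3 both begin with `a b` and compare as equal, so their relative order is no longer determined by the tokens that follow. The result is ordered only up to the specified length.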
| Field Summary | |
|---|---|
static String |
DEFAULT_DOCUMENT_BOUNDARY_TOKEN
The default boundary token for documents. |
| Constructor Summary | |
|---|---|
TokenSuffixArray(Tokenization tokenization)
Construct a token suffix array with no limit on suffix length and the default document-boundary token. |
|
TokenSuffixArray(Tokenization tokenization,
int maxSuffixLength)
Construct a suffix array from the specified tokenization, comparing suffixes up to the specified maximum suffix length and using the default document-boundary token. |
|
TokenSuffixArray(Tokenization tokenization,
int maxSuffixLength,
String documentBoundaryToken)
Construct a suffix array from the specified tokenization, comparing suffixes up to the specified maximum suffix length and using the specified document-boundary token. |
|
| Method Summary | |
|---|---|
String |
documentBoundaryToken()
Returns the token used to separate documents in this suffix array. |
int |
maxSuffixLength()
Returns the maximum suffix length for this token suffix array. |
List<int[]> |
prefixMatches(int minMatchLength)
Returns a list of maximal spans of suffix array indexes which refer to suffixes that share a prefix of at least the specified minimum match length. |
String |
substring(int idx,
int maxTokens)
Returns the substring of the original string that's spanned by the tokens starting at the specified suffix array index and running the specified maximum number of tokens (or until the token sequence ends). |
int |
suffixArray(int idx)
Returns the value of the suffix array at the specified index. |
int |
suffixArrayLength()
Returns the number of tokens in the suffix array. |
Tokenization |
tokenization()
Returns the tokenization underlying this suffix array. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String DEFAULT_DOCUMENT_BOUNDARY_TOKEN
| Constructor Detail |
|---|
public TokenSuffixArray(Tokenization tokenization)
tokenization - Tokenization on which to base the suffix
array.
public TokenSuffixArray(Tokenization tokenization,
int maxSuffixLength)
tokenization - Tokenization on which to base suffix array.
maxSuffixLength - Maximum length of token sequences to compare.
public TokenSuffixArray(Tokenization tokenization,
int maxSuffixLength,
String documentBoundaryToken)
tokenization - Tokenization on which to base suffix array.
maxSuffixLength - Maximum length of token sequences to compare.
documentBoundaryToken - Token used to separate documents.
| Method Detail |
|---|
public String documentBoundaryToken()
public int maxSuffixLength()
public Tokenization tokenization()
public int suffixArray(int idx)
idx - Suffix array index.
public int suffixArrayLength()
public String substring(int idx,
int maxTokens)
idx - Index in suffix array of first token.
maxTokens - Maximum number of tokens to include in string.
public List<int[]> prefixMatches(int minMatchLength)
minMatchLength - Minimum number of tokens required to
match.
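The idea behind grouping suffix array indexes by shared prefix can be sketched without LingPipe. The following is an assumption-laden illustration, not the library's code: `PrefixMatchSketch` and its methods are hypothetical, tokens are plain strings, the suffix array is built by a naive comparison sort, document-boundary tokens are ignored, and each span is represented as `{start, end}` with `start` inclusive and `end` exclusive, a convention this page does not state. Because the array is sorted, suffixes sharing a prefix are adjacent, so one pass over neighbouring entries finds the maximal runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of com.aliasi.suffixarray.
class PrefixMatchSketch {

    // Number of leading tokens shared by the suffixes starting at i and j.
    static int sharedPrefixLength(String[] tokens, int i, int j) {
        int k = 0;
        while (i + k < tokens.length && j + k < tokens.length
               && tokens[i + k].equals(tokens[j + k])) {
            ++k;
        }
        return k;
    }

    // Fully sorted token suffix array built by a naive comparison sort.
    static int[] suffixArray(String[] tokens) {
        Integer[] idx = new Integer[tokens.length];
        for (int i = 0; i < idx.length; ++i) idx[i] = i;
        Arrays.sort(idx, (a, b) -> {
            int k = sharedPrefixLength(tokens, a, b);
            boolean endA = a + k >= tokens.length;
            boolean endB = b + k >= tokens.length;
            // A suffix that is a prefix of the other sorts first.
            if (endA || endB) return Boolean.compare(endB, endA);
            return tokens[a + k].compareTo(tokens[b + k]);
        });
        int[] sa = new int[idx.length];
        for (int i = 0; i < sa.length; ++i) sa[i] = idx[i];
        return sa;
    }

    // Maximal runs of suffix array indexes whose suffixes share at least
    // minMatchLength leading tokens. Spans are {start, end}, end exclusive
    // (an assumed convention); runs of a single suffix are skipped.
    static List<int[]> prefixMatches(String[] tokens, int[] sa, int minMatchLength) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= sa.length; ++i) {
            boolean continuesRun = i < sa.length
                && sharedPrefixLength(tokens, sa[i - 1], sa[i]) >= minMatchLength;
            if (!continuesRun) {
                if (i - start > 1) spans.add(new int[] { start, i });
                start = i;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = "a b c a b a".split(" ");
        int[] sa = suffixArray(tokens);
        for (int[] span : prefixMatches(tokens, sa, 2)) {
            System.out.println(Arrays.toString(span)); // prints [1, 3]
        }
    }
}
```

In this example the only token sequence of length two that repeats is `a b`, which starts at token positions 3 and 0; those occupy suffix array indexes 1 and 2, giving the single span `[1, 3]`.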