java.lang.Object
  com.aliasi.tokenizer.Tokenization
public class Tokenization
A Tokenization represents the result of tokenizing a
string. Tokenizations are constructed from a character sequence
and a tokenizer factory. A tokenization contains the underlying
text, tokens, and token start/end positions in the text.
Hash codes are consistent with equality. They depend only on the text and the number of tokens.
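As an illustration of what a tokenization holds, the following minimal sketch splits text on whitespace and records each token together with its start and end offsets. The OffsetSketch class and its split-on-whitespace rule are assumptions made for this example. It is a stand-in that uses only the JDK, not LingPipe's Tokenization or any TokenizerFactory implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the LingPipe class: records tokens and their
// offsets for a simple split-on-whitespace tokenizer.
final class OffsetSketch {
    final String text;
    final List<String> tokens = new ArrayList<>();
    final List<Integer> starts = new ArrayList<>();
    final List<Integer> ends = new ArrayList<>();

    OffsetSketch(String text) {
        this.text = text;
        int i = 0;
        while (i < text.length()) {
            // Skip the whitespace that precedes the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i == text.length()) {
                break;
            }
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(text.substring(start, i));
            starts.add(start);
            ends.add(i); // one past the last character of the token
        }
    }

    public static void main(String[] args) {
        OffsetSketch t = new OffsetSketch("John  ran home.");
        for (int n = 0; n < t.tokens.size(); n++) {
            int s = t.starts.get(n);
            int e = t.ends.get(n);
            // In this sketch each token equals the text between its offsets.
            System.out.println(n + ": [" + s + "," + e + ") " + t.text.substring(s, e));
        }
    }
}
```

In this sketch the end offset is exclusive, so text.substring(start, end) recovers the token. A tokenizer that normalizes or filters tokens would not necessarily preserve that property.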
| Constructor Summary | |
|---|---|
| Tokenization(char[] cs, int start, int length, TokenizerFactory factory) | Construct a tokenization from the specified text and tokenizer factory. |
| Tokenization(String text, List<String> tokens, List<String> whitespaces, int[] tokenStarts, int[] tokenEnds) | Construct a tokenization from the specified components. |
| Tokenization(String text, TokenizerFactory factory) | Construct a tokenization from the specified text and tokenizer factory. |
| Method Summary | | |
|---|---|---|
| boolean | equals(Object that) | Returns true if the specified object is a tokenization that is equal to this one. |
| int | hashCode() | Returns the hash code for this tokenization. |
| int | numTokens() | Returns the number of tokens in this tokenization. |
| String | text() | Returns the underlying text for this tokenization. |
| String | token(int n) | Returns the token at the specified input position. |
| int | tokenEnd(int n) | Returns the position one past the last character of the token at the specified input position. |
| List<String> | tokenList() | Returns an unmodifiable view of the list of tokens for this tokenization. |
| String[] | tokens() | Returns the array of tokens underlying this tokenization. |
| int | tokenStart(int n) | Returns the position of the first character of the token at the specified input position. |
| String | whitespace(int n) | Returns the whitespace before the token at the specified input position, or the last whitespace if the specified position is the number of tokens. |
| List<String> | whitespaceList() | Returns an unmodifiable view of the list of whitespaces for this tokenization. |
| String[] | whitespaces() | Returns the array of whitespaces for this tokenization. |
| Methods inherited from class java.lang.Object |
|---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public Tokenization(char[] cs,
int start,
int length,
TokenizerFactory factory)
Parameters:
cs - Underlying character array.
start - Index of first character in slice.
length - Length of slice.
factory - Tokenizer factory to use for tokenization.
Throws:
IndexOutOfBoundsException - If the start index and length specify a slice outside the bounds of the array.
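The slice arguments follow the usual (array, start, length) convention. A minimal sketch of such a bounds check, using only the JDK, is shown below. SliceSketch.slice is a hypothetical helper written for this example, and whether the constructor performs its check in this way is an assumption.

```java
// Hypothetical helper, not part of LingPipe: extracts a slice of a character
// array and rejects slices that fall outside the array.
final class SliceSketch {
    static String slice(char[] cs, int start, int length) {
        // Written as start > cs.length - length to avoid integer overflow
        // in start + length.
        if (start < 0 || length < 0 || start > cs.length - length) {
            throw new IndexOutOfBoundsException(
                "start=" + start + " length=" + length + " array length=" + cs.length);
        }
        return new String(cs, start, length);
    }

    public static void main(String[] args) {
        char[] cs = "see John run".toCharArray();
        System.out.println(slice(cs, 4, 4)); // prints "John"
        try {
            slice(cs, 10, 5); // runs past the end of the 12-character array
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```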
public Tokenization(String text,
TokenizerFactory factory)
Parameters:
text - Underlying text for tokenization.
factory - Tokenizer factory to perform tokenization.
public Tokenization(String text,
List<String> tokens,
List<String> whitespaces,
int[] tokenStarts,
int[] tokenEnds)
Parameters:
text - Underlying text.
tokens - List of tokens.
whitespaces - List of whitespaces.
tokenStarts - Offsets of the first character of each token.
tokenEnds - Offsets of one past the last character of each token.
Throws:
IllegalArgumentException - If the number of whitespaces is not equal to the number of tokens plus one, a token start occurs after a token end, or a token start or end is out of bounds for the text.
| Method Detail |
|---|
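The consistency checks documented for the component constructor Tokenization(String, List<String>, List<String>, int[], int[]) could look roughly like the sketch below. ComponentCheckSketch.validate is a hypothetical helper. Reading "a token start occurs after a token end" as a comparison within the same token, and requiring the offset arrays to match the token count, are both assumptions of this example rather than documented behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of LingPipe: checks that the components of a
// tokenization are mutually consistent.
final class ComponentCheckSketch {
    static void validate(String text, List<String> tokens, List<String> whitespaces,
                         int[] tokenStarts, int[] tokenEnds) {
        if (whitespaces.size() != tokens.size() + 1) {
            throw new IllegalArgumentException("need tokens.size() + 1 whitespaces");
        }
        // Assumption: one start and one end offset per token.
        if (tokenStarts.length != tokens.size() || tokenEnds.length != tokens.size()) {
            throw new IllegalArgumentException("need one start and one end per token");
        }
        for (int n = 0; n < tokens.size(); n++) {
            // Assumption: each token's start is compared with its own end.
            if (tokenStarts[n] > tokenEnds[n]) {
                throw new IllegalArgumentException("token " + n + " starts after it ends");
            }
            if (tokenStarts[n] < 0 || tokenEnds[n] > text.length()) {
                throw new IllegalArgumentException("token " + n + " is out of bounds");
            }
        }
    }

    public static void main(String[] args) {
        validate("a b", Arrays.asList("a", "b"), Arrays.asList("", " ", ""),
                 new int[] {0, 2}, new int[] {1, 3});
        System.out.println("components are consistent");
    }
}
```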
public String text()
public int numTokens()
public String token(int n)
Parameters:
n - Position of token.
Throws:
IndexOutOfBoundsException - If the position is less than 0 or greater than or equal to the number of tokens.
public String whitespace(int n)
Parameters:
n - Position of token.
Throws:
IndexOutOfBoundsException - If the position is less than 0 or greater than the number of tokens.
public int tokenStart(int n)
Parameters:
n - Position of token.
Throws:
IndexOutOfBoundsException - If the position is less than 0 or greater than or equal to the number of tokens.
public int tokenEnd(int n)
Parameters:
n - Position of token.
Throws:
IndexOutOfBoundsException - If the position is less than 0 or greater than or equal to the number of tokens.
public String[] tokens()
The array is copied from the underlying list of tokens, so modifying it will not affect this tokenization.
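The copy-on-return behavior described here can be illustrated with a small JDK-only sketch. CopySketch is a hypothetical stand-in for this example, not the LingPipe class, and its internal list is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: returns a fresh array on every call so that callers
// cannot modify the internal list through it.
final class CopySketch {
    private final List<String> tokens = Arrays.asList("John", "ran");

    String[] tokens() {
        // toArray allocates a new array populated from the list.
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        CopySketch t = new CopySketch();
        String[] copy = t.tokens();
        copy[0] = "Mary"; // changes only the caller's array
        System.out.println(t.tokens()[0]); // prints "John"
    }
}
```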
public String[] whitespaces()
The array is copied from the underlying list of whitespaces, so modifying it will not affect this tokenization.
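Because there is one more whitespace than there are tokens, the two arrays can be interleaved. The sketch below assumes a tokenizer that neither drops nor rewrites characters, in which case the interleaving reproduces the text. InterleaveSketch.rebuild is a hypothetical helper for this example, not part of the documented API.

```java
// Hypothetical helper: interleaves whitespaces and tokens, with whitespaces[n]
// placed before tokens[n] and the final whitespace after the last token.
final class InterleaveSketch {
    static String rebuild(String[] tokens, String[] whitespaces) {
        if (whitespaces.length != tokens.length + 1) {
            throw new IllegalArgumentException("need tokens.length + 1 whitespaces");
        }
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < tokens.length; n++) {
            sb.append(whitespaces[n]).append(tokens[n]);
        }
        sb.append(whitespaces[tokens.length]); // trailing whitespace
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"John", "ran", "."};
        String[] whitespaces = {"", " ", "", "\n"};
        System.out.println(rebuild(tokens, whitespaces).equals("John ran.\n")); // prints "true"
    }
}
```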
public List<String> tokenList()
public List<String> whitespaceList()
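An unmodifiable view, as returned by the two list methods, allows reads but rejects writes. The following sketch shows that behavior with the JDK's Collections.unmodifiableList. ViewSketch is a hypothetical stand-in, and whether LingPipe builds its views with this particular JDK method is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in: exposes an internal list through a read-only view.
final class ViewSketch {
    private final List<String> tokens = new ArrayList<>(Arrays.asList("John", "ran"));

    List<String> tokenList() {
        // Wraps the internal list without copying; writes through the view fail.
        return Collections.unmodifiableList(tokens);
    }

    public static void main(String[] args) {
        List<String> view = new ViewSketch().tokenList();
        System.out.println(view.get(1)); // prints "ran"
        try {
            view.add("home");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
    }
}
```

Unlike the array methods, a view involves no copying, so it is the cheaper choice when the caller only needs to read.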
public boolean equals(Object that)
Returns true if the specified object is a tokenization that is equal to this one. Equality is defined as having the same text, tokens, whitespaces, and token start and end positions.
Overrides:
equals in class Object
public int hashCode()
Overrides:
hashCode in class Object
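The relationship between the two methods can be sketched as follows: equality compares all five components, while the hash code uses only the text and the number of tokens. EqualitySketch is a hypothetical stand-in for this example, and the arithmetic used to combine the two hash inputs is an assumption, since the documentation states only which values the hash depends on.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not the LingPipe class.
final class EqualitySketch {
    final String text;
    final List<String> tokens;
    final List<String> whitespaces;
    final int[] starts;
    final int[] ends;

    EqualitySketch(String text, List<String> tokens, List<String> whitespaces,
                   int[] starts, int[] ends) {
        this.text = text;
        this.tokens = tokens;
        this.whitespaces = whitespaces;
        this.starts = starts;
        this.ends = ends;
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) return true;
        if (!(that instanceof EqualitySketch)) return false;
        EqualitySketch t = (EqualitySketch) that;
        // Equality compares every component.
        return text.equals(t.text)
            && tokens.equals(t.tokens)
            && whitespaces.equals(t.whitespaces)
            && Arrays.equals(starts, t.starts)
            && Arrays.equals(ends, t.ends);
    }

    @Override
    public int hashCode() {
        // Assumed combination of the two documented inputs.
        return 31 * text.hashCode() + tokens.size();
    }

    public static void main(String[] args) {
        List<String> toks = Arrays.asList("a", "b");
        List<String> ws = Arrays.asList("", " ", "");
        EqualitySketch x = new EqualitySketch("a b", toks, ws, new int[] {0, 2}, new int[] {1, 3});
        EqualitySketch y = new EqualitySketch("a b", toks, ws, new int[] {0, 2}, new int[] {1, 3});
        EqualitySketch z = new EqualitySketch("a b", toks, ws, new int[] {0, 2}, new int[] {1, 2});
        System.out.println(x.equals(y)); // prints "true"
        System.out.println(x.equals(z)); // prints "false"
        System.out.println(x.hashCode() == z.hashCode()); // prints "true"
    }
}
```

Equal objects always hash alike here, which is all the contract requires. Unequal objects that share text and token count collide, which is permitted and keeps the hash cheap to compute.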