java.lang.Object
  com.aliasi.tokenizer.Tokenizer
public abstract class Tokenizer
Abstract base class for tokenizers. A tokenizer acts as an iterator over both
whitespace and token streams. The next whitespace is returned through
nextWhitespace(), and the next token through nextToken(). Some tokenizers
may implement lastTokenStartPosition(), which returns the offset of the
most recently returned token's first character in an underlying character stream.
The entire underlying character sequence may be reconstructed by
alternating the next whitespace and the next token, beginning with the
first whitespace, until both streams are exhausted. Offsets
returned by lastTokenStartPosition() are not guaranteed to
be offsets into this reconstructed sequence of characters.
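The alternation described above can be sketched as follows. This is a minimal illustration, not LingPipe code: the nested `Tokenizer` is a hypothetical stand-in for `com.aliasi.tokenizer.Tokenizer`, reduced to two methods so the example compiles on its own, and `WhitespaceTokenizer` and `reconstruct` are names invented here.

```java
final class ReconstructSketch {

    // Hypothetical stand-in for com.aliasi.tokenizer.Tokenizer, reduced to the
    // two methods this sketch needs so that it compiles without LingPipe.
    abstract static class Tokenizer {
        abstract String nextToken();
        String nextWhitespace() { return " "; }
    }

    // Hypothetical subclass: tokens are maximal runs of non-whitespace characters.
    static final class WhitespaceTokenizer extends Tokenizer {
        private final String text;
        private int pos = 0; // index just past the last token returned

        WhitespaceTokenizer(String text) { this.text = text; }

        // Index of the first non-whitespace character at or after pos.
        private int skipWhitespace() {
            int i = pos;
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            return i;
        }

        @Override
        String nextWhitespace() {
            // pos is not advanced, so repeated calls return the same string.
            return text.substring(pos, skipWhitespace());
        }

        @Override
        String nextToken() {
            int start = skipWhitespace(); // flushes whitespace not yet returned
            if (start == text.length()) {
                return null; // no more tokens
            }
            int end = start;
            while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
                end++;
            }
            pos = end;
            return text.substring(start, end);
        }
    }

    // Alternates whitespace and tokens, beginning with the first whitespace.
    static String reconstruct(Tokenizer tokenizer) {
        StringBuilder sb = new StringBuilder();
        sb.append(tokenizer.nextWhitespace());
        String token;
        while ((token = tokenizer.nextToken()) != null) {
            sb.append(token);
            sb.append(tokenizer.nextWhitespace());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "  hello   world ";
        String rebuilt = reconstruct(new WhitespaceTokenizer(text));
        System.out.println(rebuilt.equals(text));
    }
}
```

Round-tripping only works for tokenizers that report the actual whitespace; one that keeps the default single-space nextWhitespace() would normalize the spacing instead.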
Concrete subclasses must implement nextToken() to
return the next token. They may override nextWhitespace()
to return the next whitespace string; the implementation in this class
returns a single space, Strings.SINGLE_SPACE_STRING.
Subclasses may also implement lastTokenStartPosition(),
which otherwise throws an
UnsupportedOperationException.
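As a rough illustration of this contract, the sketch below implements only nextToken() and leaves the other two methods at their defaults. The nested `Tokenizer` is a hypothetical stand-in that mirrors the defaults described above (it is not the LingPipe class), `ArrayTokenizer` and `supportsOffsets` are invented names, and the literal `" "` is assumed to match Strings.SINGLE_SPACE_STRING.

```java
final class SubclassSketch {

    // Hypothetical stand-in mirroring the documented defaults of
    // com.aliasi.tokenizer.Tokenizer; not the library class itself.
    abstract static class Tokenizer {
        abstract String nextToken();

        // Assumed equivalent of Strings.SINGLE_SPACE_STRING.
        String nextWhitespace() { return " "; }

        int lastTokenStartPosition() {
            throw new UnsupportedOperationException();
        }
    }

    // Hypothetical subclass: serves tokens from a fixed array and
    // overrides nothing except the one required method.
    static final class ArrayTokenizer extends Tokenizer {
        private final String[] tokens;
        private int next = 0;

        ArrayTokenizer(String... tokens) { this.tokens = tokens.clone(); }

        @Override
        String nextToken() {
            return next < tokens.length ? tokens[next++] : null;
        }
    }

    // Probes whether a tokenizer supports the optional offset operation.
    static boolean supportsOffsets(Tokenizer tokenizer) {
        try {
            tokenizer.lastTokenStartPosition();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new ArrayTokenizer("to", "be");
        String token;
        while ((token = tokenizer.nextToken()) != null) {
            System.out.println("[" + token + "]");
        }
        System.out.println("offsets supported: " + supportsOffsets(tokenizer));
    }
}
```

Callers that need offsets have to be prepared for the exception, since support is optional.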
| Constructor Summary | |
|---|---|
| protected | Tokenizer(): Construct a tokenizer. |
| Method Summary | |
|---|---|
| Iterator<String> | iterator(): Returns an iterator over the tokens remaining in this tokenizer. |
| int | lastTokenStartPosition(): Returns the offset of the first character of the most recently returned token (optional operation). |
| abstract String | nextToken(): Returns the next token in the stream, or null if there are no more tokens. |
| String | nextWhitespace(): Returns the next whitespace. |
| String[] | tokenize(): Returns the remaining tokens in an array of strings. |
| void | tokenize(List<? super String> tokens, List<? super String> whitespaces): Adds the remaining tokens and whitespaces to the specified lists. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
protected Tokenizer()
| Method Detail |
|---|
public Iterator<String> iterator()
Returns an iterator over the tokens remaining in this tokenizer.
The returned iterator is not thread safe with respect to the
underlying tokenizer. Specifically, it maintains a handle to
this tokenizer. Calls to the iterator's hasNext() and
next() methods call this tokenizer's
nextToken() method.
Specified by: iterator in interface Iterable<String>
public abstract String nextToken()
Returns the next token in the stream, or null if
there are no more tokens. Flushes any whitespace that has
not been returned.
Returns: The next token, or null if there are no more tokens.
public String nextWhitespace()
Returns the next whitespace. Repeated calls return the same
whitespace until the next call to nextToken.
The default implementation in this class returns
a single space, Strings.SINGLE_SPACE_STRING.
public int lastTokenStartPosition()
Returns the offset of the first character of the most recently returned token (optional operation), or -1 if no token has been returned yet.
The implementation here simply throws an UnsupportedOperationException. Subclasses should override this method if they support character offset indexing.
Throws: UnsupportedOperationException - If this method is not
supported.
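A subclass that supports offsets might track them as in the sketch below. It is an illustration under stated assumptions: the nested `Tokenizer` is a hypothetical stand-in for the LingPipe base class, `OffsetTokenizer` and `startOffsets` are invented names, and offsets here index directly into the input string, which the real class does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetSketch {

    // Hypothetical stand-in for com.aliasi.tokenizer.Tokenizer with the
    // documented default for the optional offset operation.
    abstract static class Tokenizer {
        abstract String nextToken();

        int lastTokenStartPosition() {
            throw new UnsupportedOperationException();
        }
    }

    // Hypothetical subclass that records where each token starts.
    static final class OffsetTokenizer extends Tokenizer {
        private final String text;
        private int pos = 0;        // index just past the last token returned
        private int lastStart = -1; // -1 until the first token is returned

        OffsetTokenizer(String text) { this.text = text; }

        @Override
        String nextToken() {
            int start = pos;
            while (start < text.length() && Character.isWhitespace(text.charAt(start))) {
                start++;
            }
            if (start == text.length()) {
                return null; // lastStart keeps the previous token's offset
            }
            int end = start;
            while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
                end++;
            }
            lastStart = start;
            pos = end;
            return text.substring(start, end);
        }

        @Override
        int lastTokenStartPosition() {
            return lastStart;
        }
    }

    // Collects the start offset of every token in the text.
    static List<Integer> startOffsets(String text) {
        Tokenizer tokenizer = new OffsetTokenizer(text);
        List<Integer> offsets = new ArrayList<>();
        while (tokenizer.nextToken() != null) {
            offsets.add(tokenizer.lastTokenStartPosition());
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(startOffsets("ab  cd e"));
    }
}
```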
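public void tokenize(List<? super String> tokens,
List<? super String> whitespaces)
Adds the remaining tokens and whitespaces to the specified lists.
Parameters:
tokens - List to which tokens are added.
whitespaces - List to which whitespaces are added.
public String[] tokenize()
Returns the remaining tokens in an array of strings.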
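One plausible way to build both tokenize variants on top of nextToken() and nextWhitespace() is sketched below. This is not the library's source. The nested `Tokenizer` is a hypothetical stand-in, `ArrayTokenizer` is an invented name, and the choice to record the leading whitespace first (so the whitespace list ends up one element longer than the token list) is an assumption based on the alternation described in the class comment.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenizeSketch {

    // Hypothetical stand-in for com.aliasi.tokenizer.Tokenizer.
    abstract static class Tokenizer {
        abstract String nextToken();

        String nextWhitespace() { return " "; }

        // Assumed ordering: whitespace, token, whitespace, ..., whitespace.
        void tokenize(List<? super String> tokens,
                      List<? super String> whitespaces) {
            whitespaces.add(nextWhitespace());
            String token;
            while ((token = nextToken()) != null) {
                tokens.add(token);
                whitespaces.add(nextWhitespace());
            }
        }

        // Drains the remaining tokens into an array, ignoring whitespace.
        String[] tokenize() {
            List<String> tokens = new ArrayList<>();
            String token;
            while ((token = nextToken()) != null) {
                tokens.add(token);
            }
            return tokens.toArray(new String[0]);
        }
    }

    // Hypothetical subclass serving tokens from a fixed array.
    static final class ArrayTokenizer extends Tokenizer {
        private final String[] tokens;
        private int next = 0;

        ArrayTokenizer(String... tokens) { this.tokens = tokens.clone(); }

        @Override
        String nextToken() {
            return next < tokens.length ? tokens[next++] : null;
        }
    }

    public static void main(String[] args) {
        List<Object> tokens = new ArrayList<>();   // List<? super String> accepts this
        List<String> whitespaces = new ArrayList<>();
        new ArrayTokenizer("a", "b").tokenize(tokens, whitespaces);
        System.out.println(tokens.size() + " tokens, " + whitespaces.size() + " whitespaces");
    }
}
```

Both variants consume the tokenizer, so a second call on the same instance finds no tokens left.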