java.lang.Object
  com.aliasi.tokenizer.RegExTokenizerFactory
public class RegExTokenizerFactory
A RegExTokenizerFactory is a tokenizer factory built from a regular expression. The regular expression is represented as an instance of Pattern, and matching is carried out with the java.util.regex package. The pattern provided when the factory is constructed is used to create instances of Matcher for use in tokenizers. The method Matcher.find(int) is called to find the next token in an input sequence.
For instance, consider a regular expression which takes a token to be a sequence of alphabetic characters, a sequence of numeric characters, or a single non-alphanumeric character:
[a-zA-Z]+|[0-9]+|\S
This can be used to construct a tokenizer factory:
String regex = "[a-zA-Z]+|[0-9]+|\\S";
TokenizerFactory tf = new RegExTokenizerFactory(regex);
char[] cs = "abc de 123. ".toCharArray();
Tokenizer tokenizer = tf.tokenizer(cs,0,cs.length);
Note the escaping of the backslash character (\) in the Java string regex with a second backslash, resulting in \\. The regular expression contains no spaces within any of its disjuncts because matched tokens should not contain whitespace. Finally, note the use of Kleene plus (+) rather than Kleene star (*) to ensure that tokens are at least one character long. In fact, the constructor will throw an exception if the pattern matches the empty string.
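The difference between Kleene plus and Kleene star can be checked with java.util.regex alone. The following is a minimal sketch, not LingPipe code: the class `EmptyMatchCheck` and the helper `matchesEmptyString` are hypothetical names, and the check is only assumed to resemble what the constructor does before throwing its exception.

```java
import java.util.regex.Pattern;

public class EmptyMatchCheck {

    // Hypothetical helper (not part of LingPipe): returns true if the
    // regular expression accepts the empty string as a complete match.
    public static boolean matchesEmptyString(String regex) {
        return Pattern.compile(regex).matcher("").matches();
    }

    public static void main(String[] args) {
        // Every disjunct requires at least one character.
        System.out.println(matchesEmptyString("[a-zA-Z]+|[0-9]+|\\S")); // prints false

        // The Kleene star lets the first disjunct match zero characters.
        System.out.println(matchesEmptyString("[a-zA-Z]*|[0-9]+|\\S")); // prints true
    }
}
```

A pattern for which this helper returns true is one that the documentation above says the constructor rejects.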
For the input "abc de 123. ", the tokenizer above returns the following tokens, whitespace strings and token start offsets:
whitespaces: "", " ", " ", "", " "
tokens: "abc", "de", "123", "."
token starts: 0, 4, 7, 10
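The listing above can be reproduced with a plain Matcher.find(int) loop. This is a sketch of the general technique using only the JDK, not the LingPipe implementation: the class `RegexTokenWalk`, its `Result` holder and the `walk` method are hypothetical names, and the sketch simply treats any text skipped between matches as whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenWalk {

    // Holder for the tokens, the strings between them, and token starts.
    public static final class Result {
        public final List<String> tokens = new ArrayList<>();
        public final List<String> whitespaces = new ArrayList<>();
        public final List<Integer> starts = new ArrayList<>();
    }

    // Assumes the pattern never matches the empty string; otherwise
    // the position would not advance and the loop would not terminate.
    public static Result walk(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        Result result = new Result();
        int pos = 0;
        while (pos < input.length() && matcher.find(pos)) {
            // Text skipped before the match is recorded as whitespace.
            result.whitespaces.add(input.substring(pos, matcher.start()));
            result.tokens.add(matcher.group());
            result.starts.add(matcher.start());
            pos = matcher.end();
        }
        // Whatever follows the last token is the final whitespace.
        result.whitespaces.add(input.substring(pos));
        return result;
    }

    public static void main(String[] args) {
        Result r = walk("[a-zA-Z]+|[0-9]+|\\S", "abc de 123. ");
        System.out.println(r.tokens); // prints [abc, de, 123, .]
        System.out.println(r.starts); // prints [0, 4, 7, 10]
        System.out.println(r.whitespaces.size()); // prints 5
    }
}
```

There is always one more whitespace string than there are tokens, because whitespace is reported before the first token and after the last one.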
A regular-expression-based tokenizer factory is completely thread safe.
A regular-expression-based tokenizer factory may be serialized.
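Standard Java serialization is a round trip through ObjectOutputStream and ObjectInputStream. The sketch below uses a java.util.regex.Pattern as a stand-in so that it runs with the JDK alone; the class `PatternRoundTrip` and the method `roundTrip` are hypothetical names. With LingPipe on the classpath, the same streams would presumably be applied to the factory itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.regex.Pattern;

public class PatternRoundTrip {

    // Serializes the pattern to an in-memory buffer and reads it back.
    public static Pattern roundTrip(Pattern pattern)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(pattern);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Pattern) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Pattern original =
            Pattern.compile("[a-zA-Z]+|[0-9]+|\\S", Pattern.CASE_INSENSITIVE);
        Pattern copy = roundTrip(original);
        // Both the regular expression and the flags survive the round trip.
        System.out.println(copy.pattern().equals(original.pattern())); // prints true
        System.out.println(copy.flags() == original.flags()); // prints true
    }
}
```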
| Constructor Summary | |
|---|---|
| RegExTokenizerFactory(Pattern pattern) | Construct a regular expression tokenizer factory with the specified pattern for matching. |
| RegExTokenizerFactory(String regex) | Construct a regular expression tokenizer factory using the specified regular expression for matching. |
| RegExTokenizerFactory(String regex, int flags) | Construct a regular expression tokenizer factory using the specified regular expression for matching according to the specified flags. |
| Method Summary | | |
|---|---|---|
| void | compileTo(ObjectOutput objOut) | Deprecated. Use the Serializable interface instead. |
| Pattern | pattern() | Returns the regular expression pattern backing this tokenizer factory. |
| Tokenizer | tokenizer(char[] cs, int start, int length) | Returns a tokenizer for the specified subsequence of characters. |
| String | toString() | Return a description of this regex-based tokenizer factory including its pattern's regular expression and flags. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
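public RegExTokenizerFactory(String regex)
Construct a regular expression tokenizer factory using the specified regular expression for matching.
Parameters:
regex - The regular expression.
Throws:
PatternSyntaxException - If the expression's syntax is invalid.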
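public RegExTokenizerFactory(String regex,
int flags)
Construct a regular expression tokenizer factory using the specified regular expression for matching according to the specified flags. The flags argument is a bit mask formed as the bitwise OR ("|") of any of the following flags: Pattern.CASE_INSENSITIVE, Pattern.MULTILINE, Pattern.DOTALL, Pattern.UNICODE_CASE and Pattern.CANON_EQ.
See Pattern.compile(String,int) for more information.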
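Match flags are plain int constants that are combined into a single mask with bitwise OR. The sketch below shows the composition with java.util.regex alone; the class `FlagMaskDemo` and the method `caseInsensitiveMask` are hypothetical names, and the resulting int is the kind of value the flags parameter of this constructor is documented to accept.

```java
import java.util.regex.Pattern;

public class FlagMaskDemo {

    // Hypothetical helper: combines two match flags into one bit mask.
    public static int caseInsensitiveMask() {
        return Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
    }

    public static void main(String[] args) {
        String regex = "[a-z]+|[0-9]+|\\S";
        int flags = caseInsensitiveMask();

        // Without flags, an upper-case word is not matched by [a-z]+.
        System.out.println(Pattern.compile(regex).matcher("ABC").matches()); // prints false

        // With the mask, [a-z]+ matches regardless of case.
        Pattern pattern = Pattern.compile(regex, flags);
        System.out.println(pattern.matcher("ABC").matches()); // prints true

        // Individual flags can be tested by masking the stored value.
        System.out.println((pattern.flags() & Pattern.CASE_INSENSITIVE) != 0); // prints true
    }
}
```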
Parameters:
regex - The regular expression.
flags - The match flags.
Throws:
PatternSyntaxException - If the expression's syntax is invalid.
IllegalArgumentException - If bit values other than those corresponding to defined match flags are set in the flags.
public RegExTokenizerFactory(Pattern pattern)
Construct a regular expression tokenizer factory with the specified pattern for matching.
Parameters:
pattern - Pattern to use for matching.
| Method Detail |
|---|
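public Pattern pattern()
Returns the regular expression pattern backing this tokenizer factory.
public Tokenizer tokenizer(char[] cs,
int start,
int length)
Description copied from interface: TokenizerFactory
Returns a tokenizer for the specified subsequence of characters.
Specified by:
tokenizer in interface TokenizerFactory
Parameters:
cs - Characters to tokenize.
start - Index of first character to tokenize.
length - Number of characters to tokenize.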
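@Deprecated
public void compileTo(ObjectOutput objOut)
throws IOException
Deprecated. Use the Serializable interface instead.
Specified by:
compileTo in interface Compilable
Parameters:
objOut - Object output to which this object is compiled.
Throws:
IOException - If there is an I/O error compiling the object.
public String toString()
Return a description of this regex-based tokenizer factory including its pattern's regular expression and flags.
Overrides:
toString in class Object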