java.lang.Object
  com.aliasi.tokenizer.IndoEuropeanTokenCategorizer
public final class IndoEuropeanTokenCategorizer
An IndoEuropeanTokenCategorizer is a generic token
categorizer for Indo-European languages that is based on character
"shape".
The token categories returned by categorize(String) are
as follows. To find the category for a given token, the first
category that matches in the following list is chosen.
| Category | Description |
|---|---|
| NULL-TOK | Zero-length string. |
| 1-DIG | A single digit. |
| 2-DIG | A two-digit string. |
| 3-DIG | A three-digit string. |
| 4-DIG | A four-digit string. |
| 5+-DIG | A string of all digits, five or more digits long. |
| DIG-LET | Contains digits and letters. |
| DIG-- | Contains digits and hyphens. |
| DIG-/ | Contains digits and slashes. |
| DIG-, | Contains digits and commas. |
| DIG-. | Contains digits and periods. |
| 1-LET-UP | A single uppercase letter. |
| 1-LET-LOW | A single lowercase letter. |
| LET-UP | Uppercase letters only. |
| LET-LOW | Lowercase letters only. |
| LET-CAP | Uppercase letter followed by one or more lowercase letters. |
| LET-MIX | Letters only, containing both uppercase and lowercase. |
| PUNC- | A sequence of punctuation characters. |
| OTHER | Anything else. |
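The first-match rule over the category list can be sketched in plain Java as follows. This is a minimal illustration, not the LingPipe implementation: the class name `ShapeCategorizerSketch` is hypothetical, "contains digits and X" is assumed to mean at least one digit plus at least one X, and punctuation is assumed to mean ASCII punctuation as matched by `\p{Punct}`.

```java
final class ShapeCategorizerSketch {

    /** Returns the first matching category, checking rules in the documented order. */
    static String categorize(String token) {
        int n = token.length();
        if (n == 0) return "NULL-TOK";

        // All-digit tokens are bucketed by length: 1-DIG to 4-DIG, then 5+-DIG.
        if (token.chars().allMatch(Character::isDigit)) {
            return n >= 5 ? "5+-DIG" : n + "-DIG";
        }

        // Assumption: "contains digits and X" means one or more digits and one or more X.
        if (token.chars().anyMatch(Character::isDigit)) {
            if (token.chars().anyMatch(Character::isLetter)) return "DIG-LET";
            if (token.indexOf('-') >= 0) return "DIG--";
            if (token.indexOf('/') >= 0) return "DIG-/";
            if (token.indexOf(',') >= 0) return "DIG-,";
            if (token.indexOf('.') >= 0) return "DIG-.";
        }

        // Letter-only tokens are split by case pattern.
        if (token.chars().allMatch(Character::isLetter)) {
            boolean allUpper = token.chars().allMatch(Character::isUpperCase);
            boolean allLower = token.chars().allMatch(Character::isLowerCase);
            if (allUpper) return n == 1 ? "1-LET-UP" : "LET-UP";
            if (allLower) return n == 1 ? "1-LET-LOW" : "LET-LOW";
            boolean restLower = token.chars().skip(1).allMatch(Character::isLowerCase);
            if (Character.isUpperCase(token.charAt(0)) && restLower) return "LET-CAP";
            boolean hasUpper = token.chars().anyMatch(Character::isUpperCase);
            boolean hasLower = token.chars().anyMatch(Character::isLowerCase);
            if (hasUpper && hasLower) return "LET-MIX";
        }

        // Assumption: punctuation means ASCII punctuation only.
        if (token.matches("\\p{Punct}+")) return "PUNC-";
        return "OTHER";
    }

    public static void main(String[] args) {
        for (String t : new String[] {"", "2024", "B2B", "1,000.5", "Paris", "iPhone", "?!"}) {
            System.out.println("\"" + t + "\" -> " + categorize(t));
        }
    }
}
```

Because the rules are checked in order, a token such as 1,000.5 is assigned DIG-, in this sketch rather than DIG-. since the comma rule comes first in the list.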
| Field Summary | |
|---|---|
| static IndoEuropeanTokenCategorizer | CATEGORIZER: This is a constant Indo-European token categorizer. |
| Constructor Summary | |
|---|---|
| IndoEuropeanTokenCategorizer() | Construct an Indo-European token categorizer. |
| Method Summary | |
|---|---|
| String[] | categories(): Returns an array of strings representing all the categories produced by this categorizer. |
| String | categorize(String token): Returns the type of a token, based on its structure or other information. |
| void | compileTo(ObjectOutput objOut): Compiles this token categorizer to the specified object output. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final IndoEuropeanTokenCategorizer CATEGORIZER
| Constructor Detail |
|---|
public IndoEuropeanTokenCategorizer()
Construct an Indo-European token categorizer. The constant CATEGORIZER may be used in
place of any instance constructed through this constructor.
| Method Detail |
|---|
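public String categorize(String token)
Returns the type of a token, based on its structure or other information.
Specified by: categorize in interface TokenCategorizer
Parameters: token - Token whose class is returned.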
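public String[] categories()
Returns an array of strings representing all the categories produced by this categorizer.
Specified by: categories in interface TokenCategorizer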
public void compileTo(ObjectOutput objOut)
throws IOException
CATEGORIZER.
Specified by: compileTo in interface Compilable
Parameters: objOut - Object output to which this categorizer is written.
Throws: IOException - If there is an underlying I/O exception during the write.
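To illustrate how a caller might supply an object output and read the result back, here is a minimal round-trip sketch using only java.io. It does not call LingPipe: `StandInCategorizer` and `CompileRoundTripSketch` are hypothetical stand-ins, and the use of `readResolve` to hand back a shared constant is an assumption about one common approach, not a statement of how the library does it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class CompileRoundTripSketch {

    /** Hypothetical stand-in for a stateless categorizer; not the LingPipe class. */
    static final class StandInCategorizer implements Serializable {
        private static final long serialVersionUID = 1L;

        static final StandInCategorizer CATEGORIZER = new StandInCategorizer();

        /** Writes this object to the caller-supplied object output. */
        void compileTo(ObjectOutput objOut) throws IOException {
            objOut.writeObject(this);
        }

        /** Assumption: any deserialized copy is replaced by the shared constant. */
        private Object readResolve() {
            return CATEGORIZER;
        }
    }

    /** Writes the categorizer to an in-memory stream and reads it back. */
    static Object roundTrip(StandInCategorizer categorizer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                categorizer.compileTo(out); // ObjectOutputStream implements ObjectOutput
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object readBack = roundTrip(new StandInCategorizer());
        System.out.println(readBack == StandInCategorizer.CATEGORIZER);
    }
}
```

In this sketch, even a freshly constructed instance reads back as the shared constant, which fits a stateless categorizer where every instance behaves identically.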