com.aliasi.tokenizer
Class CharacterTokenCategorizer

java.lang.Object
  extended by com.aliasi.tokenizer.CharacterTokenCategorizer
All Implemented Interfaces:
TokenCategorizer

public class CharacterTokenCategorizer
extends Object
implements TokenCategorizer

A token categorizer that assigns a category to tokens made up of a single character. The possible categories are LETTER, DIGIT, PUNCTUATION, OTHER, and UNKNOWN. The last of these, UNKNOWN, is assigned to tokens that are not exactly one character long.

Since:
LingPipe 1.0
Version:
3.9.1
Author:
Bob Carpenter

Field Summary
static String DIGIT_CAT
          The digit category.
static String LETTER_CAT
          The letter category.
static String OTHER_CAT
          The other category, for single-character tokens that are not digits, letters, or punctuation.
static String PUNCTUATION_CAT
          The punctuation category.
static String UNKNOWN_CAT
          The unknown category for tokens not one character long.
 
Constructor Summary
CharacterTokenCategorizer()
          Deprecated. Use singleton instance INSTANCE instead.
 
Method Summary
 String[] categories()
          Returns a copy of the array of categories used by this categorizer.
 String categorize(String token)
          Returns the category of the specified token.
 String toString()
          Returns the name of this class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

UNKNOWN_CAT

public static final String UNKNOWN_CAT
The unknown category for tokens not one character long.

See Also:
Constant Field Values

DIGIT_CAT

public static final String DIGIT_CAT
The digit category.

See Also:
Constant Field Values

LETTER_CAT

public static final String LETTER_CAT
The letter category.

See Also:
Constant Field Values

PUNCTUATION_CAT

public static final String PUNCTUATION_CAT
The punctuation category.

See Also:
Constant Field Values

OTHER_CAT

public static final String OTHER_CAT
The other category, for single-character tokens that are not digits, letters, or punctuation.

See Also:
Constant Field Values

Constructor Detail

CharacterTokenCategorizer

@Deprecated
public CharacterTokenCategorizer()
Deprecated. Use singleton instance INSTANCE instead.

Construct an instance of a character token categorizer.
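
The deprecation note above points callers at a shared INSTANCE rather than the constructor. The following is a minimal sketch of that pattern on a stand-in class, not LingPipe source code. The class name SingleCharCategorizerSketch and the method isSingleCharacter are hypothetical, and the sketch assumes the categorizer holds no per-instance state, which the singleton recommendation suggests but this page does not state.

```java
// Illustrative stand-in only; not the LingPipe implementation.
final class SingleCharCategorizerSketch {

    // Shared instance that callers are expected to use.
    static final SingleCharCategorizerSketch INSTANCE =
        new SingleCharCategorizerSketch();

    // Constructor kept for backward compatibility but marked deprecated,
    // mirroring the note on the documented constructor.
    @Deprecated
    SingleCharCategorizerSketch() {
    }

    // Hypothetical helper: the object has no fields, so every instance
    // behaves identically and one shared instance is sufficient.
    boolean isSingleCharacter(String token) {
        return token.length() == 1;
    }
}
```

Under these assumptions, code written as `new CharacterTokenCategorizer()` would migrate to `CharacterTokenCategorizer.INSTANCE` without any change in behavior.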

Method Detail

categorize

public String categorize(String token)
Returns the category of the specified token. The result is UNKNOWN for any token that is not exactly one character long. A single-digit token returns DIGIT, a single-letter token returns LETTER, and a single punctuation character returns PUNCTUATION. All other single-character tokens return OTHER.

Specified by:
categorize in interface TokenCategorizer
Parameters:
token - Token to categorize.
Returns:
Category of specified token.
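
The decision order described above can be sketched as follows. This is an illustrative stand-in, not the LingPipe source. The class name CharCategorySketch is hypothetical, the category strings are placeholders (the real values appear under Constant Field Values and are not reproduced on this page), and the punctuation test is an assumption based on the Unicode punctuation general categories; the library may define punctuation differently.

```java
// Illustrative stand-in only; not the LingPipe implementation.
final class CharCategorySketch {

    // Placeholder category names (assumed, not the documented constants).
    static final String UNKNOWN_CAT = "UNKNOWN";
    static final String DIGIT_CAT = "DIGIT";
    static final String LETTER_CAT = "LETTER";
    static final String PUNCTUATION_CAT = "PUNCTUATION";
    static final String OTHER_CAT = "OTHER";

    static String categorize(String token) {
        // Any token that is not exactly one char long is UNKNOWN.
        if (token.length() != 1) {
            return UNKNOWN_CAT;
        }
        char c = token.charAt(0);
        if (Character.isDigit(c)) {
            return DIGIT_CAT;
        }
        if (Character.isLetter(c)) {
            return LETTER_CAT;
        }
        if (isPunctuation(c)) {
            return PUNCTUATION_CAT;
        }
        // Single characters that matched none of the tests above.
        return OTHER_CAT;
    }

    // Assumption: punctuation means one of the Unicode P* general categories.
    static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }
}
```

With this sketch, "7" maps to the digit category, "a" to the letter category, "," to punctuation, "$" to other (it is a currency symbol under Unicode), and both "ab" and the empty string to unknown.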

categories

public String[] categories()
Returns a copy of the array of categories used by this categorizer.

Specified by:
categories in interface TokenCategorizer
Returns:
The array of categories used by this categorizer.
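
Because the method returns a copy, a caller can modify the returned array without affecting the categorizer. The sketch below shows one common way to get that behavior, a defensive copy of a private array. It is an assumption about the technique, not the LingPipe source; the class name CategoriesCopySketch and the category strings are hypothetical placeholders.

```java
// Illustrative stand-in only; not the LingPipe implementation.
final class CategoriesCopySketch {

    // Private master array; the strings are placeholder values.
    private static final String[] CATEGORIES = {
        "LETTER", "DIGIT", "PUNCTUATION", "OTHER", "UNKNOWN"
    };

    // Return a fresh copy on every call so callers cannot alter
    // the categorizer's own array.
    static String[] categories() {
        return CATEGORIES.clone();
    }
}
```

Each call allocates a new array, so a caller that needs the categories repeatedly in a tight loop may prefer to fetch them once and reuse the result.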

toString

public String toString()
Returns the name of this class.

Overrides:
toString in class Object
Returns:
The name of this class.