| Interface Summary | |
|---|---|
| TokenCategorizer | A TokenCategorizer supplies a string-based category for string-based tokens. |
| TokenizerFactory | A TokenizerFactory constructs tokenizers from subsequences of character arrays. |
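
To make the idea of a string-based category for a string-based token concrete, here is a minimal standalone sketch of shape-based categorization. The class name `ShapeCategorizerSketch`, the method `categorize`, and the category labels are all assumptions made for illustration; they are not LingPipe's API or its actual category set.

```java
public class ShapeCategorizerSketch {

    // Maps a token to a coarse "shape" label.
    // The label names are invented for this sketch.
    public static String categorize(String token) {
        if (token.isEmpty()) {
            return "EMPTY";
        }
        boolean allDigits = true;
        boolean allLetters = true;
        boolean allLower = true;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isDigit(c)) allDigits = false;
            if (!Character.isLetter(c)) allLetters = false;
            if (!Character.isLowerCase(c)) allLower = false;
        }
        if (allDigits) return "DIGITS";
        if (allLetters && allLower) return "LOWERCASE";
        // Any all-letter token starting with an upper-case letter, including "USA".
        if (allLetters && Character.isUpperCase(token.charAt(0))) return "CAPITALIZED";
        if (allLetters) return "MIXED-CASE";
        return "OTHER";
    }

    public static void main(String[] args) {
        for (String token : new String[] {"Hello", "hello", "2024", "iPhone", ","}) {
            System.out.println(token + " -> " + categorize(token));
        }
    }
}
```

Because the category depends only on the characters in the token, unseen words still receive a useful label, which is the usual motivation for shape-based categories.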
| Class Summary | |
|---|---|
| CharacterTokenCategorizer | Returns a category for tokens made up of a single character. |
| CharacterTokenizerFactory | A CharacterTokenizerFactory considers each non-whitespace character in the input to be a distinct token. |
| EnglishStopListFilterTokenizer | Deprecated. Use EnglishStopTokenizerFactory instead. |
| EnglishStopTokenizerFactory | An EnglishStopTokenizerFactory applies an English stop list to a contained base tokenizer factory. |
| FilterTokenizer | Deprecated. Use ModifiedTokenizerFactory instead. |
| IndoEuropeanTokenCategorizer | An IndoEuropeanTokenCategorizer is a generic token categorizer for Indo-European languages that is based on character "shape". |
| IndoEuropeanTokenizerFactory | An IndoEuropeanTokenizerFactory creates tokenizers with built-in support for alpha-numerics, numbers, and other common constructs in Indo-European languages. |
| LengthStopFilterTokenizer | Deprecated. Use TokenLengthTokenizerFactory or ModifyTokenTokenizerFactory.modify(Tokenizer) instead. |
| LineTokenizerFactory | A LineTokenizerFactory treats each line of an input as a token. |
| LowerCaseFilterTokenizer | Deprecated. Use LowerCaseTokenizerFactory instead. |
| LowerCaseTokenizerFactory | A LowerCaseTokenizerFactory filters the tokenizers produced by a base tokenizer factory to produce lower case output. |
| ModifiedTokenizerFactory | A ModifiedTokenizerFactory is an abstract tokenizer factory that modifies a tokenizer returned by a base tokenizer factory. |
| ModifyTokenTokenizerFactory | The abstract base class ModifyTokenTokenizerFactory adapts token and whitespace modifiers to modify tokenizer factories. |
| NGramTokenizerFactory | An NGramTokenizerFactory creates n-gram tokenizers of a specified minimum and maximum length. |
| NormalizeWhiteSpaceFilterTokenizer | Deprecated. Use WhitespaceNormTokenizerFactory instead. |
| PorterStemmer | Deprecated. Use PorterStemmerTokenizerFactory.stem(String) instead. |
| PorterStemmerFilterTokenizer | Deprecated. Use PorterStemmerTokenizerFactory instead. |
| PorterStemmerTokenizerFactory | A PorterStemmerTokenizerFactory applies Porter's stemmer to the tokenizers produced by a base tokenizer factory. |
| PunctuationStopListTokenizer | Deprecated. Use RegExFilteredTokenizerFactory with a pattern matching the characters specified in Strings.allPunctuation(String). |
| RegExFilteredTokenizerFactory | A RegExFilteredTokenizerFactory modifies the tokens returned by a base tokenizer factory's tokenizer by removing those that do not match a regular expression pattern. |
| RegExTokenizerFactory | A RegExTokenizerFactory creates a tokenizer factory out of a regular expression. |
| SoundexFilterTokenizer | Deprecated. Use SoundexTokenizerFactory instead. |
| SoundexTokenizerFactory | A SoundexTokenizerFactory modifies the output of a base tokenizer factory to produce tokens in soundex representation. |
| StopFilterTokenizer | Deprecated. Use ModifyTokenTokenizerFactory instead. |
| StopListFilterTokenizer | Deprecated. Use StopTokenizerFactory instead. |
| StopTokenizerFactory | A StopTokenizerFactory modifies a base tokenizer factory by removing tokens in a specified stop set. |
| TokenChunker | A TokenChunker provides an implementation of the Chunker interface based on an underlying tokenizer factory. |
| TokenFeatureExtractor | A TokenFeatureExtractor produces feature vectors of token counts from character sequences. |
| TokenFilterTokenizer | Deprecated. Use ModifyTokenTokenizerFactory instead. |
| Tokenization | A Tokenization represents the result of tokenizing a string. |
| Tokenizer | The abstract class Tokenizer serves as a base for tokenizer implementations, which provide streams of tokens, whitespaces, and positions. |
| TokenLengthTokenizerFactory | A TokenLengthTokenizerFactory filters the tokenizers produced by a base tokenizer factory so that they return only tokens between specified lower and upper length limits. |
| WhitespaceNormTokenizerFactory | A WhitespaceNormTokenizerFactory filters the tokenizers produced by a base tokenizer factory to convert non-empty whitespaces to a single space and leave empty (zero-length) whitespaces alone. |
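
As an illustration of n-gram tokenization with a minimum and maximum length, the following standalone sketch enumerates character n-grams. It does not use LingPipe; the class name `NGramSketch`, the method `ngrams`, and the output ordering (shorter n-grams first, then left to right) are assumptions of this sketch and may differ from what NGramTokenizerFactory produces.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Returns every character n-gram whose length is between minNGram and
    // maxNGram, inclusive. Ordering is a choice made for this sketch.
    public static List<String> ngrams(CharSequence text, int minNGram, int maxNGram) {
        if (minNGram < 1 || maxNGram < minNGram) {
            throw new IllegalArgumentException(
                "Require 1 <= minNGram <= maxNGram, found " + minNGram + ", " + maxNGram);
        }
        List<String> tokens = new ArrayList<>();
        for (int n = minNGram; n <= maxNGram; n++) {
            // Slide a window of width n across the text.
            for (int start = 0; start + n <= text.length(); start++) {
                tokens.add(text.subSequence(start, start + n).toString());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 1, 2)); // prints [a, b, c, ab, bc]
    }
}
```

Inputs shorter than the minimum length yield no tokens in this sketch.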
Classes for tokenizing character sequences.
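
Many of the factories listed here wrap a base tokenizer factory and modify its output. The sketch below shows that wrapping pattern in plain Java, modeling a factory as a hypothetical `Function<String, List<String>>`. None of the names (`FactoryChainSketch`, `whitespaceBase`, `lowerCase`, `stop`, `lengthBetween`, `demo`) come from LingPipe, and treating the length limits as inclusive is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FactoryChainSketch {

    // Base step: split on runs of whitespace (a simplification for illustration).
    public static Function<String, List<String>> whitespaceBase() {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String piece : text.trim().split("\\s+")) {
                if (!piece.isEmpty()) tokens.add(piece);
            }
            return tokens;
        };
    }

    // Wraps a base step and lower-cases every token it returns.
    public static Function<String, List<String>> lowerCase(
            Function<String, List<String>> base) {
        return text -> base.apply(text).stream()
                .map(t -> t.toLowerCase(Locale.ENGLISH))
                .collect(Collectors.toList());
    }

    // Wraps a base step and drops tokens found in the stop set.
    public static Function<String, List<String>> stop(
            Function<String, List<String>> base, Set<String> stopSet) {
        return text -> base.apply(text).stream()
                .filter(t -> !stopSet.contains(t))
                .collect(Collectors.toList());
    }

    // Wraps a base step and keeps tokens with min <= length <= max.
    public static Function<String, List<String>> lengthBetween(
            Function<String, List<String>> base, int min, int max) {
        return text -> base.apply(text).stream()
                .filter(t -> t.length() >= min && t.length() <= max)
                .collect(Collectors.toList());
    }

    // Composes the wrappers: whitespace split, lower-case, stop list, length filter.
    public static List<String> demo(String text) {
        Set<String> stopSet = new HashSet<>(Arrays.asList("the", "of"));
        Function<String, List<String>> chain =
                lengthBetween(stop(lowerCase(whitespaceBase()), stopSet), 2, 10);
        return chain.apply(text);
    }

    public static void main(String[] args) {
        System.out.println(demo("The tokenizer of a Factory")); // prints [tokenizer, factory]
    }
}
```

The order of wrapping matters: lower-casing runs before the stop filter here, so "The" matches the stop entry "the".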