com.aliasi.tokenizer
Class PorterStemmerTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
          extended by com.aliasi.tokenizer.PorterStemmerTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable

public class PorterStemmerTokenizerFactory
extends ModifyTokenTokenizerFactory
implements Serializable

A PorterStemmerTokenizerFactory applies Porter's stemmer to the tokens produced by tokenizers from a base tokenizer factory.
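The wrapping relationship can be sketched with the standard library alone. Every name below (`ModifyTokenSketch`, `SimpleTokenizerFactory`, `ModifyingFactory`) is hypothetical and is not LingPipe's API. The base factory only splits on whitespace, and a lowercasing function stands in for the stemmer. The sketch shows that the base factory decides where tokens begin and end, while the wrapper rewrites each token it receives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class ModifyTokenSketch {

    // Hypothetical stand-in for a tokenizer factory: maps text to a list of tokens.
    interface SimpleTokenizerFactory {
        List<String> tokenize(String text);
    }

    // Hypothetical decorator: wraps a base factory and rewrites every token it produces.
    static final class ModifyingFactory implements SimpleTokenizerFactory {
        private final SimpleTokenizerFactory base;
        private final UnaryOperator<String> modifier;

        ModifyingFactory(SimpleTokenizerFactory base, UnaryOperator<String> modifier) {
            this.base = base;
            this.modifier = modifier;
        }

        @Override
        public List<String> tokenize(String text) {
            List<String> out = new ArrayList<>();
            for (String token : base.tokenize(text)) {
                out.add(modifier.apply(token)); // token boundaries come from the base factory
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Base factory: split on runs of whitespace (placeholder for a real tokenizer).
        SimpleTokenizerFactory whitespace =
            text -> Arrays.asList(text.trim().split("\\s+"));
        // Placeholder modifier; a Porter stemmer would be plugged in here instead.
        SimpleTokenizerFactory lowered =
            new ModifyingFactory(whitespace, t -> t.toLowerCase(Locale.ROOT));
        System.out.println(lowered.tokenize("Ponies RUN Fast")); // [ponies, run, fast]
    }
}
```

In this class the corresponding hook is modifyToken(String), which it overrides from ModifyTokenTokenizerFactory.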

Porter's stemmer approximates the conversion of words to their morphological base forms. This class also provides a single top-level static method, stem(String), which returns the stemmed form of an input string.
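As a small illustration of suffix stripping, the following sketch implements only step 1a of the algorithm in Porter (1980), which rewrites plural endings by trying the longest matching suffix first. It is not this class's stem(String) method: the full algorithm continues through further steps whose rules depend on the remaining stem. The class name `PorterStep1a` is hypothetical, and leaving one- and two-letter words untouched is an assumption of this sketch.

```java
public class PorterStep1a {

    // Step 1a only: rewrite plural endings, trying the longest suffix first.
    static String step1a(String w) {
        // Assumption of this sketch: leave one- and two-letter words alone (so "is" stays "is").
        if (w.length() <= 2) return w;
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // SSES -> SS
        if (w.endsWith("ies")) return w.substring(0, w.length() - 2);  // IES  -> I
        if (w.endsWith("ss")) return w;                                // SS   -> SS
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);    // S    -> (nothing)
        return w;
    }

    public static void main(String[] args) {
        String[] words = {"caresses", "ponies", "ties", "caress", "cats"};
        for (String w : words) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

This prints caress, poni, ti, caress, and cat. Outputs such as "poni" show why a stem is only an approximation of a base form: it need not be a dictionary word, only a consistent key shared by related forms.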

Serialization

A Porter stemming tokenizer factory is serializable if its base tokenizer factory is serializable.
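The condition follows from ordinary Java serialization whenever a wrapper keeps its base factory in a field: writing the wrapper also writes the base. The sketch below uses hypothetical class names and plain `java.io` default serialization, which may differ from the mechanism LingPipe actually uses. It shows a wrapper that serializes with a serializable base and fails with `NotSerializableException` otherwise.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

    // Hypothetical base factory type; the interface itself is not Serializable.
    interface BaseFactory { String tokenize(String text); }

    static final class SerializableBase implements BaseFactory, Serializable {
        private static final long serialVersionUID = 1L;
        public String tokenize(String text) { return text; }
    }

    static final class PlainBase implements BaseFactory {
        public String tokenize(String text) { return text; }
    }

    // Hypothetical wrapper: declares Serializable, but its state includes the base factory.
    static final class StemmingWrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final BaseFactory base;
        StemmingWrapper(BaseFactory base) { this.base = base; }
    }

    // True if the whole object graph can be written with default Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false; // some object in the graph (here, the base) is not Serializable
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new StemmingWrapper(new SerializableBase()))); // true
        System.out.println(canSerialize(new StemmingWrapper(new PlainBase())));        // false
    }
}
```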

Thread Safety

A Porter stemming tokenizer factory is thread safe if its base tokenizer factory is thread safe.

Implementation

The underlying stemming code is Martin Porter's own public-domain Java port of his original C implementation of the stemmer. More information is available on the Porter Stemmer Home Page.
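Many rules in the later steps of Porter (1980) fire only when the measure m of the remaining stem passes a threshold. A word is viewed as [C](VC)^m[V], where C is a run of consonants, V is a run of vowels, and brackets mark optional parts. A consonant is a letter other than a, e, i, o, u, and other than a y that follows a consonant. The following is a minimal sketch of computing m, assuming lowercase ASCII input; the class and method names are hypothetical and this is not the code this class ships.

```java
public class PorterMeasure {

    // Consonant test: not a, e, i, o, u, and not a y that follows a consonant.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Measure m of a word viewed as [C](VC)^m[V]: the number of vowel runs followed by a consonant run.
    static int measure(String w) {
        int n = w.length();
        int i = 0;
        int m = 0;
        while (i < n && isConsonant(w, i)) i++;      // optional leading [C]
        while (i < n) {
            while (i < n && !isConsonant(w, i)) i++; // a run of vowels V
            if (i == n) break;                       // trailing [V] with no C after it
            while (i < n && isConsonant(w, i)) i++;  // a run of consonants C
            m++;                                     // one VC pair completed
        }
        return m;
    }

    public static void main(String[] args) {
        String[] words = {"tree", "by", "trouble", "ivy", "troubles", "orrery"};
        for (String w : words) {
            System.out.println(w + " m=" + measure(w));
        }
    }
}
```

The words above have measures 0, 0, 1, 1, 2, and 2. In "ivy" and "by" the final y follows a consonant and so counts as a vowel.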

References

The original paper describing Porter's stemmer is:

Porter, Martin. 1980. An algorithm for suffix stripping. Program 14(3): 130-137.

Since:
LingPipe 3.8
Version:
3.8
Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
PorterStemmerTokenizerFactory(TokenizerFactory factory)
          Constructs a tokenizer factory that applies Porter stemming to the tokens produced by tokenizers from the specified base factory.
Method Summary
 String modifyToken(String token)
          Returns the Porter-stemmed version of the specified token.
static String stem(String in)
          Returns the stem of the specified input string using the Porter stemmer.
Methods inherited from class com.aliasi.tokenizer.ModifyTokenTokenizerFactory
modify, modifyWhitespace
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

PorterStemmerTokenizerFactory

public PorterStemmerTokenizerFactory(TokenizerFactory factory)
Constructs a tokenizer factory that applies Porter stemming to the tokens produced by tokenizers from the specified base factory.

Parameters:
factory - Base tokenizer factory.
Method Detail

modifyToken

public String modifyToken(String token)
Returns the Porter-stemmed version of the specified token.

Overrides:
modifyToken in class ModifyTokenTokenizerFactory
Parameters:
token - Token to stem.
Returns:
Stemmed version of the token.

stem

public static String stem(String in)
Returns the stem of the specified input string using the Porter stemmer.

Parameters:
in - String to stem.
Returns:
Stem of the specified string.