Class EnglishStopTokenizerFactory

java.lang.Object
  extended by com.aliasi.tokenizer.ModifiedTokenizerFactory
      extended by com.aliasi.tokenizer.ModifyTokenTokenizerFactory
          extended by com.aliasi.tokenizer.StopTokenizerFactory
              extended by com.aliasi.tokenizer.EnglishStopTokenizerFactory
All Implemented Interfaces:
TokenizerFactory, Serializable

public class EnglishStopTokenizerFactory
extends StopTokenizerFactory
implements Serializable

An EnglishStopTokenizerFactory applies an English stop list to a contained base tokenizer factory.

The built-in stop list consists of the following words:

a, be, had, it, only, she, was, about, because, has, its, of, some, we, after, been, have, last, on, such, were, all, but, he, more, one, than, when, also, by, her, most, or, that, which, an, can, his, mr, other, the, who, any, co, if, mrs, out, their, will, and, corp, in, ms, over, there, with, are, could, inc, mz, s, they, would, as, for, into, no, so, this, up, at, from, is, not, says, to

Note that the stop list entries are all lowercase, so a capitalized token such as "The" does not match any entry and is not removed. The input should therefore usually be filtered first by a LowerCaseTokenizerFactory.
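
The effect of the lowercase-only entries can be illustrated with a small plain-Java sketch. It does not use the LingPipe classes: the class StopListCaseSketch, its helper methods, and the five-word excerpt of the stop list are hypothetical stand-ins, and the sketch assumes stop entries are matched exactly and case-sensitively, as the note above implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not part of LingPipe.
class StopListCaseSketch {

    // Short excerpt of the stop list above; all entries are lowercase.
    static final Set<String> STOP_SET =
        new HashSet<>(Arrays.asList("the", "of", "is", "a", "and"));

    // Drops every token that exactly matches a stop list entry (case-sensitive).
    static List<String> removeStopTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_SET.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Lowercases every token, standing in for a lowercasing filter.
    static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ENGLISH));
        }
        return lowered;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("The", "price", "of", "Gold", "IS", "rising");

        // Without lowercasing, "The" and "IS" slip past the stop list.
        System.out.println(removeStopTokens(tokens));
        // prints [The, price, Gold, IS, rising]

        // Lowercasing first lets every stop word match.
        System.out.println(removeStopTokens(lowerCase(tokens)));
        // prints [price, gold, rising]
    }
}
```

Lowercasing before stop filtering also changes the surviving tokens ("Gold" becomes "gold"), which is worth keeping in mind when the original casing matters downstream.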

Thread Safety

An English stop-listed tokenizer factory is thread safe if its base tokenizer factory is thread safe.

Serialization

An EnglishStopTokenizerFactory is serializable if its base tokenizer factory is serializable.
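
The condition above follows the general Java pattern in which a wrapper can only be written to an object stream if the object it contains can be. The sketch below illustrates that pattern with default Java serialization and hypothetical stand-in classes (Wrapper, SerializableBase, PlainBase). It is not LingPipe's actual serialization mechanism, which may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-ins, not part of LingPipe.
class ConditionalSerializationSketch {

    // Plays the role of a wrapping factory: Serializable itself,
    // but it holds a reference to an arbitrary base object.
    static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object base;
        Wrapper(Object base) { this.base = base; }
    }

    // A base object that can be serialized.
    static class SerializableBase implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    // A base object that cannot be serialized.
    static class PlainBase { }

    // Returns true if the object can be written to an object stream.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Wrapper(new SerializableBase())));
        // prints true
        System.out.println(canSerialize(new Wrapper(new PlainBase())));
        // prints false
    }
}
```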

Author:
Bob Carpenter
See Also:
Serialized Form

Constructor Summary
EnglishStopTokenizerFactory(TokenizerFactory factory)
          Construct an English stop tokenizer factory with the specified base factory.
Method Summary
Methods inherited from class com.aliasi.tokenizer.StopTokenizerFactory
modifyToken, stopSet
Methods inherited from class com.aliasi.tokenizer.ModifyTokenTokenizerFactory
modify, modifyWhitespace
Methods inherited from class com.aliasi.tokenizer.ModifiedTokenizerFactory
baseTokenizerFactory, tokenizer
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public EnglishStopTokenizerFactory(TokenizerFactory factory)
Construct an English stop tokenizer factory with the specified base factory.

Parameters:
factory - Base tokenizer factory.