- All Implemented Interfaces:
- TokenizerFactory, Serializable
public class EnglishStopTokenizerFactory
extends StopTokenizerFactory
implements Serializable
An EnglishStopTokenizerFactory applies an English stop
list to the tokens produced by a contained base tokenizer factory.
The built-in stoplist consists of the following words:
a, be, had, it, only, she, was, about, because, has,
its, of, some, we, after, been, have, last, on, such, were, all,
but, he, more, one, than, when, also, by, her, most, or, that,
which, an, can, his, mr, other, the, who, any, co, if, mrs, out,
their, will, and, corp, in, ms, over, there, with, are, could, inc,
mz, s, they, would, as, for, into, no, so, this, up, at, from, is,
not, says, to
Note that the stoplist entries are all lowercase. Thus the input
should probably first be filtered by a lowercasing tokenizer factory.
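The effect of case on stop-list matching can be sketched in plain Java. This is a minimal illustration, not LingPipe's implementation: the class StopFilterSketch, its filter method, and the lowercaseFirst flag are hypothetical names, the stop set is only a small subset of the list above, and matching is assumed to be an exact, case-sensitive string comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch class; not part of LingPipe.
class StopFilterSketch {

    // A small subset of the stoplist above; all entries are lowercase.
    static final Set<String> STOP_WORDS = new HashSet<>(
        Arrays.asList("a", "the", "of", "and", "is", "was", "to"));

    // Drops every token that exactly matches a stoplist entry.
    // If lowercaseFirst is true, tokens are lowercased before the lookup
    // (assumption: matching is otherwise case-sensitive).
    static List<String> filter(List<String> tokens, boolean lowercaseFirst) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String candidate =
                lowercaseFirst ? token.toLowerCase(Locale.ENGLISH) : token;
            if (!STOP_WORDS.contains(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("The", "history", "of", "THE", "language");
        // Without lowercasing, "The" and "THE" slip past the lowercase stoplist.
        System.out.println(filter(tokens, false)); // [The, history, THE, language]
        // With lowercasing first, every form of "the" is removed.
        System.out.println(filter(tokens, true));  // [history, language]
    }
}
```

Under these assumptions, capitalized and all-caps forms of a stop word survive filtering unless the tokens are lowercased before the stop list is applied.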
An English stop-listed tokenizer factory is thread safe if its
base tokenizer factory is thread safe.
EnglishStopTokenizerFactory is serializable if its
base tokenizer factory is serializable.
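Why serializability depends on the base factory can be illustrated with standard Java serialization. The sketch below uses hypothetical stand-in classes (BaseFactory, SerializableBase, PlainBase, Wrapper) rather than LingPipe types, and it assumes the wrapper holds its base factory in an ordinary non-transient field; LingPipe's actual serialization mechanism may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins; these are not LingPipe classes.
class SerializationSketch {

    interface BaseFactory { }

    // A base factory that opts in to serialization.
    static class SerializableBase implements BaseFactory, Serializable {
        private static final long serialVersionUID = 1L;
    }

    // A base factory that does not implement Serializable.
    static class PlainBase implements BaseFactory { }

    // The wrapper declares Serializable but holds a reference to its base,
    // so writing the wrapper also requires writing the base.
    static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final BaseFactory base;
        Wrapper(BaseFactory base) { this.base = base; }
    }

    // Returns true if the object can be written with Java serialization.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Wrapper(new SerializableBase()))); // true
        System.out.println(canSerialize(new Wrapper(new PlainBase())));        // false
    }
}
```

In this sketch the wrapper serializes only when the contained object is itself serializable, which mirrors the condition stated above.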
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
public EnglishStopTokenizerFactory(TokenizerFactory factory)
- Construct an English stop tokenizer factory with the
specified base factory.
factory - Base tokenizer factory.