com.aliasi.tokenizer
Class EnglishStopTokenizerFactory
java.lang.Object
com.aliasi.tokenizer.ModifiedTokenizerFactory
com.aliasi.tokenizer.ModifyTokenTokenizerFactory
com.aliasi.tokenizer.StopTokenizerFactory
com.aliasi.tokenizer.EnglishStopTokenizerFactory
- All Implemented Interfaces:
- TokenizerFactory, Serializable
public class EnglishStopTokenizerFactory
- extends StopTokenizerFactory
- implements Serializable
An EnglishStopTokenizerFactory filters the tokens produced by a
contained base tokenizer factory, removing any token that appears
in a built-in English stop list.
The built-in stoplist consists of the following words:
a, be, had, it, only, she, was, about, because, has,
its, of, some, we, after, been, have, last, on, such, were, all,
but, he, more, one, than, when, also, by, her, most, or, that,
which, an, can, his, mr, other, the, who, any, co, if, mrs, out,
their, will, and, corp, in, ms, over, there, with, are, could, inc,
mz, s, they, would, as, for, into, no, so, this, up, at, from, is,
not, says, to
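Note that the stoplist entries are all lowercase, so a capitalized
token such as The will not match an entry. In most cases the base
factory should therefore be a LowerCaseTokenizerFactory, so that tokens
are lowercased before the stop list is applied.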
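The effect of the lowercase-only stoplist can be illustrated with a minimal sketch that uses only the Java standard library. It does not call LingPipe; the class StopListCaseSketch and its methods removeStops and lowerCase are hypothetical stand-ins, and the sketch assumes exact (case-sensitive) string matching against the stoplist, which the note above implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopListCaseSketch {
    // A few entries copied from the stoplist above; all are lowercase.
    static final Set<String> STOP =
        new HashSet<>(Arrays.asList("the", "of", "a", "is", "to"));

    // Hypothetical helper: drop tokens that exactly match a stoplist entry.
    static List<String> removeStops(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Hypothetical helper: lowercase every token before stop filtering.
    static List<String> lowerCase(List<String> tokens) {
        List<String> lowered = new ArrayList<>();
        for (String token : tokens) {
            lowered.add(token.toLowerCase(Locale.ENGLISH));
        }
        return lowered;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "History", "of", "Rome");
        // Without lowercasing, "The" does not match "the" and is kept.
        System.out.println(removeStops(tokens));            // [The, History, Rome]
        // With lowercasing first, both "the" and "of" are removed.
        System.out.println(removeStops(lowerCase(tokens))); // [history, rome]
    }
}
```

In LingPipe itself the same ordering would presumably be obtained by passing a LowerCaseTokenizerFactory as the base factory to the constructor documented below.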
Thread Safety
An English stop-listed tokenizer factory is thread safe if its
base tokenizer factory is thread safe.
Serialization
An EnglishStopTokenizerFactory is serializable if its
base tokenizer factory is serializable.
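Why serializability depends on the base factory can be seen in a small standard-library sketch. The types below (Base, SerializableBase, PlainBase, Wrapper) are hypothetical stand-ins, not LingPipe classes, and the sketch only shows default Java field serialization; the real class may use a different mechanism internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class WrapperSerializationSketch {
    // Hypothetical stand-in for a base factory type.
    interface Base {}

    // A base that opts in to Java serialization.
    static class SerializableBase implements Base, Serializable {
        private static final long serialVersionUID = 1L;
    }

    // A base that does not implement Serializable.
    static class PlainBase implements Base {}

    // Hypothetical wrapper: declared Serializable, but it holds its base as a field,
    // so writing the wrapper also requires writing the base.
    static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Base base;
        Wrapper(Base base) { this.base = base; }
    }

    // Returns true if the object can be written with Java serialization.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Wrapper(new SerializableBase()))); // true
        System.out.println(canSerialize(new Wrapper(new PlainBase())));        // false
    }
}
```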
- Since:
- LingPipe 3.8
- Version:
- 3.8
- Author:
- Bob Carpenter
- See Also:
- Serialized Form
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
EnglishStopTokenizerFactory
public EnglishStopTokenizerFactory(TokenizerFactory factory)
- Construct an English stop tokenizer factory with the
specified base factory.
- Parameters:
factory - Base tokenizer factory.