java.lang.Object
  com.aliasi.corpus.ChunkHandlerAdapter
Deprecated. Use TagChunkCodecAdapters.taggingToChunking(TagChunkCodec,ObjectHandler) instead.

@Deprecated public class ChunkHandlerAdapter

A ChunkHandlerAdapter converts a BIO-coded tag handler
to a chunk handler. The adapter handles chunkings by tokenizing
their character sequences and then using their chunk sets to
produce tags in the begin-in-out (BIO) tagging scheme. For an
adapter from a chunk handler to a BIO-coded tag handler, see the
sister class ChunkTagHandlerAdapter.

The BIO tagging scheme marks each token as either beginning a chunk (B), continuing a chunk (I), or not in a chunk (O). For example, consider the following string (with character indices annotated below it):

    John J. Smith lives in Washington.
    0123456789012345678901234567890123
    0         1         2         3

with a chunk of type PERSON spanning from character 0 (inclusive) to 13 (exclusive) and a chunk of type LOCATION spanning from 23 to 33. With the standard tokenizer IndoEuropeanTokenizerFactory providing tokenization, the tokens, whitespaces and their associated BIO tags are:

| Index | Whitespace | Token | Tag |
|---|---|---|---|
| 0 | "" | John | B-PERSON |
| 1 | " " | J | I-PERSON |
| 2 | "" | . | I-PERSON |
| 3 | " " | Smith | I-PERSON |
| 4 | " " | lives | O |
| 5 | " " | in | O |
| 6 | " " | Washington | B-LOCATION |
| 7 | "" | . | O |
| 8 | "" | n/a | |

As usual, the whitespace with the same index as a token occurs before it. Thus the first token and the two periods in the input have no whitespace before them, but all other tokens do. Note also that there is one additional whitespace following the last token. The tag B-PERSON is assigned to the first token of the person chunk, with the subsequent tokens being assigned I-PERSON. The "out" tag O is assigned to each token that is not part of a chunk, including the final period.

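
The tagging rule described above can be sketched without the library. This is only an illustration under stated assumptions: the regex tokenizer is a hypothetical stand-in (it is not IndoEuropeanTokenizerFactory, though it splits this sentence the same way), the names BioTagSketch, tokenSpans and toTags are invented for the example, and chunks are assumed to be non-overlapping and aligned with token boundaries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BioTagSketch {

    // Hypothetical stand-in tokenizer (NOT IndoEuropeanTokenizerFactory):
    // a token is a run of letters/digits or a single other non-space character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    /** Returns {start, end} character offsets (end exclusive) for each token. */
    static List<int[]> tokenSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    /**
     * Assigns one BIO tag per token. chunkSpans[i] = {start, end} (end exclusive)
     * and chunkTypes[i] is the type of chunk i. Chunks are assumed to be
     * non-overlapping and aligned with token boundaries.
     */
    static String[] toTags(String text, int[][] chunkSpans, String[] chunkTypes) {
        List<int[]> tokens = tokenSpans(text);
        String[] tags = new String[tokens.size()];
        Arrays.fill(tags, "O"); // default: token lies outside every chunk
        for (int t = 0; t < tokens.size(); t++) {
            int[] tok = tokens.get(t);
            for (int c = 0; c < chunkSpans.length; c++) {
                boolean inside = chunkSpans[c][0] <= tok[0] && tok[1] <= chunkSpans[c][1];
                if (inside) {
                    // The token that starts exactly where the chunk starts gets B-.
                    String prefix = (tok[0] == chunkSpans[c][0]) ? "B-" : "I-";
                    tags[t] = prefix + chunkTypes[c];
                    break;
                }
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String text = "John J. Smith lives in Washington.";
        int[][] spans = {{0, 13}, {23, 33}};
        String[] types = {"PERSON", "LOCATION"};
        System.out.println(String.join(" ", toTags(text, spans, types)));
        // prints: B-PERSON I-PERSON I-PERSON I-PERSON O O B-LOCATION O
    }
}
```

The final period receives O because its span (33 to 34) extends past the end of the LOCATION chunk at 33.
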
In order for this adaptation to be faithful, the chunks must be
consistent with the tokenizer. Specifically, each chunk must start
on the first character of a token and end on the last character of
a token. If the person chunk ended at character 14 (exclusive) to
include the space after the token Smith, it would no
longer be consistent with the tokenizer. In the constructor or
using the flag setting method setValidateTokenizer(boolean), the adapter may be configured to
raise exceptions if called upon to handle a chunking inconsistent
with its tokenizer. The static method consistentTokens(String[],String[],TokenizerFactory) is also
provided to test if a given set of tokens and whitespaces is
consistent with a tokenizer factory.
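
The boundary condition above can be illustrated with a small check. This is a sketch only: the regex tokenizer is a hypothetical stand-in for a real tokenizer factory, and ChunkAlignmentSketch, isAligned and requireAligned are names invented for the example. It reports whether a chunk starts where some token starts and ends where some token ends, and it shows how a validating handler might reject a misaligned chunk.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChunkAlignmentSketch {

    // Hypothetical stand-in tokenizer: runs of letters/digits,
    // or any single other non-space character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    /** True if chunkStart is a token start and chunkEnd (exclusive) is a token end. */
    static boolean isAligned(String text, int chunkStart, int chunkEnd) {
        boolean startsOnToken = false;
        boolean endsOnToken = false;
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.start() == chunkStart) startsOnToken = true;
            if (m.end() == chunkEnd) endsOnToken = true;
        }
        return startsOnToken && endsOnToken;
    }

    /** Mimics validation: throws if the chunk is not aligned with the tokens. */
    static void requireAligned(String text, int chunkStart, int chunkEnd) {
        if (!isAligned(text, chunkStart, chunkEnd)) {
            throw new IllegalArgumentException(
                "Chunk [" + chunkStart + "," + chunkEnd + ") is not aligned with token boundaries");
        }
    }

    public static void main(String[] args) {
        String text = "John J. Smith lives in Washington.";
        System.out.println(isAligned(text, 0, 13)); // PERSON chunk: prints true
        System.out.println(isAligned(text, 0, 14)); // includes trailing space: prints false
    }
}
```
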

| Constructor Summary | |
|---|---|
| ChunkHandlerAdapter(TagHandler tagHandler, TokenizerFactory tokenizerFactory, boolean validateTokenizer) | Deprecated. See class documentation. |
| ChunkHandlerAdapter(TokenizerFactory tokenizerFactory, boolean validateTokenizer) | Deprecated. Construct a chunk handler based on the specified tokenizer factory and an initially null tag handler. |

| Method Summary | |
|---|---|
| static boolean | consistentTokens(String[] toks, String[] whitespaces, TokenizerFactory tokenizerFactory) Deprecated. Returns true if the specified tokens and whitespaces are consistent with the specified tokenizer factory. |
| void | handle(Chunking chunking) Deprecated. Handle the specified chunking by converting it to a tagging using the BIO scheme and contained tokenizer, then delegating to the contained tag handler. |
| void | setTagHandler(TagHandler tagHandler) Deprecated. See class documentation. |
| void | setValidateTokenizer(boolean validateTokenizer) Deprecated. Sets the tokenizer validation status to the specified value. |
| static String[] | toTags(Chunking chunking, TokenizerFactory factory) Deprecated. Returns the array of tags for the specified chunking, relative to the specified tokenizer factory. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |

| Constructor Detail |
|---|

@Deprecated
public ChunkHandlerAdapter(TagHandler tagHandler,
                           TokenizerFactory tokenizerFactory,
                           boolean validateTokenizer)

Deprecated. See class documentation.

See also setTagHandler(TagHandler). The chunks handled by this handler will be converted to BIO-encoded tag sequences.

Parameters:
tagHandler - Tag handler.
tokenizerFactory - Tokenizer factory.
validateTokenizer - Whether or not to validate tokenizer.

public ChunkHandlerAdapter(TokenizerFactory tokenizerFactory,
                           boolean validateTokenizer)

Construct a chunk handler based on the specified tokenizer factory and an initially null tag handler. See setTagHandler(TagHandler).

Parameters:
tokenizerFactory - Tokenizer factory.
validateTokenizer - Whether or not to validate tokenizer.

| Method Detail |
|---|

@Deprecated public void setTagHandler(TagHandler tagHandler)

Deprecated. See class documentation.

Parameters:
tagHandler - New tag handler for this class.

public void setValidateTokenizer(boolean validateTokenizer)

Sets the tokenizer validation status to the specified value. If the value is true, then every chunking is tested for whether or not it is consistent with the specified tokenizer for this handler.

Parameters:
validateTokenizer - Whether or not to validate tokenizer.

public void handle(Chunking chunking)

Handle the specified chunking by converting it to a tagging using the BIO scheme and contained tokenizer, then delegating to the contained tag handler.

Specified by:
handle in interface ObjectHandler<Chunking>

Parameters:
chunking - Chunking to handle.

Throws:
IllegalArgumentException - If tokenizer consistency is being validated and the tokenization is not consistent with the specified chunking.

public static String[] toTags(Chunking chunking,
                              TokenizerFactory factory)

Returns the array of tags for the specified chunking, relative to the specified tokenizer factory.

Parameters:
chunking - Chunking to convert to tags.
factory - Tokenizer factory for token generation.

public static boolean consistentTokens(String[] toks,
                                       String[] whitespaces,
                                       TokenizerFactory tokenizerFactory)

Returns true if the specified tokens and whitespaces are consistent with the specified tokenizer factory. A tokenizer is consistent with the specified tokens and whitespaces if running the tokenizer over the concatenation of the tokens and whitespaces produces the same tokens and whitespaces.

Parameters:
toks - Tokens to check.
whitespaces - Whitespaces to check.
tokenizerFactory - Factory to create tokenizers.

Returns:
true if the tokenizer is consistent with the tokens and whitespaces.
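
The round-trip definition of consistency can be sketched as follows. This is not the library's implementation: the regex tokenizer is a hypothetical stand-in for a TokenizerFactory, the names ConsistentTokensSketch, tokenize and roundTrips are invented for the example, and the requirement that there be exactly one more whitespace than tokens is an assumption drawn from the example table in the class description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConsistentTokensSketch {

    // Hypothetical stand-in tokenizer: runs of letters/digits,
    // or any single other non-space character.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+|\\S");

    /** Appends tokens and the whitespace before each one, plus the trailing whitespace. */
    static void tokenize(String text, List<String> toks, List<String> whites) {
        Matcher m = TOKEN.matcher(text);
        int last = 0;
        while (m.find()) {
            whites.add(text.substring(last, m.start()));
            toks.add(m.group());
            last = m.end();
        }
        whites.add(text.substring(last)); // whitespace after the last token
    }

    /** True if re-tokenizing the concatenation yields the same tokens and whitespaces. */
    static boolean roundTrips(String[] toks, String[] whitespaces) {
        // Assumption: whitespaces has one more element than toks.
        if (whitespaces.length != toks.length + 1) return false;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < toks.length; i++) {
            sb.append(whitespaces[i]).append(toks[i]);
        }
        sb.append(whitespaces[toks.length]);
        List<String> newToks = new ArrayList<>();
        List<String> newWhites = new ArrayList<>();
        tokenize(sb.toString(), newToks, newWhites);
        return Arrays.asList(toks).equals(newToks)
            && Arrays.asList(whitespaces).equals(newWhites);
    }

    public static void main(String[] args) {
        String[] toks = {"John", "J", ".", "Smith", "lives", "in", "Washington", "."};
        String[] whites = {"", " ", "", " ", " ", " ", " ", "", ""};
        System.out.println(roundTrips(toks, whites)); // prints true
        // "J." would be split into "J" and "." by this tokenizer, so it is inconsistent.
        System.out.println(roundTrips(new String[] {"J."}, new String[] {"", ""})); // prints false
    }
}
```
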