com.aliasi.chunk
Interface TagChunkCodec

All Known Implementing Classes:
BioTagChunkCodec, IoTagChunkCodec

public interface TagChunkCodec

A TagChunkCodec provides a means of coding chunkings as taggings and decoding (string) taggings back to chunkings.

Each codec provides a method tagSet(Set) that returns the complete set of tags used in the coding for a given set of chunk types. Codecs also provide a variable-argument method legalTags(String[]) to determine whether a sequence of tags is legal. For a known set of chunk types, the followers of a tag can be constructed by iterating over the set of tags returned by tagSet(Set) and checking whether each is legal in that position, using legalTags(String[]) for complete sequences or legalTagSubSequence(String[]) for fragments.
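The follower construction described above can be sketched without the library. The following is a standalone toy, not LingPipe code: it assumes a BIO-style scheme with tags spelled B_X, I_X and O, and the class and method names (BioFollowersSketch, legalPair, followers) are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Standalone toy sketch; NOT LingPipe's implementation. The B_/I_/O tag
// spellings and all names here are assumptions made for illustration.
class BioFollowersSketch {

    // Analogue of tagSet(Set): a fresh set holding O plus B_/I_ tags per type.
    static Set<String> tagSet(Set<String> chunkTypes) {
        Set<String> tags = new TreeSet<>();
        tags.add("O");
        for (String type : chunkTypes) {
            tags.add("B_" + type);
            tags.add("I_" + type);
        }
        return tags;
    }

    // Pairwise legality: I_X may only follow B_X or I_X; anything else is free.
    static boolean legalPair(String prev, String next) {
        if (!next.startsWith("I_")) return true;
        String type = next.substring(2);
        return prev.equals("B_" + type) || prev.equals("I_" + type);
    }

    // For each tag, collect every tag that may legally follow it.
    static Map<String, Set<String>> followers(Set<String> chunkTypes) {
        Set<String> tags = tagSet(chunkTypes);
        Map<String, Set<String>> result = new TreeMap<>();
        for (String prev : tags) {
            Set<String> next = new TreeSet<>();
            for (String candidate : tags)
                if (legalPair(prev, candidate)) next.add(candidate);
            result.put(prev, next);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> types = new TreeSet<>(Arrays.asList("PER", "LOC"));
        followers(types).forEach((tag, next) -> System.out.println(tag + " -> " + next));
    }
}
```

With a real codec, the same double loop would call the codec's own tag set and legality methods instead of the toy helpers.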

To validate whether a chunking may be successfully encoded as a tagging and then decoded to the original chunking, use the method isEncodable(Chunking). To validate whether a string tagging may be successfully decoded to a chunking and then reencoded to the original string tagging, use isDecodable(StringTagging).
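To illustrate why a round trip can fail, here is a standalone toy, not LingPipe code. It assumes untyped B/I/O tags, chunks given as character spans sorted by start position, and tokens given by start and end offsets; every name in it is hypothetical. A chunk whose boundary falls inside a token cannot be recovered after encoding, so the check returns false.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Standalone toy sketch; NOT LingPipe's implementation. Chunks are
// int[]{start, end} character spans of a single type, sorted by start.
class RoundTripSketch {

    // Encode: tag each token B, I, or O according to the chunk that contains it.
    static String[] encode(List<int[]> chunks, int[] tokStarts, int[] tokEnds) {
        String[] tags = new String[tokStarts.length];
        Arrays.fill(tags, "O");
        for (int[] c : chunks) {
            boolean first = true;
            for (int i = 0; i < tokStarts.length; ++i) {
                if (tokStarts[i] >= c[0] && tokEnds[i] <= c[1]) {
                    tags[i] = first ? "B" : "I";
                    first = false;
                }
            }
        }
        return tags;
    }

    // Decode: rebuild character spans from B/I/O tags and token offsets.
    static List<int[]> decode(String[] tags, int[] tokStarts, int[] tokEnds) {
        List<int[]> chunks = new ArrayList<>();
        int[] open = null;
        for (int i = 0; i < tags.length; ++i) {
            if (tags[i].equals("B")) {
                open = new int[] { tokStarts[i], tokEnds[i] };
                chunks.add(open);
            } else if (tags[i].equals("I") && open != null) {
                open[1] = tokEnds[i];   // extend the open chunk to this token's end
            } else {
                open = null;
            }
        }
        return chunks;
    }

    // Analogue of isEncodable: true only if encode-then-decode is the identity.
    static boolean isEncodable(List<int[]> chunks, int[] tokStarts, int[] tokEnds) {
        List<int[]> back = decode(encode(chunks, tokStarts, tokEnds), tokStarts, tokEnds);
        if (back.size() != chunks.size()) return false;
        for (int i = 0; i < chunks.size(); ++i)
            if (!Arrays.equals(back.get(i), chunks.get(i))) return false;
        return true;
    }

    public static void main(String[] args) {
        // "John Smith ran": tokens at [0,4), [5,10), [11,14)
        int[] starts = { 0, 5, 11 };
        int[] ends = { 4, 10, 14 };
        // Chunk [0,10) aligns with token boundaries: prints true
        System.out.println(isEncodable(Collections.singletonList(new int[] { 0, 10 }), starts, ends));
        // Chunk [0,7) ends inside "Smith": prints false
        System.out.println(isEncodable(Collections.singletonList(new int[] { 0, 7 }), starts, ends));
    }
}
```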

Since:
LingPipe 3.9
Version:
3.9
Author:
Bob Carpenter

Method Summary
 boolean isDecodable(StringTagging tagging)
          Returns true if the specified tagging may be decoded as a chunking then encoded back to the original tagging accurately.
 boolean isEncodable(Chunking chunking)
          Returns true if the specified chunking may be encoded as a tagging then decoded back to the original chunking accurately.
 boolean legalTags(String... tags)
          Returns true if the specified sequence of tags is a complete legal tag sequence.
 boolean legalTagSubSequence(String... tags)
          Returns true if the specified sequence of tags is a legal subsequence of tags.
 Iterator<Chunk> nBestChunks(TagLattice<String> lattice, int[] tokenStarts, int[] tokenEnds, int maxResults)
          Returns an iterator over chunks extracted in order of highest probability up to the specified maximum number of results.
 Set<String> tagSet(Set<String> chunkTypes)
          Returns the complete set of tags used by this codec for the specified set of chunk types.
 Chunking toChunking(StringTagging tagging)
          Returns the result of decoding the specified tagging into a chunking.
 StringTagging toStringTagging(Chunking chunking)
          Returns the string tagging that fully encodes the specified chunking.
 Tagging<String> toTagging(Chunking chunking)
          Returns the tagging that partially encodes the specified chunking.
 

Method Detail

toTagging

Tagging<String> toTagging(Chunking chunking)
Returns the tagging that partially encodes the specified chunking. The returned tagging does not carry the underlying character sequence or token positions; those are available from the method toStringTagging(Chunking).

Because StringTagging extends Tagging<String>, an implementation may simply delegate to toStringTagging(Chunking) and return its result. A direct implementation is typically more efficient.

Parameters:
chunking - Chunking to encode.
Returns:
Tagging that encodes the chunking.

toStringTagging

StringTagging toStringTagging(Chunking chunking)
Returns the string tagging that fully encodes the specified chunking, including the underlying character sequence and token positions.

Parameters:
chunking - Chunking to encode.
Returns:
String tagging that encodes the chunking.

toChunking

Chunking toChunking(StringTagging tagging)
Returns the result of decoding the specified tagging into a chunking.

Parameters:
tagging - Tagging to decode.
Returns:
Chunking resulting from tagging.
Throws:
IllegalArgumentException - If the tag sequence is illegal.
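As a rough picture of what decoding involves, the standalone toy below turns a tag sequence plus token offsets into chunks and rejects an illegal sequence with an IllegalArgumentException. It is not LingPipe code: the B_X/I_X/O tag spellings, the "type:start-end" string output, and the class name are assumptions, and tags outside that scheme are simply treated like O.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone toy sketch; NOT LingPipe's implementation. Tag spellings and the
// "type:start-end" output format are assumptions made for illustration.
class BioDecodeSketch {

    // Decode tags into chunks over character offsets taken from the token arrays.
    static List<String> decode(String[] tags, int[] tokStarts, int[] tokEnds) {
        List<String> chunks = new ArrayList<>();
        String openType = null;   // type of the chunk being built, or null if none
        int openStart = -1;
        for (int i = 0; i <= tags.length; ++i) {
            // A trailing sentinel "O" flushes a chunk that runs to the last token.
            String tag = i < tags.length ? tags[i] : "O";
            boolean continues = tag.startsWith("I_");
            if (continues && !tag.substring(2).equals(openType))
                throw new IllegalArgumentException("Illegal tag " + tag + " at position " + i);
            if (!continues && openType != null) {
                chunks.add(openType + ":" + openStart + "-" + tokEnds[i - 1]);
                openType = null;
            }
            if (tag.startsWith("B_")) {
                openType = tag.substring(2);
                openStart = tokStarts[i];
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // "John Smith in London": tokens at [0,4), [5,10), [11,13), [14,20)
        String[] tags = { "B_PER", "I_PER", "O", "B_LOC" };
        int[] starts = { 0, 5, 11, 14 };
        int[] ends = { 4, 10, 13, 20 };
        System.out.println(decode(tags, starts, ends));   // prints [PER:0-10, LOC:14-20]
    }
}
```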

tagSet

Set<String> tagSet(Set<String> chunkTypes)
Returns the complete set of tags used by this codec for the specified set of chunk types.

Modifying the returned set will not affect the codec.

Parameters:
chunkTypes - Set of types for chunks.
Returns:
Set of all tags used to encode chunks of types in the specified set.

legalTags

boolean legalTags(String... tags)
Returns true if the specified sequence of tags is a complete legal tag sequence. The companion method legalTagSubSequence(String[]) tests whether a subsequence of tags is legal.

Parameters:
tags - Variable length array of tags.
Returns:
true if the specified sequence of tags is a complete legal tag sequence.

legalTagSubSequence

boolean legalTagSubSequence(String... tags)
Returns true if the specified sequence of tags is a legal subsequence of tags. See the companion method legalTags(String[]) to test if a complete sequence is legal.

A sequence of tags is a legal subsequence if a legal sequence may be created by adding more tags to the front and/or end of the specified sequence.

An empty sequence of tags always yields true. For a single tag, the result indicates whether the tag itself is legal. For longer sequences, every tag must be legal and the tags must appear in a legal order.

Parameters:
tags - Sequence of tags to test.
Returns:
true if the sequence of tags is legal as a subsequence of some larger sequence.
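The difference between a complete sequence and a subsequence is easiest to see on a small case. The standalone toy below is not LingPipe code; it assumes a BIO-style scheme with tags spelled B_X, I_X and O, and it accepts any chunk type rather than checking against a known tag set. Under those assumptions, a lone I_PER is a legal subsequence (a B_PER could be added in front) but not a legal complete sequence.

```java
// Standalone toy sketch; NOT LingPipe's implementation. The B_/I_/O tag
// spellings are assumptions, and any chunk type is accepted.
class BioLegalitySketch {

    // A tag is well formed if it is O or carries a B_ or I_ prefix.
    static boolean isTag(String tag) {
        return tag.equals("O") || tag.startsWith("B_") || tag.startsWith("I_");
    }

    // Subsequence check: every tag is well formed and every I_X directly
    // follows B_X or I_X. The first tag is unconstrained by context.
    static boolean legalTagSubSequence(String... tags) {
        for (int i = 0; i < tags.length; ++i) {
            if (!isTag(tags[i])) return false;
            if (i > 0 && tags[i].startsWith("I_")) {
                String type = tags[i].substring(2);
                if (!tags[i - 1].equals("B_" + type) && !tags[i - 1].equals("I_" + type))
                    return false;
            }
        }
        return true;
    }

    // Complete check: additionally, the sequence may not open with an I_ tag.
    static boolean legalTags(String... tags) {
        return (tags.length == 0 || !tags[0].startsWith("I_"))
            && legalTagSubSequence(tags);
    }

    public static void main(String[] args) {
        System.out.println(legalTagSubSequence());                 // prints true
        System.out.println(legalTagSubSequence("I_PER"));          // prints true
        System.out.println(legalTags("I_PER"));                    // prints false
        System.out.println(legalTags("B_PER", "I_PER", "O"));      // prints true
        System.out.println(legalTagSubSequence("B_LOC", "I_PER")); // prints false
    }
}
```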

isEncodable

boolean isEncodable(Chunking chunking)
Returns true if the specified chunking may be encoded as a tagging then decoded back to the original chunking accurately.

Parameters:
chunking - Chunking to test.
Returns:
true if encoding then decoding produces the specified chunking.

isDecodable

boolean isDecodable(StringTagging tagging)
Returns true if the specified tagging may be decoded as a chunking then encoded back to the original tagging accurately.

Parameters:
tagging - Tagging to test.
Returns:
true if decoding then encoding produces the specified tagging.

nBestChunks

Iterator<Chunk> nBestChunks(TagLattice<String> lattice,
                            int[] tokenStarts,
                            int[] tokenEnds,
                            int maxResults)
Returns an iterator over chunks extracted in order of highest probability up to the specified maximum number of results.

Parameters:
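lattice - Lattice from which chunks are extracted.
tokenStarts - Start positions of the tokens in the lattice.
tokenEnds - End positions of the tokens in the lattice.
maxResults - Maximum number of chunks to return.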
Returns:
Iterator over the chunks in the lattice in order from highest to lowest probability.
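The standalone toy below sketches the shape of an n-best result: candidates are ranked by probability, truncated to maxResults, and mapped from token indices to positions through the two token arrays. It is not LingPipe code. It assumes that tokenStarts and tokenEnds hold character offsets, and it takes candidate spans and their probabilities as direct inputs rather than deriving them from a lattice; the Scored class and the nBest method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Standalone toy sketch; NOT LingPipe's implementation. Candidate spans and
// probabilities are supplied directly instead of being read off a lattice.
class NBestSketch {

    // Hypothetical stand-in for a scored chunk.
    static final class Scored {
        final int start, end;   // character offsets
        final String type;
        final double prob;
        Scored(int start, int end, String type, double prob) {
            this.start = start;
            this.end = end;
            this.type = type;
            this.prob = prob;
        }
    }

    // candidates[i] = {firstToken, lastToken}; types[i] and probs[i] line up with it.
    static Iterator<Scored> nBest(int[][] candidates, String[] types, double[] probs,
                                  int[] tokenStarts, int[] tokenEnds, int maxResults) {
        List<Scored> all = new ArrayList<>();
        for (int i = 0; i < candidates.length; ++i) {
            int start = tokenStarts[candidates[i][0]];   // first token's start offset
            int end = tokenEnds[candidates[i][1]];       // last token's end offset
            all.add(new Scored(start, end, types[i], probs[i]));
        }
        all.sort((a, b) -> Double.compare(b.prob, a.prob));   // highest probability first
        int limit = Math.max(0, Math.min(maxResults, all.size()));
        return all.subList(0, limit).iterator();
    }

    public static void main(String[] args) {
        // "John Smith in London": tokens at [0,4), [5,10), [11,13), [14,20)
        int[] starts = { 0, 5, 11, 14 };
        int[] ends = { 4, 10, 13, 20 };
        int[][] candidates = { { 0, 1 }, { 3, 3 }, { 0, 0 } };
        String[] types = { "PER", "LOC", "PER" };
        double[] probs = { 0.6, 0.9, 0.3 };
        Iterator<Scored> it = nBest(candidates, types, probs, starts, ends, 2);
        while (it.hasNext()) {
            Scored c = it.next();
            System.out.println(c.type + " " + c.start + "-" + c.end + " p=" + c.prob);
        }
    }
}
```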