com.aliasi.chunk
Interface Chunk

All Superinterfaces:
Scored

public interface Chunk
extends Scored

The Chunk interface specifies a slice of a character sequence, a chunk type, and a chunk score. Keep in mind that a chunk stores only character offsets into a character sequence, not the character sequence itself. A chunk is almost always associated with a Chunking, which consists of a character sequence and a set of chunks over that sequence.
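
Because a chunk holds only offsets, its text has to be recovered from the character sequence it was computed over. The following is a minimal sketch of that idea. The OffsetChunk class and its slice helper are hypothetical stand-ins written for this illustration; they are not part of LingPipe.

```java
// Hypothetical stand-in for illustration only; not the LingPipe Chunk interface.
final class OffsetChunk {
    final int start;    // index of the first character
    final int end;      // index one past the last character
    final String type;

    OffsetChunk(int start, int end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }

    // Recovers the chunk's text from the sequence the offsets refer to.
    static String slice(CharSequence text, OffsetChunk chunk) {
        return text.subSequence(chunk.start, chunk.end).toString();
    }

    public static void main(String[] args) {
        String text = "John Smith lives in Boston.";
        OffsetChunk person = new OffsetChunk(0, 10, "PERSON");
        OffsetChunk place = new OffsetChunk(20, 26, "LOCATION");
        System.out.println(person.type + ": " + slice(text, person));
        System.out.println(place.type + ": " + slice(text, place));
    }
}
```

Since the end offset is exclusive, the length of a chunk is simply end minus start, and the offsets can be passed unchanged to CharSequence.subSequence(int,int).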

Equality for chunks is defined by the equality of the chunk's components (see the method documentation of equals(Object) for details). Hash codes are defined to be consistent with equality (see the method documentation of hashCode() for details).

Chunks may be constructed using static methods in the ChunkFactory class or they may be implemented directly.

The Chunk interface extends the Scored interface, so chunks may be ordered by the Scored.SCORE_COMPARATOR and Scored.REVERSE_SCORE_COMPARATOR comparators. Note that these comparators are not consistent with equality: two unequal chunks with the same score compare as equal. They may be used for sorting arrays of chunks in score order, but are not suited to sorted sets or maps that must keep such chunks distinct.
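
The practical effect of a score comparator that is not consistent with equality can be sketched as follows. The ScoredSpan class and its BY_SCORE comparator are hypothetical stand-ins, and ascending score order is an assumption made for this sketch; the code does not call the Scored comparators themselves.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in; BY_SCORE only imitates ordering by score.
final class ScoredSpan {
    final int start;
    final int end;
    final double score;

    ScoredSpan(int start, int end, double score) {
        this.start = start;
        this.end = end;
        this.score = score;
    }

    // Assumed ascending-score ordering; it ignores start and end entirely.
    static final Comparator<ScoredSpan> BY_SCORE =
        Comparator.comparingDouble((ScoredSpan s) -> s.score);

    // Sorting an array works fine: ties simply keep their relative order.
    static ScoredSpan[] sortedByScore(ScoredSpan... spans) {
        ScoredSpan[] copy = spans.clone();
        Arrays.sort(copy, BY_SCORE);
        return copy;
    }

    // A TreeSet treats spans with equal scores as duplicates and drops them.
    static int sizeInTreeSet(ScoredSpan... spans) {
        TreeSet<ScoredSpan> set = new TreeSet<>(BY_SCORE);
        set.addAll(Arrays.asList(spans));
        return set.size();
    }

    public static void main(String[] args) {
        ScoredSpan a = new ScoredSpan(0, 4, 0.9);
        ScoredSpan b = new ScoredSpan(5, 9, 0.2);
        ScoredSpan c = new ScoredSpan(10, 14, 0.9);
        System.out.println("array length after sort: " + sortedByScore(a, b, c).length);
        System.out.println("tree set size: " + sizeInTreeSet(a, b, c));
    }
}
```

In this sketch the sorted array keeps all three spans, while the sorted set keeps only two because a and c share a score.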

Chunks may be ordered by their offsets using TEXT_ORDER_COMPARATOR. Ordering the chunks of a given chunking using this comparator produces an ordering based on first appearance, with shorter chunks first among chunks that start at the same position. An alternative ordering is LONGEST_MATCH_ORDER_COMPARATOR, which instead places longer chunks first among chunks that start at the same position.

Since:
LingPipe 2.1
Version:
3.0
Author:
Bob Carpenter

Field Summary
static Comparator<Chunk> LONGEST_MATCH_ORDER_COMPARATOR
          Compares two chunks based on their text position, placing longer chunks first among chunks with the same start.
static Comparator<Chunk> TEXT_ORDER_COMPARATOR
          Compares two chunks based on their text position.
 
Fields inherited from interface com.aliasi.util.Scored
REVERSE_SCORE_COMPARATOR, SCORE_COMPARATOR
 
Method Summary
 int end()
          Returns the index of one past the last character in this chunk.
 boolean equals(Object that)
          Returns true if the specified object is a chunk that is equal to this chunk.
 int hashCode()
          Returns this chunk's hash code.
 double score()
          Returns the score of this chunk.
 int start()
          Returns the index of the first character in this chunk.
 String type()
          Returns the type of this chunk.
 

Field Detail

TEXT_ORDER_COMPARATOR

static final Comparator<Chunk> TEXT_ORDER_COMPARATOR
Compares two chunks based on their text position. A chunk is greater if it starts later than another chunk, or if it starts at the same position and ends later. This comparator is not compatible with equals(Object), but may be used for sorting using Arrays.sort(Object[],Comparator).
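
The ordering described above can be sketched with a comparator over a stand-in type. PositionedSpan and its TEXT_ORDER field are hypothetical names used only for this illustration; the sketch follows the rule as stated here and is not the LingPipe implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for illustration; not the LingPipe Chunk interface.
final class PositionedSpan {
    final int start;
    final int end;

    PositionedSpan(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Later start is greater; for equal starts, later end is greater.
    static final Comparator<PositionedSpan> TEXT_ORDER =
        Comparator.comparingInt((PositionedSpan s) -> s.start)
                  .thenComparingInt(s -> s.end);

    // Returns a sorted copy, leaving the argument array untouched.
    static PositionedSpan[] inTextOrder(PositionedSpan... spans) {
        PositionedSpan[] copy = spans.clone();
        Arrays.sort(copy, TEXT_ORDER);
        return copy;
    }

    @Override
    public String toString() {
        return "[" + start + "," + end + ")";
    }

    public static void main(String[] args) {
        PositionedSpan[] sorted = inTextOrder(
            new PositionedSpan(5, 9),
            new PositionedSpan(0, 7),
            new PositionedSpan(0, 4));
        System.out.println(Arrays.toString(sorted));
    }
}
```

Two spans with the same offsets but different types or scores compare as equal under this rule, which is why the ordering is not compatible with equals(Object).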


LONGEST_MATCH_ORDER_COMPARATOR

static final Comparator<Chunk> LONGEST_MATCH_ORDER_COMPARATOR
Compares two chunks based on their text position. A chunk is greater if it starts later than another chunk, or if it starts at the same position and ends earlier. A chunk is also greater if it starts and ends at the same point and has a higher score. If start, end and scores are the same, the types are compared alphabetically.

This comparator is not compatible with equals(Object), but may be used for sorting using Arrays.sort(Object[],Comparator).
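
The four-step comparison described above can be sketched as an explicit compare method. LongestFirstSpan is a hypothetical stand-in, and the use of Double.compare for scores and String.compareTo for the alphabetical step are assumptions of this sketch rather than details confirmed by this documentation.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration; not the LingPipe Chunk interface.
final class LongestFirstSpan {
    final int start;
    final int end;
    final double score;
    final String type;

    LongestFirstSpan(int start, int end, double score, String type) {
        this.start = start;
        this.end = end;
        this.score = score;
        this.type = type;
    }

    static int compareLongestMatch(LongestFirstSpan a, LongestFirstSpan b) {
        // 1. A later start is greater.
        if (a.start != b.start) return Integer.compare(a.start, b.start);
        // 2. For equal starts, an earlier end is greater (longer spans sort first).
        if (a.end != b.end) return Integer.compare(b.end, a.end);
        // 3. For equal spans, a higher score is greater.
        int byScore = Double.compare(a.score, b.score);
        if (byScore != 0) return byScore;
        // 4. Finally, compare types alphabetically.
        return a.type.compareTo(b.type);
    }

    // Returns a sorted copy, leaving the argument array untouched.
    static LongestFirstSpan[] inLongestMatchOrder(LongestFirstSpan... spans) {
        LongestFirstSpan[] copy = spans.clone();
        Arrays.sort(copy, LongestFirstSpan::compareLongestMatch);
        return copy;
    }

    @Override
    public String toString() {
        return "[" + start + "," + end + "):" + type;
    }

    public static void main(String[] args) {
        LongestFirstSpan[] sorted = inLongestMatchOrder(
            new LongestFirstSpan(0, 4, 1.0, "A"),
            new LongestFirstSpan(3, 6, 1.0, "C"),
            new LongestFirstSpan(0, 10, 1.0, "B"));
        System.out.println(Arrays.toString(sorted));
    }
}
```

Sorting with this rule puts the longest span at each start position first, which suits a left-to-right pass that keeps the first span it meets and skips any later span overlapping it.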

Method Detail

start

int start()
Returns the index of the first character in this chunk.

Returns:
The index of the first character in this chunk.

end

int end()
Returns the index of one past the last character in this chunk.

Returns:
The index of one past the last character in this chunk.

type

String type()
Returns the type of this chunk.

Returns:
The type of this chunk.

score

double score()
Returns the score of this chunk.

Specified by:
score in interface Scored
Returns:
The score of this chunk.

equals

boolean equals(Object that)
Returns true if the specified object is a chunk that is equal to this chunk. Another chunk is equal to this one if they have the same start, end, type and score.

Overrides:
equals in class Object
Parameters:
that - Object to compare to this chunk.
Returns:
true if the specified object is equal to this chunk.

hashCode

int hashCode()
Returns this chunk's hash code. A chunk's hash code depends on its start, end and type, but not its score:
hashCode() = start() + 31 * (end() + 31 * type().hashCode())

Overrides:
hashCode in class Object
Returns:
The hash code for this chunk.
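
A direct implementation of the equality and hash code definitions above might look like the following sketch. ValueChunk is a hypothetical class written for this illustration, and it only compares itself against other ValueChunk instances; an actual implementation of this interface would compare against any Chunk. Comparing scores with == is also an assumption of the sketch.

```java
// Hypothetical stand-in for illustration; not a LingPipe class.
final class ValueChunk {
    final int start;
    final int end;
    final String type;
    final double score;

    ValueChunk(int start, int end, String type, double score) {
        this.start = start;
        this.end = end;
        this.type = type;
        this.score = score;
    }

    // Equal when start, end, type and score all match.
    @Override
    public boolean equals(Object that) {
        if (this == that) return true;
        if (!(that instanceof ValueChunk)) return false;
        ValueChunk other = (ValueChunk) that;
        return start == other.start
            && end == other.end
            && type.equals(other.type)
            && score == other.score;
    }

    // Uses start, end and type only, following the documented formula.
    @Override
    public int hashCode() {
        return start + 31 * (end + 31 * type.hashCode());
    }

    public static void main(String[] args) {
        ValueChunk a = new ValueChunk(0, 4, "PER", 0.5);
        ValueChunk b = new ValueChunk(0, 4, "PER", 0.9);
        System.out.println("equal: " + a.equals(b));
        System.out.println("same hash: " + (a.hashCode() == b.hashCode()));
    }
}
```

Leaving the score out of the hash code is still consistent with equality: equal chunks agree on start, end and type, so they always hash alike. Chunks that differ only in score collide, which hash-based collections tolerate.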