com.aliasi.coref.matchers
Class SequenceSubstringMatch

java.lang.Object
  extended by com.aliasi.coref.BooleanMatcherAdapter
      extended by com.aliasi.coref.matchers.SequenceSubstringMatch
All Implemented Interfaces:
Matcher

public final class SequenceSubstringMatch
extends BooleanMatcherAdapter

Implements a matching function that returns the score specified in the constructor if the normal tokens of the mention match, token by token and within a specified edit distance, the normal tokens of one of the mentions in the mention chain. The basic edit costs are supplied by deleteCost(String), insertCost(String), and substituteCost(String,String), which this class defines as 1 for an insertion or deletion, 0 for an exact substitution, and 2 for a mismatch substitution. These cost methods are protected, but because the class is declared final they cannot be overridden in a subclass.

Since:
LingPipe 1.0
Version:
3.8
Author:
Bob Carpenter

Field Summary
 
Fields inherited from interface com.aliasi.coref.Matcher
MAX_DISTANCE_SCORE, MAX_SCORE, MAX_SEMANTIC_SCORE, NO_MATCH_SCORE
 
Constructor Summary
SequenceSubstringMatch(int score)
          Construct a sequence substring matcher that returns the specified score in the case of a match.
 
Method Summary
protected  int deleteCost(String token)
          Returns the cost to delete the specified token.
protected  int insertCost(String token)
          Returns the cost to insert the specified token.
 boolean matchBoolean(Mention mention, MentionChain chain)
          Returns true if the normal tokens in the mention are within a threshold edit distance of the normal tokens in one of the mentions in the chain.
protected  int substituteCost(String originalToken, String newToken)
          Returns the cost to substitute the new token for the original token.
 boolean withinEditDistance(String[] tokens1, String[] tokens2)
          Returns true if the specified arrays of tokens have an edit distance within the distance specified internally.
 boolean withinEditDistance(String[] tokens1, String[] tokens2, int maximumDistance)
          Returns true if the specified arrays of tokens are within the specified maximum distance, allowing for deletion, insertion and substitution costs as specified by deleteCost(String), insertCost(String), and substituteCost(String,String).
 
Methods inherited from class com.aliasi.coref.BooleanMatcherAdapter
match
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SequenceSubstringMatch

public SequenceSubstringMatch(int score)
Construct a sequence substring matcher that returns the specified score in the case of a match.

Parameters:
score - Score to return in the case of a match.

Method Detail

matchBoolean

public boolean matchBoolean(Mention mention,
                            MentionChain chain)
Returns true if the normal tokens in the mention are within a threshold edit distance of the normal tokens in one of the mentions in the chain.

Specified by:
matchBoolean in class BooleanMatcherAdapter
Parameters:
mention - Mention to test.
chain - Mention chain to test.
Returns:
true if there is a sequence substring match between the mention and chain.
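
The "one of the mentions in the chain" condition is an existential check: the match succeeds as soon as any single mention in the chain is close enough. The following is a minimal sketch of that control flow only, not the LingPipe source. The class ChainMatchSketch, the method matchesAny, and the use of plain String[] token arrays in place of Mention and MentionChain are all hypothetical stand-ins, and the argument order passed to the predicate (mention tokens first, chain tokens second) is an assumption.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch only; names and types here are hypothetical.
final class ChainMatchSketch {

    // Returns true if the supplied closeness test accepts the mention's
    // tokens paired with the tokens of at least one mention in the chain.
    static boolean matchesAny(String[] mentionTokens,
                              List<String[]> chainTokens,
                              BiPredicate<String[], String[]> within) {
        for (String[] candidate : chainTokens) {
            // Assumed order: mention tokens first, chain tokens second.
            if (within.test(mentionTokens, candidate)) {
                return true;
            }
        }
        return false;
    }
}
```

Any two-argument closeness test can be passed as the predicate, for example a method with the shape of withinEditDistance(String[],String[]).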

withinEditDistance

public boolean withinEditDistance(String[] tokens1,
                                  String[] tokens2)
Returns true if the specified arrays of tokens have an edit distance within the distance specified internally.

Parameters:
tokens1 - First array of tokens to test.
tokens2 - Second array of tokens to test.
Returns:
true if the edit distance between the arrays of tokens is within the threshold.

withinEditDistance

public boolean withinEditDistance(String[] tokens1,
                                  String[] tokens2,
                                  int maximumDistance)
Returns true if the specified arrays of tokens are within the specified maximum distance, allowing for deletion, insertion and substitution costs as specified by deleteCost(String), insertCost(String), and substituteCost(String,String). To support pairs of tokens from different sets, as well as asymmetric primitive edit distances, insertions and deletions are separated, and substitution may be order sensitive. Deletions are from the first array of tokens, and insertions into the second array. Substitution costs will be computed with the first argument drawn from the first array of tokens and the second argument drawn from the second array.

Parameters:
tokens1 - First array of tokens to match.
tokens2 - Second array of tokens to match.
maximumDistance - Maximum edit distance allowed between token arrays.
Returns:
true if the edit distance between the arrays is less than or equal to the specified maximum distance.
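
The directional rules above (deletions from the first array, insertions into the second, substitutions taken in first-to-second order) can be illustrated with a standard dynamic-programming edit distance over tokens. This is a sketch under stated assumptions, not the LingPipe implementation: the class TokenEditDistanceSketch and its editDistance method are hypothetical, the cost methods simply copy the default values documented for this class, and the real method may compute or bound the distance differently.

```java
// Illustrative sketch only; not the LingPipe source.
final class TokenEditDistanceSketch {

    // Default costs as documented for SequenceSubstringMatch.
    static int deleteCost(String token) { return 1; }
    static int insertCost(String token) { return 1; }
    static int substituteCost(String originalToken, String newToken) {
        return originalToken.equals(newToken) ? 0 : 2;
    }

    // Minimum total cost of turning tokens1 into tokens2.
    static int editDistance(String[] tokens1, String[] tokens2) {
        int n = tokens1.length;
        int m = tokens2.length;
        // d[i][j] = cost of turning the first i tokens of tokens1
        // into the first j tokens of tokens2.
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            d[i][0] = d[i - 1][0] + deleteCost(tokens1[i - 1]);
        }
        for (int j = 1; j <= m; j++) {
            d[0][j] = d[0][j - 1] + insertCost(tokens2[j - 1]);
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                // Delete a token of the first array.
                int del = d[i - 1][j] + deleteCost(tokens1[i - 1]);
                // Insert a token of the second array.
                int ins = d[i][j - 1] + insertCost(tokens2[j - 1]);
                // Substitute: first argument from tokens1, second from tokens2.
                int sub = d[i - 1][j - 1]
                        + substituteCost(tokens1[i - 1], tokens2[j - 1]);
                d[i][j] = Math.min(sub, Math.min(del, ins));
            }
        }
        return d[n][m];
    }

    // True if the distance is less than or equal to the maximum.
    static boolean withinEditDistance(String[] tokens1, String[] tokens2,
                                      int maximumDistance) {
        return editDistance(tokens1, tokens2) <= maximumDistance;
    }
}
```

Under these default costs, {"john", "smith"} against {"john", "q", "smith"} has distance 1 (one insertion), while {"john", "smith"} against {"jane", "smith"} has distance 2, since a mismatch substitution costs the same as a deletion followed by an insertion.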

deleteCost

protected int deleteCost(String token)
Returns the cost to delete the specified token.

Parameters:
token - Token to measure for deletion cost.
Returns:
Cost to delete the specified token.

insertCost

protected int insertCost(String token)
Returns the cost to insert the specified token.

Parameters:
token - Token to measure for insertion cost.
Returns:
Cost to insert the specified token.

substituteCost

protected int substituteCost(String originalToken,
                             String newToken)
Returns the cost to substitute the new token for the original token.

Parameters:
originalToken - Original token.
newToken - New token.
Returns:
Cost to substitute the new token for the original token.