

PREV CLASS NEXT CLASS  FRAMES NO FRAMES  
SUMMARY: NESTED  FIELD  CONSTR  METHOD  DETAIL: FIELD  CONSTR  METHOD 
java.lang.Object com.aliasi.spell.EditDistance
public class EditDistance
The EditDistance
class implements the standard notion
of edit distance, with or without transposition. The distance
without transposition is known as Levenshtein distance, and
with transposition as DamerauLevenstein distance (see below about
distancelike metric properties).
The edit distance between two strings is defined to be the minimum number of nonmatching substring edits that is required to turn one string into the other. The available edits and their corresponding input and output substrings are:
Examples of minimal edit sequences are as follows, with distances as indicated
Operation Input Output Notation Match "a" "a" m(a) Insert "" "a" i(a) Delete "a" "" d(a) Substitute "a" "b" s(b,a) Transpose "ab" "ba" t(ab)
Input
Output
Edit Sequence (w. Transp)
Dist (w. Transp)
it it m(i) m(t) 0
gage gauge m(g) m(a) i(u) m(g) m(e) 1
thhe the m(t) d(h) m(h) m(e) 1
tenser tensor m(t) m(e) m(n) m(s) s(o,e) m(r) 1
hte the d(h) m(t) i(h) m(e) [t(ht) m(e)] 2 [1]
htne then d(h) m(t) s(h,n) m(e) i(n) [t(ht) t(ne)] 3 [2]
pwn4g ownage s(o,p) m(w) m(n) s(a,4) m(g) i(e) 3
Note that, in general, there will be more than one way to edit a
string into another string. For instance, a delete and insert may
replace a transposition, so that "hte" becomes "the" through edits
"d(h) m(t) i(h) m(e)" or "i(t) m(h) d(t) m(e)", as well as many
many more, such as "d(h) d(t) d(e) i(t) i(h) i(e)".
Edit distance without transposition defines a proper metric.
Recall that a distance measure d(x,y)
forms a metric
if for all x, y, z
, we have (1) d(x,y) >=
0
, (2) d(x,y) = d(y,x)
, (3) d(x,x) =
0
, and (4) d(x,y) + d(y,z) >= d(x,z)
. All
of these properties are easy to verify. But with transposition, we
have strings such as AB
, BA
and
ACB
. With transposition, d(AB,BA)=1
,
d(BA,BCA)=1
, but d(AB,BCA)= 3 >= d(AB,BA) +
d(BA,BCA) = 1 + 1 = 2
.
Implementation Note: This class implements edit distance
using dynamic programming in time O(n*m)
where
n
and m
are the length of the sequences
being compared. Using a sliding window of three lattice slices rather
than allocating the entire lattice at once, the space required is
that for three arrays of integers as long as the shorter of the two
character sequences being compared. For details, see section
12.1.1 of:
Field Summary  

static Distance<CharSequence> 
NON_TRANSPOSING
Edit distance disallowing transposition. 
static Distance<CharSequence> 
TRANSPOSING
Edit distance allowing transposition. 
Constructor Summary  

EditDistance(boolean allowTransposition)
Construct an edit distance with or without transposition based on the specified flag. 
Method Summary  

double 
distance(CharSequence cSeq1,
CharSequence cSeq2)
Returns the edit distance between the specified character sequences. 
static int 
editDistance(CharSequence cSeq1,
CharSequence cSeq2,
boolean allowTransposition)
Returns the edit distance between the character sequences with or without transpositions as specified. 
double 
proximity(CharSequence cSeq1,
CharSequence cSeq2)
Returns the proximity between the character sequences. 
String 
toString()
Returns a string representation of this edit distance. 
Methods inherited from class java.lang.Object 

clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait 
Field Detail 

public static final Distance<CharSequence> TRANSPOSING
public static final Distance<CharSequence> NON_TRANSPOSING
Constructor Detail 

public EditDistance(boolean allowTransposition)
allowTransposition
 Set to true
to allow
transposition edits in the constructed distance.Method Detail 

public double distance(CharSequence cSeq1, CharSequence cSeq2)
distance
in interface Distance<CharSequence>
cSeq1
 First character sequence.cSeq2
 Second character sequence.
public double proximity(CharSequence cSeq1, CharSequence cSeq2)
and thus proximities will all be negative or zero.proximity(cs1,cs2) = distance(cs1,cs2)
proximity
in interface Proximity<CharSequence>
cSeq1
 First character sequence.cSeq2
 Second character sequence.
public String toString()
toString
in class Object
public static int editDistance(CharSequence cSeq1, CharSequence cSeq2, boolean allowTransposition)
cSeq1
 First character sequence.cSeq2
 Second character sequence.allowTransposition
 Set to true
to allow
transposition edits.


PREV CLASS NEXT CLASS  FRAMES NO FRAMES  
SUMMARY: NESTED  FIELD  CONSTR  METHOD  DETAIL: FIELD  CONSTR  METHOD 