|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectcom.aliasi.spell.EditDistance
public class EditDistance
The EditDistance class implements the standard notion
of edit distance, with or without transposition. The distance
without transposition is known as Levenshtein distance, and
with transposition as Damerau-Levenstein distance (see below about
distance-like metric properties).
The edit distance between two strings is defined to be the minimum number of non-matching substring edits that is required to turn one string into the other. The available edits and their corresponding input and output substrings are:
Examples of minimal edit sequences are as follows, with distances as indicated
Operation Input Output Notation Match "a" "a" m(a) Insert "" "a" i(a) Delete "a" "" d(a) Substitute "a" "b" s(b,a) Transpose "ab" "ba" t(ab)
Input
Output
Edit Sequence (w. Transp)
Dist (w. Transp)
it it m(i) m(t) 0
gage gauge m(g) m(a) i(u) m(g) m(e) 1
thhe the m(t) d(h) m(h) m(e) 1
tenser tensor m(t) m(e) m(n) m(s) s(o,e) m(r) 1
hte the d(h) m(t) i(h) m(e) [t(ht) m(e)] 2 [1]
htne then d(h) m(t) s(h,n) m(e) i(n) [t(ht) t(ne)] 3 [2]
pwn4g ownage s(o,p) m(w) m(n) s(a,4) m(g) i(e) 3
Note that, in general, there will be more than one way to edit a
string into another string. For instance, a delete and insert may
replace a transposition, so that "hte" becomes "the" through edits
"d(h) m(t) i(h) m(e)" or "i(t) m(h) d(t) m(e)", as well as many
many more, such as "d(h) d(t) d(e) i(t) i(h) i(e)".
Edit distance without transposition defines a proper metric.
Recall that a distance measure d(x,y) forms a metric
if for all x, y, z, we have (1) d(x,y) >=
0, (2) d(x,y) = d(y,x), (3) d(x,x) =
0, and (4) d(x,y) + d(y,z) >= d(x,z). All
of these properties are easy to verify. But with transposition, we
have strings such as AB, BA and
ACB. With transposition, d(AB,BA)=1,
d(BA,BCA)=1, but d(AB,BCA)= 3 >= d(AB,BA) +
d(BA,BCA) = 1 + 1 = 2.
Implementation Note: This class implements edit distance
using dynamic programming in time O(n*m) where
n and m are the length of the sequences
being compared. Using a sliding window of three lattice slices rather
than allocating the entire lattice at once, the space required is
that for three arrays of integers as long as the shorter of the two
character sequences being compared. For details, see section
12.1.1 of:
| Field Summary | |
|---|---|
static Distance<CharSequence> |
NON_TRANSPOSING
Edit distance disallowing transposition. |
static Distance<CharSequence> |
TRANSPOSING
Edit distance allowing transposition. |
| Constructor Summary | |
|---|---|
EditDistance(boolean allowTransposition)
Construct an edit distance with or without transposition based on the specified flag. |
|
| Method Summary | |
|---|---|
double |
distance(CharSequence cSeq1,
CharSequence cSeq2)
Returns the edit distance between the specified character sequences. |
static int |
editDistance(CharSequence cSeq1,
CharSequence cSeq2,
boolean allowTransposition)
Returns the edit distance between the character sequences with or without transpositions as specified. |
double |
proximity(CharSequence cSeq1,
CharSequence cSeq2)
Returns the proximity between the character sequences. |
String |
toString()
Returns a string representation of this edit distance. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final Distance<CharSequence> TRANSPOSING
public static final Distance<CharSequence> NON_TRANSPOSING
| Constructor Detail |
|---|
public EditDistance(boolean allowTransposition)
allowTransposition - Set to true to allow
transposition edits in the constructed distance.| Method Detail |
|---|
public double distance(CharSequence cSeq1,
CharSequence cSeq2)
distance in interface Distance<CharSequence>cSeq1 - First character sequence.cSeq2 - Second character sequence.
public double proximity(CharSequence cSeq1,
CharSequence cSeq2)
and thus proximities will all be negative or zero.proximity(cs1,cs2) = -distance(cs1,cs2)
proximity in interface Proximity<CharSequence>cSeq1 - First character sequence.cSeq2 - Second character sequence.
public String toString()
toString in class Object
public static int editDistance(CharSequence cSeq1,
CharSequence cSeq2,
boolean allowTransposition)
cSeq1 - First character sequence.cSeq2 - Second character sequence.allowTransposition - Set to true to allow
transposition edits.
|
|||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||