String Comparison
These functions all measure the difference (aka edit distance) or similarity between two strings.
Levenshtein Distance

levenshtein_distance(s1, s2)
Compute the Levenshtein distance between s1 and s2.
Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions required to change one string into another.
For example: levenshtein_distance('berne', 'born') == 2
representing the transformation of the first e to o and the deletion of the second e.
See the Levenshtein distance article at Wikipedia for more details.
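The definition above can be illustrated with a short dynamic-programming sketch. This is only an illustration of the textbook algorithm, not the library's implementation; the name levenshtein_sketch is hypothetical.

```python
def levenshtein_sketch(s1: str, s2: str) -> int:
    """Textbook edit distance; hypothetical helper, not the library's code."""
    # previous[j] is the distance between the s1 prefix processed so far
    # (initially empty) and the first j characters of s2.
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(
                previous[j] + 1,         # deletion from s1
                current[j - 1] + 1,      # insertion into s1
                previous[j - 1] + cost,  # substitution (or free match)
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("berne", "born"))  # 2
```

Keeping only two rows at a time keeps memory proportional to the length of s2 rather than to the product of both lengths.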
Damerau-Levenshtein Distance

damerau_levenshtein_distance(s1, s2)
Compute the Damerau-Levenshtein distance between s1 and s2.
A modification of Levenshtein distance, Damerau-Levenshtein distance counts transpositions (such as ifsh for fish) as a single edit.
For example, levenshtein_distance('fish', 'ifsh') == 2
because it requires a deletion and an insertion,
whereas damerau_levenshtein_distance('fish', 'ifsh') == 1
because swapping the adjacent f and i counts as a single transposition.
See the Damerau-Levenshtein distance article at Wikipedia for more details.
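To show how a transposition can be counted as one edit, here is a sketch of the restricted variant of Damerau-Levenshtein distance (often called optimal string alignment). Which variant the library implements is not stated here, so treat this as an assumption: results can differ from the unrestricted variant on some inputs. The name osa_distance_sketch is hypothetical.

```python
def osa_distance_sketch(s1: str, s2: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) sketch."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent characters swapped: count the swap as one edit.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance_sketch("fish", "ifsh"))  # 1
```

The only change from plain Levenshtein distance is the extra branch that looks two cells back along the diagonal when the last two characters are swapped.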
Hamming Distance

hamming_distance(s1, s2)
Compute the Hamming distance between s1 and s2.
Hamming distance is the number of positions at which the corresponding characters of two strings differ.
Typically Hamming distance is undefined when strings are of different length, but this implementation
considers extra characters as differing. For example, hamming_distance('abc', 'abcd') == 1.
See the Hamming distance article at Wikipedia for more details.
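The behaviour described above, including the treatment of unequal lengths, can be sketched in a few lines. This is an illustration only, with the hypothetical name hamming_sketch; it is not the library's code.

```python
from itertools import zip_longest


def hamming_sketch(s1: str, s2: str) -> int:
    """Count differing positions; extra characters count as differences."""
    # zip_longest pads the shorter string with None, which never equals
    # a real character, so each extra character adds one to the total.
    return sum(c1 != c2 for c1, c2 in zip_longest(s1, s2))


print(hamming_sketch("abc", "abcd"))         # 1
print(hamming_sketch("karolin", "kathrin"))  # 3
```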
Jaro Similarity

jaro_similarity(s1, s2)
Compute the Jaro similarity between s1 and s2.
Jaro similarity is a string-edit measure that gives a floating point response in [0,1], where 0 represents two completely dissimilar strings and 1 represents identical strings.
Warning
Prior to 0.8.1 this function was named jaro_distance. That name is still available, but is no longer recommended. It will be replaced in 1.0 with a correct version.
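For readers who want to see where a score in [0,1] comes from, the following is a sketch of the commonly published Jaro formula: characters match when they are equal and lie within a sliding window, and half the number of out-of-order matches is counted as transpositions. It is an assumption that this mirrors the library's behaviour; library results may differ in edge cases. The name jaro_sketch is hypothetical.

```python
def jaro_sketch(s1: str, s2: str) -> float:
    """Textbook Jaro similarity; hypothetical helper, not the library's code."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in order of appearance in each string.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len(s1)
        + matches / len(s2)
        + (matches - transpositions) / matches
    ) / 3


print(round(jaro_sketch("MARTHA", "MARHTA"), 4))  # 0.9444
```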
Jaro-Winkler Similarity

jaro_winkler_similarity(s1, s2)
Compute the Jaro-Winkler similarity between s1 and s2.
Jaro-Winkler is a modification of Jaro similarity. Like Jaro, it gives a floating point response in [0,1], where 0 represents two completely dissimilar strings and 1 represents identical strings.
Warning
Prior to 0.8.1 this function was named jaro_winkler. That name is still available, but is no longer recommended. It will be replaced in 1.0 with a correct version.
See the Jaro-Winkler distance article at Wikipedia for more details.
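The Winkler modification is usually described as a bonus for a shared prefix applied on top of an existing Jaro score. The sketch below assumes the conventional parameters (a scaling factor of 0.1 and a prefix capped at 4 characters); whether the library uses these exact values is an assumption, and some implementations apply the bonus only above a score threshold, which this sketch does not do. The name winkler_boost_sketch and its parameters are hypothetical.

```python
def winkler_boost_sketch(jaro_score: float, s1: str, s2: str,
                         scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Apply the Winkler prefix bonus to a precomputed Jaro score."""
    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    # Move the score toward 1.0 in proportion to the shared prefix.
    return jaro_score + prefix * scaling * (1.0 - jaro_score)


# 17/18 is the textbook Jaro score for this pair; the shared prefix is "MAR".
print(round(winkler_boost_sketch(17 / 18, "MARTHA", "MARHTA"), 4))  # 0.9611
```

Because the bonus scales with the remaining gap to 1.0, the result stays within [0,1] as long as the input score does.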
Match Rating Approach (comparison)

match_rating_comparison(s1, s2)
Compare s1 and s2 using the match rating approach algorithm. Returns True if the strings are considered equivalent or False if not. Can also return None if s1 and s2 are not comparable (length differs by more than 3).
The match rating approach (MRA) determines whether or not two names are
pronounced similarly. Strings are first encoded using match_rating_codex()
and then compared according to the MRA algorithm.
See the Match Rating Approach article at Wikipedia for more details.