URL: https://www.geeksforgeeks.org/dsa/jaro-and-jaro-winkler-similarity/

Jaro and Jaro-Winkler similarity

Last Updated : 16 May, 2025

Jaro similarity is a measure of similarity between two strings. Its value ranges from 0 to 1, where 1 means the strings are equal and 0 means there is no similarity between them.

Examples:

Input: s1 = "CRATE", s2 = "TRACE"; 
Output: Jaro Similarity = 0.733333


Input: s1 = "DwAyNE", s2 = "DuANE"; 
Output: Jaro Similarity = 0.822222

Algorithm:
The Jaro similarity is calculated using the following formula:

    Jaro Similarity = 0                                        if m = 0
    Jaro Similarity = (1/3) * { m/|s1| + m/|s2| + (m - t)/m }  otherwise

where:

  • m is the number of matching characters
  • t is the number of transpositions, that is, half the number of matching characters that appear in a different order in the two strings
  • |s1| and |s2| are the lengths of strings s1 and s2 respectively.

Two characters are said to be matching if they are the same and their positions are not further apart than floor(max(|s1|, |s2|) / 2) - 1.
The number of transpositions is half the number of matching characters that appear in a different order in the two strings.
Calculation:

  • Let s1="arnab", s2="raanb". Both strings have length 5, so the maximum distance at which two characters can be matched is floor(5/2) - 1 = 1.
  • Both strings have 5 matching characters, but their order is not the same: 4 of the matched characters are out of order, so the number of transpositions is 4/2 = 2.
  • Therefore, Jaro similarity can be calculated as follows: 
    Jaro Similarity = (1/3) * {(5/5) + (5/5) + (5-2)/5 } = 0.86667
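
The counts used in this calculation can be checked with a small sketch. The helper name `matched_characters` is hypothetical (it is not part of the original article), and clamping the match distance at 0 for very short strings is an assumption of this sketch.

```python
def matched_characters(s1: str, s2: str) -> tuple[str, str]:
    """Return the matched characters of s1 and of s2, each in its own order."""
    # Equal characters only match within this many positions of each other.
    # Clamping at 0 for very short strings is an assumption of this sketch.
    max_dist = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    for i, ch in enumerate(s1):
        lo = max(0, i - max_dist)
        hi = min(len(s2), i + max_dist + 1)
        for j in range(lo, hi):
            # Each character of s2 can be matched at most once.
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                break
    m1 = "".join(c for c, u in zip(s1, used1) if u)
    m2 = "".join(c for c, u in zip(s2, used2) if u)
    return m1, m2


m1, m2 = matched_characters("arnab", "raanb")
# Positions where the two matched sequences disagree are "out of order".
out_of_order = sum(a != b for a, b in zip(m1, m2))
print(m1, m2)                      # arnab raanb
print(len(m1), out_of_order // 2)  # 5 2  (m and t)
```

Comparing the two matched sequences position by position shows the 4 out-of-order characters, which gives t = 2.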


Below is the implementation of the above approach.
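
A minimal Python sketch of the steps described above, not necessarily the article's original listing. The function name `jaro_similarity` and the use of integer division when halving the out-of-order count are choices made for this sketch.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity of two strings, following the formula above."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Maximum distance at which two equal characters still count as matching.
    max_dist = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2

    # Count matching characters (m).
    m = 0
    for i in range(len1):
        lo = max(0, i - max_dist)
        hi = min(len2, i + max_dist + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0

    # Walk both matched sequences in order and count disagreements.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # integer halving is a choice made in this sketch

    return (m / len1 + m / len2 + (m - t) / m) / 3.0


print(f"{jaro_similarity('CRATE', 'TRACE'):.6f}")
```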


Output
0.733333

Time Complexity: O(N * M), where N is the length of string s1 and M is the length of string s2.

Auxiliary Space: O(N + M)

The Jaro-Winkler similarity is a string metric that measures how similar two strings are. It is closely related to the Jaro similarity; the two differ only when the strings share a common prefix. Jaro-Winkler similarity uses a scaling factor P that gives a higher score to strings that have a common prefix, counted up to a defined maximum prefix length.
Examples:

Input: s1 = "DwAyNE", s2 = "DuANE"; 
Output: Jaro-Winkler Similarity = 0.84


Input: s1="TRATE", s2="TRACE"; 
Output: Jaro-Winkler similarity = 0.906667

Calculation:

  • Jaro-Winkler similarity is defined as follows: 
    Sw = Sj + P * L * (1 - Sj)
    where 
    • Sj is the Jaro similarity
    • Sw is the Jaro-Winkler similarity
    • P is the scaling factor (0.1 by default)
    • L is the length of the matching prefix, up to a maximum of 4 characters.
  • Let s1="arnab", s2="aranb". Both strings have 5 matching characters, 2 of which are out of order, so t = 1 and the Jaro similarity is (1/3) * {(5/5) + (5/5) + (5-1)/5} = 0.933333.
  • The length of the matching prefix ("ar") is 2, and we take the scaling factor as 0.1.
  • Substituting in the formula: 
    Jaro-Winkler Similarity = 0.9333333 + 0.1 * 2 * (1 - 0.9333333) = 0.946667
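
The substitution above can be reproduced with a short sketch of the prefix adjustment alone. The helper name `winkler_adjust` and its parameter names are hypothetical. The Jaro score for the example pair is computed here directly from 5 matches and 1 transposition, which are the counts that give 0.933333.

```python
def winkler_adjust(sj: float, s1: str, s2: str,
                   p: float = 0.1, max_prefix: int = 4) -> float:
    """Apply Sw = Sj + P * L * (1 - Sj) to an already computed Jaro score."""
    # L: length of the common prefix, capped at max_prefix characters.
    prefix_len = 0
    for a, b in zip(s1, s2):
        if a != b or prefix_len == max_prefix:
            break
        prefix_len += 1
    return sj + p * prefix_len * (1 - sj)


# Jaro score of "arnab" vs "aranb": m = 5, t = 1.
sj = (5 / 5 + 5 / 5 + (5 - 1) / 5) / 3
print(f"{winkler_adjust(sj, 'arnab', 'aranb'):.6f}")  # 0.946667
```

Because the adjustment is proportional to (1 - Sj), it can only raise the score, and it never pushes it above 1 as long as P * L stays at or below 1.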


Below is the implementation of the above approach. 
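
A self-contained Python sketch of the approach, not necessarily the article's original listing. The function names and the `p` and `max_prefix` parameters are assumptions of this sketch, and the prefix adjustment is applied for every Jaro score, exactly as in the formula above.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity, as defined earlier in the article."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    max_dist = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i in range(len1):
        for j in range(max(0, i - max_dist), min(len2, i + max_dist + 1)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters of each string, in their original order.
    seq1 = [c for c, u in zip(s1, matched1) if u]
    seq2 = [c for c, u in zip(s2, matched2) if u]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3.0


def jaro_winkler_similarity(s1: str, s2: str,
                            p: float = 0.1, max_prefix: int = 4) -> float:
    """Sw = Sj + P * L * (1 - Sj), with L capped at max_prefix."""
    sj = jaro_similarity(s1, s2)
    prefix_len = 0
    for a, b in zip(s1, s2):
        if a != b or prefix_len == max_prefix:
            break
        prefix_len += 1
    return sj + p * prefix_len * (1 - sj)


print(f"Jaro-Winkler Similarity = {jaro_winkler_similarity('TRATE', 'TRACE'):.6f}")
```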


Output
Jaro-Winkler Similarity = 0.906667

Time Complexity: O(N * M), where N is the length of string s1 and M is the length of string s2.
Auxiliary Space: O(N + M)
