VOOZH about

URL: https://www.geeksforgeeks.org/python/jaccard-similarity/

⇱ Jaccard Similarity - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

Jaccard Similarity

Last Updated : 23 Jul, 2025

Measuring similarity between datasets is a fundamental problem in many fields, such as natural language processing, machine learning, and recommendation systems. One of the simplest and most effective similarity measures is Jaccard similarity, which quantifies how much two sets overlap.

Jaccard similarity, also known as the Jaccard index or Jaccard coefficient, is a measure of similarity between two sets. It is defined as the ratio of the intersection of the sets to their union:

where:

  • ∣A∩B∣ is the number of common elements between sets
  • ∣A∪B∣ is the total number of unique elements in both sets.

The value of Jaccard similarity ranges from 0 to 1:

  • J(A,B)=1 → The sets are identical.
  • J(A,B)=0 → The sets have no common elements.
👁 Jaccard-similarity
Computing similarity between two objects using Jaccard similarity

Jaccard Similarity Between Two Binary Vectors

Formula for Binary Vectors

For two binary vectors, Jaccard similarity is computed as:

where:

  • M11 → Number of positions where both vectors have 1.
  • M10 → Positions where A has 1 and B has 0.
  • M01 → Positions where A has 0 and B has 1.

Numerical Example

Consider the binary vectors:

A = [1, 1, 0, 1, 0, 1, 0]
B = [1, 0, 0, 1, 1, 1, 0]

Step-by-step calculations:

  • M11 = 3 (positions: 1st, 4th, 6th)
  • M10 = 1 (position: 2nd)
  • M01 = 1 (position: 5th)

Python Implementation with Visualization

Output:

👁 Screenshot-from-2025-03-08-01-09-44

Jaccard Similarity Between Two Sets

Formula for Sets

For two sets 𝐴 and 𝐵, the Jaccard similarity is:

Numerical Example

Consider the binary vectors:

A = {1, 2, 3, 4, 5}
𝐵 = {3, 4, 5, 6, 7}

Step-by-step calculations:

  • A∩B={3,4,5} → Intersection (common elements)
  • A∪B={1,2,3,4,5,6,7} → Union (total unique elements)

Python Implementation with Visualization

Output:

👁 Screenshot-from-2025-03-08-01-14-07

Real-Life Applications of Jaccard Similarity

1. Text Similarity & Plagiarism Detection: Jaccard similarity is used to compare sets of words or n-grams in two documents. A high similarity score may indicate plagiarism or duplicate content.

2. Recommendation Systems: In collaborative filtering, Jaccard similarity is used to find users with similar preferences by comparing their liked items.

3. Image Processing: In object detection, Jaccard similarity (also called Intersection over Union (IoU)) measures how much a detected object overlaps with the ground truth.

4. Genomic Data Comparison: Jaccard similarity helps compare DNA or protein sequences in bioinformatics.

Related article

How to calculate Jaccard Similarity in R
How to calculate Jaccard Similarity in Python
Cosine similarity

Comment
Article Tags:
Article Tags: