VOOZH about

URL: https://www.geeksforgeeks.org/dsa/count-min-sketch-in-python/

⇱ Count-Min Sketch in Python - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

Count-Min Sketch in Python

Last Updated : 23 Jul, 2025

Count-Min Sketch is a probabilistic data structure which approximates the frequency of items in a stream of data. It uses little memory while handling massive amounts of data and producing approximations of the answers. In this post, we'll explore the idea behind the Count-Min Sketch, how it's implemented in Python, and discuss its uses and drawbacks.

What is Count-Min Sketch?

Count-Min is a probabilistic data structure used to count unique items in a large stream of data. It is used to find an approximate frequency of the events on the streaming data.

The idea behind Count-Min Sketch is to use hash functions and a two-dimensional array (or matrix) to efficiently store the frequency of items. The array is made up of several rows and columns, where a bucket is represented by a column and a hash function by a row. The hash functions identify the locations in the array to increment or get counts when updating or querying the frequency of entries.

Key Operations in Count-Min Sketch:

Initialization: Set the number of rows and columns that you want in the Count-Min Sketch.

Update: To increase an element's count, hash it through each hash function and update the array's associated buckets.

Query: Find the lowest count across the related buckets after hashing an element with each hash algorithm to determine its estimated frequency.

Implementation of Count-Min Sketch in Python:

Below is the implementation of Count-Min Sketch in Python:


Output
Frequency of 'apple': 3

Count-Min Sketch is a data structure that provides accurate approximations of element frequencies in massive data streams. Python developers may efficiently address a range of frequency estimation issues by using Count-Min Sketch provided they have a thorough knowledge of its concepts, uses, and limits.

Comment
Article Tags: