URL: https://www.geeksforgeeks.org/dsa/find-the-k-most-frequent-words-from-a-file/

K most Frequent Words in a File

Last Updated : 17 Jan, 2026

Given a book of words and an integer K, find the K most frequent words in the book. Assume main memory is large enough to hold all the words. Design a dynamic data structure for this task, one that also allows new words to be added in main memory.

Examples:

Input: k = 5, fileData = "Welcome to the world of Geeks. This portal has been created to provide well written well thought and well explained solutions for selected questions If you like Geeks for Geeks and would like to contribute here is your chance You can write article and mail your article to contribute at geeksforgeeks org See your article appearing on the Geeks for Geeks main page and help thousands of other Geeks"
Output:
"your" : 3
"well" : 3
"and" : 4
"to" : 4
"Geeks" : 6

Using Hash Map and Heap

  • Store all words and their frequencies in a hash map.
  • Keep the top k frequent items in a min heap of size k (refer to k Largest Elements in an Array for details).
  • Print the words and their frequencies in the decreasing order of frequencies.
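
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the article's original listing: the function name top_k_words and the tokenization rule (runs of ASCII letters, case-sensitive) are assumptions made for the example.

```python
import heapq
import re
from collections import defaultdict


def top_k_words(text, k):
    """Return up to k (word, count) pairs, most frequent first."""
    if k <= 0:
        return []

    # Step 1: count every word in a hash map.
    # Assumption: a "word" is a run of ASCII letters, case-sensitive.
    freq = defaultdict(int)
    for word in re.findall(r"[A-Za-z]+", text):
        freq[word] += 1

    # Step 2: keep the k largest counts in a min heap of (count, word).
    # The root is always the weakest of the current top k.
    heap = []
    for word, count in freq.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif count > heap[0][0]:
            heapq.heapreplace(heap, (count, word))

    # Step 3: report the survivors in decreasing order of frequency.
    return [(word, count) for count, word in sorted(heap, reverse=True)]
```

For example, top_k_words("a b a c b a d", 2) returns [("a", 3), ("b", 2)]. When several words tie at the cut-off frequency, which of them survive depends on insertion order.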

Important Points about Implementations

  • In Python, collections.Counter provides a direct method, most_common().
  • JavaScript has no built-in min heap, so that implementation uses sorting instead.
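
As a short illustration of the Python shortcut, the snippet below counts a small, made-up word list with collections.Counter and asks for the two most common entries. The sample sentence is only an assumption for the demo.

```python
from collections import Counter

# Hypothetical sample input, split on whitespace.
words = "to be or not to be that is the question to".split()

counts = Counter(words)          # hash map of word -> frequency
top2 = counts.most_common(2)     # pairs sorted by decreasing frequency

print(top2)  # [('to', 3), ('be', 2)]
```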


Time Complexity: O(n + n log k), where n is the number of words in the file, assuming every word has constant length.


Using Trie and Min Heap

This approach uses a Trie to store and look up words as they are read from the file, keeping each word's occurrence count in the node where the word ends. Each such node carries an additional field, indexMinHeap, which holds the word's position in the Min Heap if it is currently among the top k frequent words, or -1 if it is not. In parallel, a Min Heap of fixed size k records the k most frequent words encountered so far. Each Min Heap node contains the word, its frequency, and a pointer to the corresponding Trie node. As words are processed, the algorithm updates their frequencies in the Trie and then reflects the change in the Min Heap by updating an existing entry, inserting a new entry if space is available, or replacing the root of the Min Heap (the least frequent word among the top k) when the new word's frequency exceeds the root's.

Step-by-Step Process to Execute the Code

  • Open the input file and ensure it is accessible; report an error if the file cannot be opened.
  • Read words from the file one by one. For each word, insert it into the Trie: if the word already exists, increment its frequency counter; if not, create a new node and initialize its count to 1.
  • For every word inserted or updated in the Trie, update the Min Heap as follows:
    • If the word is already present in the Min Heap (i.e., its indexMinHeap is not -1), simply update its frequency in the heap and call minHeapify() at the respective index.
    • If the word is not present and the Min Heap has available space, insert the new word into the heap, update its corresponding Trie node's indexMinHeap, and rebuild the heap.
    • If the Min Heap is full, compare the frequency of the new word with the frequency at the root of the heap (the smallest frequency among the top k). If the new word's frequency is not higher, do nothing; if it is higher, replace the root with the new word, update the Trie node of the word being replaced (setting its indexMinHeap to -1), and call minHeapify() to restore the heap property.
  • After processing all words, the Min Heap will contain the k most frequent words. Finally, iterate over the Min Heap and print each word along with its frequency.
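
The update rules above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the article's original listing: the class names TrieNode and TopKWords, the method names, and the list-based heap entries are hypothetical, words are fed in one at a time rather than read from a file, and a newly inserted entry is sifted up instead of rebuilding the whole heap.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.frequency = 0        # occurrences of the word ending here
        self.index_min_heap = -1  # position in the heap, or -1 if absent


class TopKWords:
    def __init__(self, k):
        self.k = k
        self.root = TrieNode()
        self.heap = []  # entries are [frequency, word, trie_node]

    def _swap(self, i, j):
        h = self.heap
        h[i], h[j] = h[j], h[i]
        # Keep the Trie nodes' back-pointers in sync with heap positions.
        h[i][2].index_min_heap = i
        h[j][2].index_min_heap = j

    def _sift_down(self, i):  # plays the role of minHeapify()
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] < self.heap[smallest][0]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[parent][0] <= self.heap[i][0]:
                return
            self._swap(i, parent)
            i = parent

    def add(self, word):
        # Insert the word into the Trie, or bump its count if present.
        node = self.root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.frequency += 1
        if self.k <= 0:
            return
        i = node.index_min_heap
        if i != -1:
            # Already in the heap: refresh its frequency, restore order.
            self.heap[i][0] = node.frequency
            self._sift_down(i)
        elif len(self.heap) < self.k:
            # Heap has room: append and sift up.
            node.index_min_heap = len(self.heap)
            self.heap.append([node.frequency, word, node])
            self._sift_up(node.index_min_heap)
        elif node.frequency > self.heap[0][0]:
            # Heap is full and the word beats the root: evict the root.
            self.heap[0][2].index_min_heap = -1
            self.heap[0] = [node.frequency, word, node]
            node.index_min_heap = 0
            self._sift_down(0)

    def top(self):
        # (word, frequency) pairs, least frequent first.
        return sorted(((w, f) for f, w, _ in self.heap), key=lambda p: p[1])
```

For example, feeding the words of "a b a c b a d c c c" into TopKWords(2) one by one and then calling top() gives [("a", 3), ("c", 4)]. Storing the heap index in the Trie node is what lets an update find a word's heap entry without scanning the heap.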

Below is given the implementation:


Output
your : 3
well : 3
and : 4
to : 4
Geeks : 6

The above output is for a file with the following content:

Welcome to the world of Geeks . This portal has been created to provide well written well thought and well explained solutions for selected questions If you like Geeks for Geeks and would like to contribute here is your chance You can write article and mail your article to contribute at geeksforgeeks org See your article appearing on the Geeks for Geeks main page and help thousands of other Geeks.
