URL: https://www.geeksforgeeks.org/dsa/optimal-file-merge-patterns/

Optimal File Merge Patterns

Last Updated : 11 Jul, 2025

Given n sorted files, the task is to find the minimum number of computations needed to merge them into a single sorted file. 
When two or more sorted files are merged into a single file, the merge order that requires the fewest computations is known as the Optimal Merge Pattern.

If more than 2 files need to be merged, the merging can be done in pairs. For example, to merge 4 files A, B, C, D: first merge A with B to get X1, then merge X1 with C to get X2, and finally merge X2 with D to get X3, the output file.

If we have two files of sizes m and n, merging them takes m + n computations. To minimize the total, we use a greedy strategy: at every step, merge the two smallest files among all the files present.
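To make the cost model concrete, here is a minimal sketch that totals the cost of merging files strictly left to right (as in the A, B, C, D example), charging m + n for each merge of files of sizes m and n. The function name `chain_merge_cost` is an illustrative assumption, not something from the original article.

```python
def chain_merge_cost(sizes):
    """Total cost of merging files left to right: ((A + B) + C) + D ..."""
    total = 0
    merged = sizes[0]      # size of the running merged file
    for size in sizes[1:]:
        merged += size     # merging two files costs the sum of their sizes
        total += merged    # accumulate the cost of this merge
    return total


print(chain_merge_cost([2, 3, 4]))  # merges cost 5, then 9
print(chain_merge_cost([4, 3, 2]))  # merges cost 7, then 9
```

The same three files give different totals depending on the order, which is why the order of merging matters.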

Examples: 
Given 3 files with sizes 2, 3, 4 units. Find an optimal way to combine these files 

Input: n = 3, size = {2, 3, 4} 
Output: 14 
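Explanation: There are three different ways to combine these files: 
Method 1 (optimal): merge 2 and 3 at a cost of 5, then merge the result with 4 at a cost of 9. Total = 5 + 9 = 14. 
Methods 2 and 3 merge a different pair first: merging 2 and 4 first gives 6 + 9 = 15, and merging 3 and 4 first gives 7 + 9 = 16.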
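As a sanity check on this example, the following sketch enumerates every possible pairwise merge order and collects the total cost of each. It is a brute-force illustration only (exponential time, suitable for a handful of files), and the function name `all_merge_costs` is a hypothetical helper introduced here.

```python
from itertools import combinations


def all_merge_costs(sizes):
    """Return the set of total costs over every pairwise merge order."""
    if len(sizes) == 1:
        return {0}  # a single file needs no further merging
    costs = set()
    for i, j in combinations(range(len(sizes)), 2):
        merged = sizes[i] + sizes[j]  # cost of merging this pair
        rest = [s for k, s in enumerate(sizes) if k not in (i, j)]
        for later in all_merge_costs(rest + [merged]):
            costs.add(merged + later)
    return costs


print(sorted(all_merge_costs([2, 3, 4])))  # [14, 15, 16]
```

The smallest value in the returned set is the cost of the optimal merge pattern.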


Input: n = 6, size = {2, 3, 4, 5, 6, 7} 
Output: 68 
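Explanation: The optimal way to combine these files is to repeatedly merge the two smallest: 2 + 3 = 5, 4 + 5 = 9, 5 + 6 = 11, 7 + 9 = 16, 11 + 16 = 27. Total = 5 + 9 + 11 + 16 + 27 = 68. 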


Input: n = 5, size = {5,10,20,30,30} 
Output: 205 

Input: n = 5, size = {8,8,8,8,8} 
Output: 96 

Observations:

From the above results, we may conclude that to minimize the total computation cost we must always merge the two smallest files currently available, remove them from the collection, and insert the merged file back. Keeping the array sorted after every merge would work, but a min-heap (priority queue) data structure achieves this more efficiently.
Approach: 
 

Each node represents a file with a given size; the number of nodes given is assumed to be greater than 2. 

  1. Add all the nodes to a priority queue (min-heap) keyed by file size, so that pq.poll() removes and returns the smallest file size.
  2. Initialize count = 0 // variable to store the total file computations.
  3. Repeat while the size of the priority queue is greater than 1: 
    1. int weight = pq.poll(); // pq denotes the priority queue; remove the smallest element
    2. weight += pq.poll(); // remove the second smallest element and add it
    3. count += weight;
    4. pq.add(weight); // add this combined cost back to the priority queue
  4. count is the final answer.

Below is the implementation of the above approach:  
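A minimal sketch of these steps in Python, assuming the standard `heapq` module as the min-heap (`heapq.heappop` plays the role of `pq.poll()` and `heapq.heappush` the role of `pq.add()`). The function name `min_computation` is an illustrative choice, and the input is the six-file example from above.

```python
import heapq


def min_computation(sizes):
    """Return the minimum total cost of merging files with the given sizes."""
    heap = list(sizes)       # copy so the caller's list is not modified
    heapq.heapify(heap)      # build the min-heap in O(n)
    count = 0                # total computations so far
    while len(heap) > 1:
        # Remove the two smallest files and merge them.
        weight = heapq.heappop(heap) + heapq.heappop(heap)
        count += weight
        # The merged file goes back into the heap.
        heapq.heappush(heap, weight)
    return count


print("Minimum Computations =", min_computation([2, 3, 4, 5, 6, 7]))
```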


Output
Minimum Computations = 68

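Time Complexity: O(n log n), since each of the n - 1 merges performs a constant number of heap operations costing O(log n) each
Auxiliary Space: O(n) for the priority queue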
