External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted does not fit into the main memory of a computing device (usually RAM) and must instead reside in slower external memory (usually a hard drive).
External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in the main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted sub-files are combined into a single larger file.
Example:
The external merge sort algorithm sorts chunks that each fit in RAM, then merges the sorted chunks together. First, divide the file into runs such that each run is small enough to fit into main memory. Then sort each run in main memory using merge sort. Finally, merge the resulting runs into successively bigger runs until the whole file is sorted.
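The run-then-merge structure described above can be simulated entirely in memory. The following is a minimal sketch, not a real external sort: the names `make_runs`, `merge_pass`, `simulate_external_sort`, and the `fan_in` parameter (how many runs are merged at a time) are hypothetical, and plain Python lists stand in for files on disk.

```python
import heapq


def make_runs(data, run_size):
    """Split data into consecutive chunks of at most run_size items and sort each."""
    return [sorted(data[i:i + run_size]) for i in range(0, len(data), run_size)]


def merge_pass(runs, fan_in):
    """Merge runs in groups of fan_in, producing fewer, longer sorted runs."""
    return [
        list(heapq.merge(*runs[i:i + fan_in]))
        for i in range(0, len(runs), fan_in)
    ]


def simulate_external_sort(data, run_size, fan_in=2):
    """Repeat merge passes until a single sorted run remains."""
    if run_size < 1 or fan_in < 2:
        raise ValueError("run_size must be >= 1 and fan_in must be >= 2")
    runs = make_runs(data, run_size)
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
    return runs[0] if runs else []


print(make_runs([5, 2, 9, 1, 7, 3, 8], 3))
print(simulate_external_sort([5, 2, 9, 1, 7, 3, 8], 3))
```

With `fan_in=2`, each pass roughly halves the number of runs, which is the "successively bigger runs" behaviour; a larger `fan_in` needs fewer passes.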
Examples:
Prerequisites: MergeSort, Merge K Sorted Arrays
Input:
input_file: Name of the input file, e.g. input.txt
output_file: Name of the output file, e.g. output.txt
run_size: Size of a run (small enough to fit in RAM)
num_ways: Number of runs to be merged
To solve the problem follow the below idea:
The idea is straightforward. Because the input is too large to be sorted in memory all at once, it is divided into chunks, and each chunk is sorted using merge sort. Each sorted chunk is then written to its own file. Once all the individual chunks are sorted, the complete sorted output is produced by merging them, using the idea of merging k sorted arrays.
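The merging of k sorted arrays mentioned above is commonly done with a min-heap that holds one candidate element per array. Here is a minimal in-memory sketch of that technique; the function name `merge_k_sorted` is hypothetical, and in a real external sort the lists would be replaced by sequential reads from the run files.

```python
import heapq


def merge_k_sorted(runs):
    """Merge already-sorted lists into one sorted list using a min-heap."""
    heap = []
    for run_idx, run in enumerate(runs):
        if run:
            # Each entry is (value, index of its run, position within that run).
            heap.append((run[0], run_idx, 0))
    heapq.heapify(heap)

    merged = []
    while heap:
        value, run_idx, pos = heapq.heappop(heap)
        merged.append(value)
        # Replace the popped element with the next one from the same run.
        if pos + 1 < len(runs[run_idx]):
            heapq.heappush(heap, (runs[run_idx][pos + 1], run_idx, pos + 1))
    return merged


print(merge_k_sorted([[1, 4, 9], [2, 3], [], [0, 10]]))
```

The heap never holds more than one element per run, so the memory needed for the merge step grows with the number of runs rather than with the total amount of data.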
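Follow the steps below to solve the problem:
Read run_size elements at a time from input_file, sort each chunk using merge sort, and write it to its own temporary file.
Merge the sorted temporary files into output_file using the technique for merging k sorted arrays.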
Below is the implementation of the above approach.
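A minimal Python sketch of this approach follows. It rests on several assumptions: the input file holds one integer per line, all run files can be open at the same time, and Python's built-in sort stands in for merge sort. The helper names (`_read_ints`, `create_initial_runs`, `merge_runs`, `external_sort`) and the `run_N.txt` file names are hypothetical. In this sketch the number of runs follows from the input size and `run_size`, so it is not passed as a separate argument.

```python
import heapq
import os
import tempfile
from itertools import islice


def _read_ints(path):
    """Yield one integer per non-empty line, so the file is never fully loaded."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield int(line)


def create_initial_runs(input_file, run_size, tmp_dir):
    """Phase 1: read run_size numbers at a time, sort them, write each run to a file."""
    run_paths = []
    numbers = _read_ints(input_file)
    while True:
        chunk = list(islice(numbers, run_size))
        if not chunk:
            break
        chunk.sort()  # assumption: built-in sort used in place of merge sort
        path = os.path.join(tmp_dir, f"run_{len(run_paths)}.txt")
        with open(path, "w") as out:
            out.writelines(f"{n}\n" for n in chunk)
        run_paths.append(path)
    return run_paths


def merge_runs(run_paths, output_file):
    """Phase 2: k-way merge of the run files into output_file.

    heapq.merge consumes the run iterators lazily, keeping one pending
    item per run in memory.
    """
    with open(output_file, "w") as out:
        merged = heapq.merge(*(_read_ints(p) for p in run_paths))
        out.writelines(f"{n}\n" for n in merged)


def external_sort(input_file, output_file, run_size):
    """Sort the integers in input_file into output_file via temporary run files."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = create_initial_runs(input_file, run_size, tmp_dir)
        merge_runs(run_paths, output_file)
```

Calling `external_sort("input.txt", "output.txt", run_size=1000)` would sort an existing `input.txt` while holding at most about 1000 numbers in memory during the sorting phase. The temporary run files are removed when the function returns.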
Time Complexity: O(N * log N), where N is the number of elements in the input file.
Auxiliary space: O(run_size), where run_size is the number of elements held in memory while a run is being sorted.
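The O(N * log N) bound can be checked with a short derivation. This is a sketch under stated assumptions: N is the number of elements, M stands for run_size, k is the resulting number of runs, the merge is a single heap-based k-way merge, and only comparisons are counted (disk I/O is ignored). The symbols M, k, and the T names are introduced here for illustration.

```latex
\begin{aligned}
k &= \lceil N / M \rceil \\
T_{\text{sort}} &= k \cdot O(M \log M) = O(N \log M) \\
T_{\text{merge}} &= O(N \log k) = O\!\left(N \log \tfrac{N}{M}\right) \\
T_{\text{total}} &= O\!\left(N \log M + N \log \tfrac{N}{M}\right) = O(N \log N)
\end{aligned}
```

The last step uses the identity log M + log(N/M) = log N, so the choice of run size does not change the asymptotic comparison count.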
Note: This code won't work on an online compiler, as it requires file creation permissions. When run on a local machine, it produces a sample input file "input.txt" with 10000 random numbers. It sorts the numbers and writes the sorted numbers to a file "output.txt". It also generates files named 1, 2, ... to store the sorted runs.