Merging Files

Sometimes it is necessary to combine two sorted files into one sorted file.  For example, with data stored on tape.  This is sometimes called an external sort, because it is not performed inside the computer’s memory.

Before this can be done it is essential that both source files share a common key (the field to be sorted on) and that both source files are already sorted.

Suppose we want to merge two source files, that are already sorted in ascending order, to produce a new destination file which is also sorted in ascending order, as shown below.

filemerge

We would normally begin by checking that the source files have some data inside them before proceeding.  Then the steps are as follows:

  1. Read the first value from each source file and compare them.  Here, the smaller of the first two values is 1 from FileTwo, so this is written to the new file.
  2. Compare the first value in FileOne with the second value in FileTwo. Since 4 is smaller than 7, the value 4 is written to the new file next.
  3. Compare the second value in FileOne with the second value in FileTwo. Since 5 is less than 7, the value 5 is written to the new file next.
  4. Compare the third value in FileOne with the second value in FileTwo. Since 6 is less than 7, the value 6 is written to the new file next .
  5. Compare the fourth value in FileOne with the second value in FileTwo. Since 9 is bigger than 7, the value 7 is written to the new file next.
  6. Compare the fourth value in FileOne with the third value in FileTwo. Since 8 is less than 9, the value 8 is written to the new file next.
  7. Compare the fourth value in FileOne with the fourth value in FileTwo.  Since 9 is less than 10, the value 9 is written to the new file next.
  8. At this point FileOne has been fully checked, so the remaining values in FileTwo can be written across one after another.

 

This algorithm can be expressed in simple terms as follows:

Check source files are not empty
Get first item from each file
REPEAT
         examine the current item in each source file
         copy the smallest item to the new file
         get next item from source file that this item was copied from
UNTIL input files are empty
IF only one input file exhausted THEN
         copy remaining items from other file to new file
END IF

A more complete description of the algorithm looks like this:

open existing files
create new file
check existing files are not empty
use pointers/counters to identify records for comparison
REPEAT
        compare records indicated by pointers
   IF records are different THEN
               copy earlier value record to new file
               move appropriate pointer
       ELSE
               copy one record only to new file
               move both pointers
        END IF
UNTIL end of one file
copy remaining records from other file
close files

Note the following points about this algorithm:

  • You need to check that the source files have some data in them first
  • You must open the two source files and create a new destination file
  • If a value is copied to the destination file, you can advance to the next record in the file it came from
  • Duplicates need to be handled. They are ignored in this case.
  • When one source file is exhausted, the remaining records from the other source file can be copied to the destination file directly.

 

This merge process assumes two things:

  • Both source files share a common key
  • Records in each source file are already sorted according to this key

 

NOTE:
The algorithm for merging two sorted files is very similar to the algorithm for merging two sorted lists, which is an important aspect of the merge sort.