Sort a Large File

Sort a large file by breaking it into sections that can be sorted efficiently.

Notes
All files are read completely into virtual memory before they are sorted, but only a "small" number of lines is sorted at a time. This might be a memory-caching optimization.
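The sectioning idea can be sketched as follows. This is a minimal illustration, not the code from the source: the function name split_into_sorted_runs, the MAX_LINE cap, and the use of fgets, qsort, and tmpfile are all assumptions made for the sketch, and lines are compared with plain strcmp rather than with sort keys.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256  /* hypothetical cap; longer lines would be split */

/* qsort comparator for an array of char * lines. */
static int cmp_lines(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Read `in` in sections of at most `section_lines` lines, sort each
   section in memory, and write it to its own temporary file.  The
   rewound temporary files are stored in `runs` (room for `max_runs`).
   Returns the number of runs created, or -1 on error; on error, runs
   already stored are left open for the caller to close. */
int split_into_sorted_runs(FILE *in, size_t section_lines,
                           FILE **runs, int max_runs)
{
    char buf[MAX_LINE];
    char **lines;
    size_t used = 0, i;
    int n_runs = 0, more = 1;

    if (section_lines == 0)
        return -1;
    lines = malloc(section_lines * sizeof *lines);
    if (lines == NULL)
        return -1;

    while (more) {
        more = fgets(buf, sizeof buf, in) != NULL;
        if (more) {
            lines[used] = malloc(strlen(buf) + 1);
            if (lines[used] == NULL)
                goto fail;
            strcpy(lines[used++], buf);
        }
        /* Flush when the section is full or the input has ended. */
        if (used > 0 && (used == section_lines || !more)) {
            FILE *tmp;
            if (n_runs == max_runs || (tmp = tmpfile()) == NULL)
                goto fail;
            qsort(lines, used, sizeof *lines, cmp_lines);
            for (i = 0; i < used; i++) {
                fputs(lines[i], tmp);
                free(lines[i]);
            }
            used = 0;
            rewind(tmp);
            runs[n_runs++] = tmp;
        }
    }
    free(lines);
    return n_runs;

fail:
    for (i = 0; i < used; i++)
        free(lines[i]);
    free(lines);
    return -1;
}
```

Each run is sorted on its own, so a later merge pass only has to compare the current head line of every run.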

This duplicates some of the code used for multiple file name support (the SrcSpec concern).


Declare the control variables for intermediate files (node, n_temp_files, tempfiles)
Variable declaration
Segment Source
1563:   struct tempnode *node;
1564:   int n_temp_files = 0;
1565:   char **tempfiles;
1566: 

Split a large file into sections
Code modification
Segment Source
1576:       while (fillbuf (&buf, fp))
1577:         {

Put each intermediate result in a temporary file (count and open the file)
Code insertion
Segment Source
1591:               ++n_temp_files;
1592:               tfp = xtmpfopen (tempname ());

Put each intermediate result in a temporary file (write the sorted lines, skipping duplicates when unique is set)
Code insertion
Segment Source
1594:           for (i = 0; i < lines.used; ++i)
1595:             if (!unique || i == 0
1596:                 || compare (&lines.lines[i], &lines.lines[i - 1]))
1597:               {
1598:                 write_bytes (lines.lines[i].text, lines.lines[i].length, tfp);
1599:                 putc (eolchar, tfp);
1600:               }
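
The duplicate-skipping condition in this segment can be tried in isolation. The sketch below is an assumption-laden stand-in, not the source: write_sorted_lines is a hypothetical name, lines are plain C strings, and strcmp replaces compare(), which in the original may honor sort keys.

```c
#include <stdio.h>
#include <string.h>

/* Write already-sorted lines to `out`, one per line.  When `unique` is
   nonzero, a line equal to its predecessor is skipped; because the
   input is sorted, that removes every duplicate.  Returns the number
   of lines written. */
size_t write_sorted_lines(const char *const *lines, size_t n,
                          int unique, FILE *out)
{
    size_t written = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        /* The i == 0 test guards the lines[i - 1] access. */
        if (!unique || i == 0 || strcmp(lines[i], lines[i - 1]) != 0) {
            fputs(lines[i], out);
            putc('\n', out);
            written++;
        }
    }
    return written;
}
```

Comparing only against the previous line is enough here because equal lines are adjacent after sorting.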

Put each intermediate result in a temporary file (close the file)
Code insertion
Segment Source
1602:             xfclose (tfp);

Close the large-file loop
Code insertion
Segment Source
1603:         }

Merge the intermediate files
Code insertion
Segment Source
1611:   if (n_temp_files)
1612:     {
1613:       tempfiles = (char **) xmalloc (n_temp_files * sizeof (char *));
1614:       i = n_temp_files;
1615:       for (node = temphead.next; i > 0; node = node->next)
1616:         tempfiles[--i] = node->name;
1617:       merge (tempfiles, n_temp_files, ofp);
1618:       free ((char *) tempfiles);
1619:     }
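
The backward fill (tempfiles[--i]) suggests that the list behind temphead holds the newest name first, so filling the array from the end restores creation order. The excerpt does not show how the list is built, so treat that as an assumption. A minimal sketch of the same pattern, with a hypothetical struct name_node standing in for struct tempnode:

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct tempnode: a singly linked list of
   names, assumed to have the newest entry at the head. */
struct name_node {
    const char *name;
    struct name_node *next;
};

/* Copy the first `n` names of a newest-first list into a newly
   allocated array in oldest-first order.  Assumes the list holds at
   least `n` nodes.  Returns NULL if allocation fails; the caller
   frees the array (but not the names, which the list still owns). */
const char **names_in_creation_order(const struct name_node *head, size_t n)
{
    const char **names = malloc(n * sizeof *names);
    const struct name_node *node;
    size_t i = n;

    if (names == NULL)
        return NULL;
    /* Fill from the last slot backward so the oldest name ends at [0]. */
    for (node = head; node != NULL && i > 0; node = node->next)
        names[--i] = node->name;
    return names;
}
```

As in the segment above, only the array of pointers is freed after the merge; the names themselves remain owned by the list.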