Sort a Large File
Sort a large file by breaking it into sections that can
be sorted efficiently.
Notes
All files are read completely into virtual memory before they are sorted.
Only a "small" number of lines are sorted at a time;
this might be a memory-caching optimization.
This duplicates some of the code used for multiple file name support
(the SrcSpec concern).
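The chunking described in these notes can be sketched roughly as follows. This is a minimal illustration, not the code from the annotated source: next_sorted_run, compare_lines, and MAX_LINE are hypothetical names, the chunk is bounded by a line count rather than by a buffer size, every line is assumed to end in a newline and to be shorter than MAX_LINE bytes, and the standard tmpfile() stands in for the source's own temporary-file helpers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX_LINE = 256 };   /* assumed upper bound on line length, in bytes */

/* qsort callback: each element is a fixed-size char array holding one line. */
static int
compare_lines (const void *a, const void *b)
{
  return strcmp ((const char *) a, (const char *) b);
}

/* Read up to max_lines lines from in, sort them in memory, and write
   them to a new anonymous temporary file.  Returns that file, rewound
   for reading, or NULL at end of input or on error.  */
static FILE *
next_sorted_run (FILE *in, size_t max_lines)
{
  char (*lines)[MAX_LINE] = malloc (max_lines * sizeof *lines);
  if (!lines)
    return NULL;

  /* Fill the chunk: stop when it is full or the input is exhausted. */
  size_t used = 0;
  while (used < max_lines && fgets (lines[used], MAX_LINE, in))
    ++used;

  FILE *run = NULL;
  if (used > 0 && (run = tmpfile ()) != NULL)
    {
      qsort (lines, used, sizeof *lines, compare_lines);
      for (size_t i = 0; i < used; ++i)
        fputs (lines[i], run);
      rewind (run);
    }

  free (lines);
  return run;
}
```

Calling next_sorted_run repeatedly until it returns NULL yields one sorted intermediate file per chunk; those files are then the inputs to a merge step.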
- Declare intermediate-file control variables (node, n_temp_files, tempfiles)
- Variable declaration
- Segment Source
- 1563: struct tempnode *node;
1564: int n_temp_files = 0;
1565: char **tempfiles;
1566:
- Section a large file
- Code modification
- Segment Source
- 1576: while (fillbuf (&buf, fp))
1577: {
- Put each intermediate result in a temporary file
- Code insertion
- Segment Source
- 1591: ++n_temp_files;
1592: tfp = xtmpfopen (tempname ());
- Put each intermediate result in a temporary file
- Code insertion
- Segment Source
- 1594: for (i = 0; i < lines.used; ++i)
1595: if (!unique || i == 0
1596: || compare (&lines.lines[i], &lines.lines[i - 1]))
1597: {
1598: write_bytes (lines.lines[i].text, lines.lines[i].length, tfp);
1599: putc (eolchar, tfp);
1600: }
- Put each intermediate result in a temporary file
- Code insertion
- Segment Source
- 1602: xfclose (tfp);
- Close large file loop
- Code insertion
- Segment Source
- 1603: }
- Merge the intermediate files
- Code insertion
- Segment Source
- 1611: if (n_temp_files)
1612: {
1613: tempfiles = (char **) xmalloc (n_temp_files * sizeof (char *));
1614: i = n_temp_files;
1615: for (node = temphead.next; i > 0; node = node->next)
1616: tempfiles[--i] = node->name;
1617: merge (tempfiles, n_temp_files, ofp);
1618: free ((char *) tempfiles);
1619: }
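The loop at 1615-1616 fills the tempfiles array from the back while walking the list from its head. A plausible reading, stated here as an assumption rather than something the segment confirms, is that new temporary files are linked in at the front of the list, so filling the array backwards restores creation order for the merge. The sketch below illustrates only that reversal; the struct layout and the name names_in_creation_order are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the list node: only the fields used here. */
struct tempnode
{
  const char *name;
  struct tempnode *next;
};

/* Copy the names of the first n nodes after head into a newly allocated
   array, filling it back to front, so the array holds the names in the
   reverse of list order.  The list must contain at least n nodes.
   Returns NULL if n is 0 or allocation fails.  The caller frees the
   array, but not the names it points to.  */
static const char **
names_in_creation_order (const struct tempnode *head, size_t n)
{
  if (n == 0)
    return NULL;

  const char **names = malloc (n * sizeof *names);
  if (!names)
    return NULL;

  size_t i = n;
  for (const struct tempnode *node = head->next; i > 0; node = node->next)
    {
      assert (node != NULL);      /* list shorter than n is a caller bug */
      names[--i] = node->name;
    }
  return names;
}
```

If the list is newest-first, the first array element is the oldest temporary file and the last is the newest, which is the order in which the chunks were written.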