To demonstrate the impact of IO-Lite on the performance of a wider range of applications, and to gain experience with the IO-Lite API, we converted a number of existing UNIX programs and wrote some new programs to use IO-Lite. We modified GNU grep, wc, cat, and the GNU gcc compiler chain (compiler driver, C preprocessor, C compiler, and assembler). Figure 8 depicts the results obtained with grep, wc, and permute. The ``wc'' bar refers to a run of the word-count program on a 1.75 MB file. The file is in the file cache, so no physical I/O occurs. ``Permute'' generates all possible permutations of the 8-character words in an 80-character string. Its output (10! * 80 = 290304000 bytes) is piped into the ``wc'' program. The ``grep'' bar refers to a run of the GNU grep program on the same file used for the ``wc'' run, but the file is piped into grep from cat instead of being read directly by grep.
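The output size quoted for ``permute'' follows from a simple count: an 80-character string holds ten 8-character words, there are 10! orderings of those words, and each ordering is 80 bytes long. The sketch below checks that closed form against a brute-force enumeration on a smaller input. It is an illustration only, not the program used in the experiments; the function names and the sample words are hypothetical, and it assumes the permutations are written back to back with no separator, as the byte count in the text implies.

```python
import itertools
import math


def permute(words):
    """Yield every ordering of `words`, concatenated into one string."""
    for ordering in itertools.permutations(words):
        yield "".join(ordering)


def expected_output_bytes(num_words, word_len):
    """Closed form: num_words! strings, each num_words * word_len bytes long."""
    return math.factorial(num_words) * num_words * word_len


# Small instance that can be enumerated quickly: 4 words of 8 characters.
# (Hypothetical sample words; the actual input string is not given in the text.)
small_words = ["aaaaaaaa", "bbbbbbbb", "cccccccc", "dddddddd"]
small_total = sum(len(s) for s in permute(small_words))

# The configuration described in the text: 10 words of 8 characters.
paper_total = expected_output_bytes(10, 8)

print(small_total, expected_output_bytes(4, 8))
print(paper_total)
```

Enumerating the full ten-word case would produce roughly 290 MB of output, which is why only the closed form is evaluated for it here.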
For wc, which reads a cached file, the use of IO-Lite improves runtime by approximately 15%. The benefit of IO-Lite is limited in this case because each page of the cached file must be mapped into the application's address space when the file is read from the IO-Lite file cache.
For the ``permute'' program the improvement is more significant (24%), because a pipeline is involved. Whenever local interprocess communication occurs, the IO-Lite implementation can recycle buffers, avoiding all VM map operations. Finally, in the ``grep'' case, the IO-Lite version eliminates three data copies (one due to ``grep'' and two due to ``cat'').
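The copy count in the ``grep'' case can be illustrated with a small model. The sketch below is not IO-Lite code and uses no IO-Lite API; it is a toy accounting exercise in which every function name is hypothetical. It assumes the conventional attribution of copies in a cat-to-grep pipeline: cat copies data from the file cache into its own buffer on read and from its buffer into the pipe on write, and grep copies from the pipe into its buffer on read. The shared-buffer variant stands in for passing references to immutable buffers instead of copying.

```python
def conventional_pipeline(file_cache):
    """Model `cat file | grep` with copy semantics at every boundary."""
    copies = {"cat": 0, "grep": 0}

    cat_buf = bytearray(file_cache)   # cat read(): file cache -> cat's buffer
    copies["cat"] += 1

    pipe_buf = bytearray(cat_buf)     # cat write(): cat's buffer -> pipe buffer
    copies["cat"] += 1

    grep_buf = bytearray(pipe_buf)    # grep read(): pipe buffer -> grep's buffer
    copies["grep"] += 1

    return grep_buf, copies


def shared_buffer_pipeline(file_cache):
    """Model the same pipeline when only buffer references are passed along."""
    copies = {"cat": 0, "grep": 0}

    cat_view = memoryview(file_cache)  # cat receives a reference, no copy
    pipe_view = cat_view               # the reference travels through the pipe
    grep_view = pipe_view              # grep sees the same underlying bytes

    return grep_view, copies


data = b"one line\nanother line\n"
conv_out, conv_copies = conventional_pipeline(data)
shared_out, shared_copies = shared_buffer_pipeline(data)

print(conv_copies, sum(conv_copies.values()))
print(shared_copies, sum(shared_copies.values()))
```

In this model both variants deliver the same bytes to grep; the difference is only in how many times the data is duplicated on the way.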
The gcc compiler chain was converted mainly to determine whether more compute-bound applications benefit from IO-Lite, and to stress the IO-Lite implementation. We expected a compiler to be too compute-intensive to benefit substantially from I/O performance improvements. Rather than modify the entire program, we simply replaced the stdio library with a version that uses IO-Lite for communication over pipes. Interestingly, converting the compiler to use IO-Lite led to a measurable performance improvement, mainly because IO-Lite allows efficient communication through pipes. Although the standard gcc has an option that uses pipes instead of temporary files for communication between the compiler's stages, inefficiencies in the handling of pipes caused a significant slowdown with that option, so the baseline gcc numbers used for comparison are for gcc running without pipes. The ``gcc sm'' and ``gcc lg'' bars refer to compiles of a 1200-byte and a 206-KByte file, respectively.
The ``grep'' and ``wc'' programs read their input sequentially, and were converted to use the IO-Lite API directly. The C preprocessor's output, the compiler's input and output, and the assembler's input all use the C stdio library, and were converted merely by relinking them with an IO-Lite version of the stdio library. The preprocessor (cpp) uses mmap to read its input.