Having established that speculative execution achieves significant performance improvements, we examine the behavior of the speculating applications and attempt to explain the differences between our results and those obtained with manually modified applications.
The primary metric for automatic hint generation is the number of correct hints generated. Table 4 summarizes the hinting behavior of the original and transformed applications. For Agrep and XDataSlice, we found that speculative execution was able to issue hints for nearly as many of the read calls as the manually modified applications. However, speculative execution was significantly less successful for Gnuld, hinting only 55% of the read calls in contrast to the 78% that the manually modified application was able to hint.
| Benchmark | Read calls | Read blocks | Read bytes | Write calls | Write blocks | Write bytes |
|---|---|---|---|---|---|---|
| Agrep (% manually hinted) | 68.3% | 99.8% | >99.9% | - | - | - |
| Gnuld (% manually hinted) | 78.4% | 86.0% | 99.6% | - | - | - |
| XDataSlice (% manually hinted) | 97.6% | 97.6% | >99.9% | - | - | - |
There are two basic reasons why speculating applications may hint fewer read calls than manually modified applications. One is that speculating applications must determine what to hint dynamically, but are only allowed to pursue hint discovery while normal execution is stalled. In fact, the more successfully a speculating application generates hints that will hide I/O latency, the less opportunity it will have to pursue hint discovery, unless the application is bandwidth-bound. The other reason is that data dependencies limit how early prefetches can be issued. For example, if the data specified by the next read call depends on the data returned by the currently outstanding read call, then speculative execution will not be able to hint the next read call.
Agrep is the most likely of our applications to be affected by the fact that hint discovery is only performed during I/O stalls. Agrep has the largest median number of cycles between read calls - 30362, 15902 and 4454 for Agrep, Gnuld and XDataSlice, respectively. It also has the largest ratio between the median number of cycles between hint calls and the median number of cycles between read calls - 7.5, 1.6 and 1.3 for Agrep, Gnuld and XDataSlice, respectively. (This ratio, which we call the dilation factor, is larger than one mainly due to the copy-on-write checks performed during speculative execution.) Accordingly, of our three applications, the speculating Agrep generates hints at by far the slowest rate. However, the almost equal gains achieved by the speculating Agrep and the manually modified Agrep indicate that this property of our design has negligible impact.
During the process of manually modifying an application to issue hints, programmers can make the application more amenable to prefetching by restructuring the code to increase the number of cycles between dependent read calls. As mentioned in Section 2.2, this was the case for the manually modified Gnuld. The speculating Gnuld, however, was produced from the original, unmodified code. It is able to hint only 55% of the read calls because a speculating application cannot hint a read call if it depends on a prior read and there are no I/O stalls between when the prior read completes and when the read call is issued. In addition, since a read cannot be hinted until all the data it depends on becomes available, data dependencies may cause hints to be issued too late to fully hide the latency of fetching the specified data. Compared with the manually modified Gnuld, the speculating Gnuld had over five times as many data blocks that were only partially prefetched before being requested by the application (as shown in the Partially column of Table 5), indicating that the speculating Gnuld experienced many more I/O stalls. Finally, since each speculation proceeds on the assumption that future read calls are not data dependent, data dependencies may cause erroneous hints to be generated. The speculating Gnuld generates 2,336 erroneous hints, as shown in Table 4, contributing to the prefetching of 3,924 unused data blocks, as shown in Table 5.
| Benchmark | Cache block reads | Prefetched blocks | Fully | % | Partially | % | Unused | % | Cache block reuses |
|---|---|---|---|---|---|---|---|---|---|
Prefetching speculatively, and therefore sometimes incorrectly, is not new. History-based mechanisms all have this property. Specifically, Digital UNIX has an aggressive automatic read-ahead policy based on the expectation that files are read sequentially. It prefetches approximately the same number of blocks as have been read sequentially, up to a maximum of 64 blocks. For applications that issue nonsequential reads to large files, like XDataSlice, this read-ahead policy can be entirely too aggressive. As shown in Table 5, 58% of the blocks prefetched by sequential read-ahead for the non-hinting XDataSlice are not used. In contrast, since the read-ahead policy is only invoked by unhinted read calls and the hinting XDataSlices generate hints for almost all of the read calls, the hinting XDataSlices are able to almost eliminate the erroneous prefetches generated by the read-ahead policy.