Performance analysis

Next: Performance side-effects Up: Experimental evaluation Previous: Overall performance results

Performance analysis

Having established that speculative execution achieves significant performance improvements, we examine the behavior of the speculating applications and attempt to explain the differences between our results and those obtained with manually modified applications.

The primary metric for automatic hint generation is the number of correct hints generated. Table 4 summarizes the hinting behavior of the original and transformed applications. For Agrep and XDataSlice, we found that speculative execution was able to issue hints for nearly as many of the read calls as the manually modified applications. However, speculative execution was significantly less successful for Gnuld, hinting only 55% of the read calls in contrast to the 78% that the manually modified application was able to hint.

Benchmark Read calls Read blocks Read bytes Write calls Write blocks Write bytes

Agrep total 4,277 2,928 18,091,527 0 0 0

% hinted 68.1% 99.6% 99.7% - - -

inaccurately hinted 0 0 0 - - -

% manually hinted 68.3% 99.8% >99.9% - - -

Gnuld total 13,037 20,091 60,158,290 2343 3418 8,824,188

% hinted 54.9% 67.5% 89.7% - - -

inaccurately hinted 2,336 6,721 37,177,440 - - -

% manually hinted 78.4% 86.0% 99.6% - - -

XDataSlice total 46,356 46,352 370,663,914 2 2 4,081

% hinted 97.5% 97.5% 99.9% - - -

inaccurately hinted 0 0 0 - - -

% manually hinted 97.6% 97.6% >99.9% - - -

Table 4: Hinting statistics. Total includes explicit file calls only. The hinting behavior of the speculating applications is described by the % hinted and inaccurately hinted figures, and can be compared with the behavior of the manually modified applications (which issued no inaccurate hints) described by the % manually hinted figures. The number of read calls is sometimes larger than the number of read blocks because, for example, Agrep issues at least one extra read call per file to detect the end of the file. Discounting these non-data-returning reads (which do not need to be hinted), over 99% of Agrep's read calls were hinted.

Benchmark	Read calls	Read blocks	Read bytes	Write calls	Write blocks	Write bytes
Agrep	total	4,277	2,928	18,091,527	0	0	0
% hinted	68.1%	99.6%	99.7%	-	-	-
inaccurately hinted	0	0	0	-	-	-
% manually hinted	68.3%	99.8%	>99.9%	-	-	-
Gnuld	total	13,037	20,091	60,158,290	2343	3418	8,824,188
% hinted	54.9%	67.5%	89.7%	-	-	-
inaccurately hinted	2,336	6,721	37,177,440	-	-	-
% manually hinted	78.4%	86.0%	99.6%	-	-	-
XDataSlice	total	46,356	46,352	370,663,914	2	2	4,081
% hinted	97.5%	97.5%	99.9%	-	-	-
inaccurately hinted	0	0	0	-	-	-
% manually hinted	97.6%	97.6%	>99.9%	-	-	-

There are two basic reasons why speculating applications may hint fewer read calls than manually modified applications. One is that speculating applications must determine what to hint dynamically, but are only allowed to pursue hint discovery while normal execution is stalled. In fact, the more successfully a speculating application generates hints that will hide I/O latency, the less opportunity it will have to pursue hint discovery, unless the application is bandwidth-bound. The other reason is that data dependencies limit how early prefetches can be issued. For example, if the data specified by the next read call depends on the data returned by the currently outstanding read call, then speculative execution will not be able to hint the next read call.

Agrep is the most likely of our applications to be affected by the fact that hint discovery is only performed during I/O stalls. Agrep has the largest median number of cycles between read calls - 30362, 15902 and 4454 for Agrep, Gnuld and XDataSlice, respectively. It also has the largest ratio between the median number of cycles between hint calls and the median number of cycles between read calls - 7.5, 1.6 and 1.3 for Agrep, Gnuld and XDataSlice, respectively. (This ratio, which we call the dilation factor, is larger than one mainly due to the copy-on-write checks performed during speculative execution.) Accordingly, of our three applications, the speculating Agrep generates hints at by far the slowest rate. However, the almost equal gains achieved by the speculating Agrep and the manually modified Agrep indicate that this property of our design has negligible impact.

During the process of manually modifying an application to issue hints, programmers can make the application more amenable to prefetching by restructuring the code to increase the number of cycles between dependent read calls. As mentioned in Section 2.2, this was the case for the manually modified Gnuld. The speculating Gnuld, however, was produced from the original, unmodified code. It is only able to hint 55% of the read calls because a speculating application cannot hint a read call if it depends on a prior read and there are no I/O stalls between when the prior read completes and when the read call is issued. In addition, since a read cannot be hinted until all the data it is dependent on becomes available, data dependencies may cause hints to be issued too late to fully hide the latency of fetching the specified data. Comparing the speculating Gnuld to the manually modified Gnuld, over five times as many data blocks were only partially prefetched before being requested by the application (as shown in the Partially column of Table 5), indicating that the speculating Gnuld experienced many more I/O stalls. Finally, since each speculation proceeds with the assumption that future read calls are not data dependent, data dependencies may cause erroneous hints to be generated. The speculating Gnuld generates 2,336 erroneous hints, as shown in Table 4, contributing to the prefetching of 3,924 unused data blocks, as shown in Table 5.

Benchmark Cache block reads Prefetched blocks Fully % Partially % Unused % Cache block reuses

Agrep Original 3,424 1,031 529 51.3% 499 48.4% 3 0.4% 416

SpecHint 3,726 3,003 2,707 90.2% 272 9.1% 23 0.8% 655

Manual 3,423 2,947 2,687 91.2% 258 8.8% 1 0.0% 421

Gnuld Original 24,074 5,511 2,544 46.2% 2,014 36.6% 952 17.3% 12,435

SpecHint 25,353 12,855 3,498 27.2% 5,432 42.3% 3,924 30.5% 13,646

Manual 23,892 10,018 8,933 89.2% 1,057 10.6% 27 0.3% 13,519

XDataSlice Original 49,997 60,702 12,806 21.1% 12,664 20.9% 35,231 58.0% 4,162

SpecHint 50,810 45,338 40,319 88.9% 4,907 10.8% 112 0.3% 4,973

Manual 49,782 44,938 40,167 89.4% 4,750 10.6% 20 0.0% 4,491

Table 5: Prefetching and caching statistics. For the original, non-hinting applications, the prefetching figures are the result of the operating system's sequential read-ahead policy. For the speculating applications, the prefetching figures also include TIP's hint-driven prefetching. Cache Block Reads is the number of block reads from the file cache. Prefetched Blocks is the number of blocks prefetched from disk. Fully is the number of blocks whose prefetch completed before being requested by the application, Partially is the number of blocks partially prefetched before being requested by the application, and Unused is the number of prefetched blocks that were not accessed by the application before being ejected from the file cache. A Cache Block Reuse is counted each time a cached block services a second or subsequent request, and therefore indicates the effectiveness of caching. The closeness of the Cache Block Reuse figures indicates that erroneous prefetching did not significantly harm caching behavior.

Benchmark	Cache block reads	Prefetched blocks	Fully	%	Partially	%	Unused	%	Cache block reuses
Agrep	Original	3,424	1,031	529	51.3%	499	48.4%	3	0.4%	416
SpecHint	3,726	3,003	2,707	90.2%	272	9.1%	23	0.8%	655
Manual	3,423	2,947	2,687	91.2%	258	8.8%	1	0.0%	421
Gnuld	Original	24,074	5,511	2,544	46.2%	2,014	36.6%	952	17.3%	12,435
SpecHint	25,353	12,855	3,498	27.2%	5,432	42.3%	3,924	30.5%	13,646
Manual	23,892	10,018	8,933	89.2%	1,057	10.6%	27	0.3%	13,519
XDataSlice	Original	49,997	60,702	12,806	21.1%	12,664	20.9%	35,231	58.0%	4,162
SpecHint	50,810	45,338	40,319	88.9%	4,907	10.8%	112	0.3%	4,973
Manual	49,782	44,938	40,167	89.4%	4,750	10.6%	20	0.0%	4,491

Prefetching speculatively, and therefore sometimes incorrectly, is not new. History-based mechanisms all have this property. Specifically, Digital UNIX has an aggressive automatic read-ahead policy based on the expectation that files are read sequentially. It prefetches approximately the same number of blocks as have been read sequentially, up to a maximum of 64 blocks. For applications that issue nonsequential reads to large files, like XDataSlice, this read-ahead policy can be entirely too aggressive. As shown in Table 5, 58% of the blocks prefetched by sequential read-ahead for the non-hinting XDataSlice are not used. In contrast, since the read-ahead policy is only invoked by unhinted read calls and the hinting XDataSlices generate hints for almost all of the read calls, the hinting XDataSlices are able to almost eliminate the erroneous prefetches generated by the read-ahead policy.

Next: Performance side-effects Up: Experimental evaluation Previous: Overall performance results

Fay Chang
Tue Jan 5 18:05:04 EST 1999