Latency Results

We analyze the latency of the new servers by repeating our earlier latency and blocking experiments. We begin with the burstiness measurement, which indicates that blocking-induced burstiness has been reduced or eliminated in both servers. Figures 13 and 14 show that in New-Flash, the mean number of events per call has dropped from 61 to 1.6, and the median has dropped from 12 to 2. Likewise, Flashpache no longer exhibits bimodal behavior at the scheduler level; instead, roughly 20% of all processes are ready at any given time. In both cases, the request batching and associated idle periods are eliminated.

We evaluate step-by-step improvements to Flash, with the results shown in Table 5. Included are the figures for the original Flash, as well as the intermediate steps of file descriptor passing (fd pass) and removing memory-mapped files (no mmap). Throughputs are measured under infinite demand, and response times are measured at a load level of 0.95. The overall capacity of Flash has increased by 34% for this workload, while Apache's capacity has increased by 13%.

**Figure 18:** CDF breakdown for New-Flash on 3.0 GB data set, load level 0.95

**Figure 19:** Service inversion of original and modified servers

**Figure 20:** CDF breakdown for New-Flash on in-memory workload, load level 0.95

**Figure 21:** CDF breakdown for Flashpache on 3.0 GB data set, load level 0.95

The more impressive result is the reduction in latency, even when the servers are run at these higher throughputs. Flash sees improvements of 40x in median, 6x in mean, and 54x in $90^{th}$ percentile latency. Eliminating metadata-induced blocking improves median latency by 5.8x and mean latency by 3.6x, and eliminating blocking in sendfile() reduces mean latency by a further factor of 3. Apache sees improvements of 6x in median, 15x in mean, and 72x in $90^{th}$ percentile latency. The one seemingly odd result, an increase in mean latency from fd-pass to no-mmap, is due to an increase in blocking: removing mmap() also means losing the mincore() function, which could precisely determine the memory residency of pages. The New-Flash server obtains this residency information via a flag in sendfile(), which again eliminates blocking.

Not only do the new servers have lower latencies, but they also show qualitatively different latency characteristics. Figure 15 shows that median latency no longer grows with data set size, despite the increase in mean latency. Mean latency still increases due to cache misses, but the median request is a cache hit in all cases. Figures 16 and 17 show the latency CDF values at the $5^{th}$ percentile, mean, median, and $95^{th}$ percentile with varying load. Although the mean and the $95^{th}$ percentile latencies increase with load, the $95^{th}$ percentile less than triples relative to its minimum value, which is much less than the two orders of magnitude observed originally. The other values are very flat, indicating that most requests are served with the same quality at different load levels. More importantly, the $95^{th}$ percentile CDF values are lower than the mean latency, because the time spent on the largest requests (the last 5%) is much higher than the time spent on other requests, as expected from Table 2.
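The observation that a $95^{th}$ percentile can sit below the mean may seem counterintuitive, so a small numerical sketch may help. The latency values below are hypothetical and chosen only for illustration; they are not measurements from this study. The helper `nearest_rank_percentile` is likewise an assumption of this sketch, using the nearest-rank percentile definition.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile using the nearest-rank definition."""
    ordered = sorted(samples)
    # Nearest rank: the smallest rank covering at least pct percent of samples.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (illustrative, not measured data):
# 96 fast requests served from cache and 4 slow requests for large files.
latencies_ms = [1.0] * 96 + [1000.0] * 4

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p95_ms = nearest_rank_percentile(latencies_ms, 95)

print(f"mean={mean_ms} ms, median={median_ms} ms, p95={p95_ms} ms")
```

In this sketch the few slow requests fall entirely within the last 5% of the distribution, so the median and the $95^{th}$ percentile both stay at 1 ms while the mean is pulled up to 40.96 ms by the tail.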