Flashpache

Due to the differences in software architecture, we cannot directly employ the same techniques that we used in New-Flash to improve Apache. However, given our earlier measurements on Apache, we can deduce that filesystem-related calls are likely to block, and with these as candidates, we can leverage the lessons from Flash. Since Apache does not cache file descriptors, each process calls open() on every request, and this behavior results in a much higher rate of these calls.

We modify Apache to offload the URL-to-file translation step, in which metadata-related system calls occur. This step is handled by a new "backend" process, to which all of the Apache processes connect via persistent Unix-domain sockets. The backend employs a Flash-like architecture, with a main process and a small number of helpers. The main process keeps a filename cache like the one in the Flash server, and schedules helpers to perform cache-miss operations. The backend is responsible for finding the requested file, opening the file, and sending the file descriptor and metadata information back to the Apache processes. Upon receiving a valid open file descriptor from the backend, the Apache process can return the associated data to the client. Since the backend handles URL lookup for all Apache processes, it is possible to combine duplicated requests and even preload data blocks into the filesystem cache before passing control back to the Apache processes, thus reducing the number of context switches and the chances of blocking.
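The descriptor-passing step can be illustrated with a minimal sketch. It is not the Flashpache code: the `Backend` class, the `worker_request` function, the one-URL-per-message protocol, and the `-1` error reply are all assumptions made for illustration, and the helper processes are omitted. The sketch assumes a Unix system and Python 3.9 or later, where `socket.send_fds` and `socket.recv_fds` wrap the `SCM_RIGHTS` ancillary-data mechanism of Unix-domain sockets.

```python
import os
import socket
import tempfile
import threading


class Backend:
    """Hypothetical backend: maps URLs to open descriptors and caches them."""

    def __init__(self, doc_root):
        self.doc_root = doc_root
        self.cache = {}  # url -> (fd, size); files stay open while cached

    def lookup(self, url):
        entry = self.cache.get(url)
        if entry is None:  # cache miss: this is where a helper would do the work
            path = os.path.join(self.doc_root, url.lstrip("/"))
            fd = os.open(path, os.O_RDONLY)
            entry = (fd, os.fstat(fd).st_size)
            self.cache[url] = entry
        return entry

    def serve_one(self, conn):
        url = conn.recv(4096).decode()
        try:
            fd, size = self.lookup(url)
        except OSError:
            conn.sendall(b"-1")  # assumed error reply: no descriptor attached
            return
        # Send the file size as data and the descriptor as ancillary data.
        socket.send_fds(conn, [str(size).encode()], [fd])

    def close(self):
        for fd, _size in self.cache.values():
            os.close(fd)
        self.cache.clear()


def worker_request(conn, url):
    """Stand-in for an Apache process asking the backend to resolve a URL."""
    conn.sendall(url.encode())
    data, fds, _flags, _addr = socket.recv_fds(conn, 64, 1)
    if not fds:
        return None
    fd = fds[0]
    try:
        # pread leaves the file offset untouched; the offset is shared with
        # the backend's cached descriptor for the same open file.
        return os.pread(fd, int(data), 0)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "index.html"), "w") as f:
            f.write("hello")
        backend_sock, worker_sock = socket.socketpair()  # AF_UNIX stream pair
        backend = Backend(root)
        urls = ["/index.html", "/index.html", "/missing.html"]
        server = threading.Thread(
            target=lambda: [backend.serve_one(backend_sock) for _ in urls])
        server.start()
        results = [worker_request(worker_sock, u) for u in urls]
        server.join()
        backend.close()
        backend_sock.close()
        worker_sock.close()
        return results
```

In this sketch the second request for `/index.html` is served from the cache without another `open()`, and the worker reads with `pread` so that concurrent workers holding descriptors for the same cached file do not disturb each other's file offsets.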

**Figure 13:** CDFs of # of ready events for Flash variants, infinite-demand workload

We call this new server Flashpache, to reflect its hybrid architecture. The changes involved are relatively small and isolated: fewer than 100 lines of code are modified in Apache, and half of this count is code taken directly from New-Flash. The backend process is similarly derived from parts of New-Flash, and consists of roughly 200 lines of code changes.

This architecture, shown in Figure 12, eliminates unnecessary blocking in two ways. First, in Flashpache, most of the disk access is performed by a small number of helper processes controlled by the backend, reducing the amount of lock contention. This observation is confirmed by the fact that less blocking occurs in Flashpache than in Apache with the same workload. Second, since the backend caches metadata information and keeps files open, it effectively prevents metadata cache entries from being evicted when memory pressure is an issue. However, the CPU savings from caching are not the main source of the benefit: the interprocess communication cost between the Apache processes and the backend is roughly equal to, or even slightly higher than, the cost of the original system calls.

**Figure 14:** Scheduler burstiness in Flashpache for 256 and 1024 processes, infinite-demand workload

**Figure 15:** Response times for new servers with different data set sizes and infinite-demand workload

**Figure 16:** Latency profile of New-Flash (Flash profile shown in Figure 5). Load level 1.0 equals 450 Mb/s

**Figure 17:** Latency profile of Flashpache (Apache profile shown in Figure 4). Load level of 1.0 equals 273 Mb/s

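**Table 5:** Latencies & capacities for all servers

| Server | Median latency (ms) | Mean latency (ms) | 90% latency (ms) | Capacity (Mb/s) |
|---|---|---|---|---|
| Flash | 67.4 | 181.0 | 362.0 | 336.0 |
| fd pass | 11.5 | 50.0 | 71.2 | 395.0 |
| no mmap | 1.8 | 93.5 | 92.9 | 437.5 |
| New-Flash | 1.6 | 29.3 | 6.6 | 450.0 |
| Apache | 6.6 | 180.2 | 414.7 | 241.1 |
| Flashpache | 1.1 | 12.0 | 5.7 | 272.9 |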
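
The relative gains implied by Table 5 can be computed directly from its entries. The short sketch below is only arithmetic on the tabulated values; the `TABLE5` dictionary and the `compare` function are names introduced here for illustration.

```python
# Values copied from Table 5: (median ms, mean ms, 90% ms, capacity Mb/s).
TABLE5 = {
    "Flash": (67.4, 181.0, 362.0, 336.0),
    "fd pass": (11.5, 50.0, 71.2, 395.0),
    "no mmap": (1.8, 93.5, 92.9, 437.5),
    "New-Flash": (1.6, 29.3, 6.6, 450.0),
    "Apache": (6.6, 180.2, 414.7, 241.1),
    "Flashpache": (1.1, 12.0, 5.7, 272.9),
}


def compare(old, new):
    """Return the capacity gain and latency ratios of `new` relative to `old`."""
    o_med, o_mean, _o90, o_cap = TABLE5[old]
    n_med, n_mean, _n90, n_cap = TABLE5[new]
    return {
        "capacity_gain_pct": (n_cap / o_cap - 1.0) * 100.0,
        "median_speedup": o_med / n_med,
        "mean_speedup": o_mean / n_mean,
    }


if __name__ == "__main__":
    for old, new in [("Flash", "New-Flash"), ("Apache", "Flashpache")]:
        print(old, "->", new, compare(old, new))
```

With these numbers, Flashpache's capacity is about 13% above Apache's and its median latency is six times lower, while New-Flash's capacity is about 34% above that of Flash.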