

Introduction

Transaction processing applications, such as those exemplified by TPC-C [27], tend to pose more difficult challenges to storage systems than office workloads. These applications exhibit little locality or sequentiality; a large percentage of the I/O requests are writes, many of which are synchronous; and there may be little idle time. Traditional techniques that work well for office workloads tend to be less effective for these transaction processing applications. Memory caching provides little relief in the presence of poor locality and a small read-to-write ratio. As disk areal density improves at 60-100% annually [9], as memory density improves at only 40% per year, and as the amount of data in transaction processing systems continues to grow, one can expect little improvement in cache hit rates in the near future. Delayed-write techniques become less applicable in the presence of a large number of synchronous writes that must satisfy strict reliability requirements, requirements that are sometimes not met even by expensive NVRAM-based solutions [13]. Even when it is possible to buffer delayed writes in faster storage levels, such as NVRAM, the poor write locality implies that there are very few overwrites before the buffered data reaches the disks. Furthermore, high throughput requirements in conjunction with scarce idle time make it difficult to schedule background activities, such as de-staging from NVRAM [13,20], garbage collection in log-structured solutions [13,21,22,24], and data relocation [19], without impacting foreground I/O activities. The net effect of these challenges is that transaction processing applications tend to be more closely limited by disk latency, a performance characteristic that has improved at only about 10% per year [9].

Although traditional caching and asynchronous I/O techniques have not been very successful, a number of other techniques have proven promising. One is mirroring: a mirrored system can improve read latency by sending a read request to the disk whose head is closest to a target replica [2,5], and it can improve throughput by intelligently scheduling requests in a load-balanced manner. Mirroring, unfortunately, is not without its challenges. Chief among them is the cost of replica propagation: each write request is turned into multiple physical writes that compete for I/O bandwidth with normal I/O operations. High update rates, the lack of idle time for masking replica propagation, and poor locality only make matters worse.

While mirroring is more effective for improving a read-dominant workload, a technique called eager-writing is more effective for improving a write-dominant workload. Eager-writing refers to the technique of allocating a free block that is closest to the current disk head position to satisfy a write request [4,6,28]. Under the right conditions, by eliminating almost all of the seek and rotational delay, eager-writing can deliver very fast write performance without compromising reliability guarantees, even for workloads that consist of synchronous I/Os and have poor locality. What eager-writing does not address, however, is read performance. Since data replication in a mirrored system improves read performance, and since eager-writing improves write performance, reduces the cost of replica propagation, and ensures a high degree of data reliability, it is only natural to integrate these two techniques so that we may harvest the best of both worlds.
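To make the two building blocks concrete, the sketch below (our illustration, not the authors' implementation) models a disk as a one-dimensional array of blocks in which the distance |block - head| stands in for seek plus rotational delay. The names Disk, choose_read_replica, and eager_write are hypothetical: a mirrored read is sent to the disk whose head is currently closest to its copy of the data, and an eager write is placed on the free block nearest the current head position.

```python
# Illustrative toy model: distance |block - head| stands in for
# seek + rotational positioning cost. Hypothetical names throughout.

class Disk:
    def __init__(self, num_blocks):
        self.head = 0                       # current head position (block index)
        self.free = set(range(num_blocks))  # blocks not holding live data

def choose_read_replica(disks, replica_at):
    """Mirrored read: pick the disk whose head is closest to its replica.

    replica_at maps disk index -> physical block holding the replica.
    Returns (disk_index, block) of the cheapest copy to read.
    """
    return min(replica_at.items(),
               key=lambda dv: abs(disks[dv[0]].head - dv[1]))

def eager_write(disk):
    """Eager-writing: allocate the free block nearest the current head."""
    block = min(disk.free, key=lambda b: abs(b - disk.head))
    disk.free.remove(block)
    disk.head = block           # the head finishes at the newly written block
    return block
```

A real eager-writing system must additionally maintain an indirection map from logical block numbers to the physical locations chosen at write time, and its cost model distinguishes seek from rotational delay; the sketch omits both.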
We call the result of this integration an eager-writing array, or EW-Array: in its simplest form, an EW-Array is just a mirrored system that supports eager-writing. This integration, however, is not without its own tension. To achieve good write performance under eager-writing, one must reserve enough disk space to ensure that an empty block can be found close to the current disk head position. At the same time, to achieve good read performance under mirroring, the system needs to devote disk space to storing a sufficient number of replicas so that it can choose a conveniently located replica to read. Given a fixed budget of disks, one must resolve this tension by carefully balancing the number of disks devoted to each of these two dimensions. To further complicate the matter, striping can improve both read and write performance, so one must also consider a third dimension: the number of disks devoted to striping. Although configuring a storage system based on the number of disk heads instead of capacity is common practice for TPC-C-like workloads, and some previous studies, such as the "Doubly Distorted Mirror," have incorporated "write-anywhere" elements in a disk array [19], what is not well understood is how to balance the number of disks devoted to each of the mirroring, eager-writing, and striping dimensions to get the most out of a given number of disks.

While properly configuring an EW-Array along these three dimensions presents one challenge, request scheduling on such a disk array presents another. In the request queue of a traditional update-in-place storage system, the locations of all the queued requests are known. Although the scheduler can sometimes choose among several mirrored replicas to satisfy a request, the degree of freedom is limited. This is no longer the case for an EW-Array: while the locations of the read requests are known, the scheduler has the freedom of choosing any combination of free blocks to satisfy the write requests. Although disk scheduling is a well-studied problem in conventional systems, what is not well understood is how a good scheduler can exploit this large degree of freedom to optimize throughput (the sketch following the contribution list illustrates this freedom).

The main contributions of this paper are:
• a disk array design that integrates eager-writing with mirroring in a balanced configuration to provide the best read and write performance for a transaction processing workload,

• a number of disk array scheduling algorithms that can effectively exploit the flexibility afforded by the location-independent nature of eager-writing, and

• an evaluation of a number of alternative strategies that share the common goal of improving performance by introducing extra disk capacity.
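The scheduling freedom described before the contribution list can be sketched in the same toy cost model (again a hedged illustration of ours, not one of the paper's algorithms): reads are pinned to known locations, while each write may claim the nearest free block on any disk, so a greedy scheduler simply services whichever pending request is currently cheapest. Replica propagation for writes is omitted here.

```python
# Hypothetical greedy scheduler in the same toy cost model
# (|block - head| approximates seek + rotational delay).

def next_request(heads, free, reads, writes):
    """Pick the pending request with the smallest positioning cost.

    heads  : list of current head positions, one per disk
    free   : list of sets of free block numbers, one per disk
    reads  : list of (request_id, disk_index, block); locations are fixed
    writes : list of request_id; any free block on any disk may be chosen
    Returns (cost, request_id, disk_index, block), or None if nothing is queued.
    """
    best = None
    for rid, d, block in reads:                     # read: location is fixed
        cost = abs(heads[d] - block)
        if best is None or cost < best[0]:
            best = (cost, rid, d, block)
    for rid in writes:                              # write: location is ours to choose
        for d in range(len(heads)):
            if not free[d]:
                continue
            block = min(free[d], key=lambda b: abs(b - heads[d]))
            cost = abs(heads[d] - block)
            if best is None or cost < best[0]:
                best = (cost, rid, d, block)
    return best
```

Even this simple greedy policy hints at why the search space is large: every queued write can, in principle, be paired with any free block on any disk, and the choices interact through future head positions.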
We have designed and implemented a prototype EW-Array. Our experimental results demonstrate that the EW-Array can significantly outperform conventional systems. For example, under the TPC-C workload, a properly configured EW-Array delivers latency that is 1.4 to 1.6 times lower than that achieved by highly optimized striping and mirroring systems, and it achieves approximately twice the sustainable throughput.

The remainder of this paper is organized as follows. Section 2 motivates the integration of eager-writing with mirroring in an EW-Array. Section 3 explores different EW-Array configurations as we change the way the extra disk space is distributed. Section 4 analyzes a number of new disk scheduling algorithms that exploit the location-independent nature of eager-writing. Section 5 describes the integrated simulator and the prototype EW-Array. Section 6 presents experimental results that evaluate a wide range of disk array configuration alternatives. Section 7 describes related work. Section 8 concludes.