WIESS 2000 Abstract

Experiences in Measuring the Reliability of a Cache-Based Storage System

Dan Lambright, EMC


We present our experiences in benchmarking the reliability of the cache component of a storage system in a development environment. The reliability metrics we measured are availability, from the standpoint of the host, and maintainability, from the standpoint of the system operator. We introduced errors through software fault injection and measured their impact using a combination of performance measurement techniques and rehearsals of maintenance procedures. This paper presents three case studies. The first two describe experiments that recreate specific breakdowns in the software logic. The third describes an experiment that simulates a memory hardware failure with unpredictable effects. We found that, taken together, these techniques gave us a useful picture of how well our cache management software tolerated faults.


Last changed: 23 Jan. 2002 ml