

Updating and Recovering the Mapping with Incremental Checkpointing

Figure 7: Logging of logical-to-physical map updates and incremental checkpointing of the map.

In designing a mechanism to keep the logical-to-physical address map persistent, we strive to accomplish two goals: low overhead imposed by the mechanism on ``normal'' I/O operations, and fast recovery of the map.

Figure 7 shows the map-related data structures used at the various storage levels. Updates to the logical-to-physical map are accumulated in a small amount of NVRAM. When the NVRAM fills, its contents are appended to a map log region on disk. (Both the NVRAM and the map log region on disk can be replicated for increased reliability.) We divide the logical-to-physical address map into $M$ portions ($M=4$ in Figure 7). Periodically, a portion of the map is checkpointed by appending it to the log. After all $M$ portions of the map have been checkpointed, the map update log entries older than the oldest of the $M$ most recent checkpoints can be freed, and the freed space can be used to log new updates. The size of the map log region on disk is thus bounded by the frequency and the size of the checkpoints. When we reach the end of the log region, the log can simply ``wrap around'', since we know that the log entries at the beginning of the region must have been freed. The location of the youngest valid checkpoint is also stored in NVRAM.

During recovery, we first read the entire map log region on disk to reconstruct the in-memory logical-to-physical address map, starting with the youngest valid checkpoint. We then replay the log entries buffered in NVRAM. At the end of this process, the in-memory logical-to-physical address map is fully restored.
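To make the scheme concrete, the following Python sketch models the update, checkpointing, and recovery paths. It is a minimal illustration rather than the prototype's implementation: plain lists stand in for the NVRAM buffer and the on-disk log region, trimming freed entries models the circular log's wrap-around, one portion is checkpointed at every NVRAM flush rather than periodically, and all names (MapLog, NVRAM_CAPACITY, and so on) are hypothetical.

    class MapLog:
        """Illustrative model of the incremental-checkpointing scheme.

        Plain lists stand in for the NVRAM update buffer and the on-disk
        map log region; trimming freed entries models the circular log's
        wrap-around. All names here are hypothetical.
        """

        NVRAM_CAPACITY = 64   # buffered map updates per NVRAM flush (assumed)
        M = 4                 # number of checkpoint portions (M = 4 in Figure 7)

        def __init__(self, map_size):
            self.map = [None] * map_size    # in-memory logical-to-physical map
            self.nvram = []                 # stand-in for the NVRAM buffer
            self.log = []                   # stand-in for the on-disk log region
            self.base = 0                   # offset of the oldest retained entry
            self.next_portion = 0           # portion to checkpoint next
            self.recent_ckpts = []          # offsets of the M most recent checkpoints

        def update(self, logical, physical):
            # Normal I/O path: apply the update and buffer it in "NVRAM".
            self.map[logical] = physical
            self.nvram.append((logical, physical))
            if len(self.nvram) >= self.NVRAM_CAPACITY:
                self._flush()

        def _flush(self):
            # Append the buffered updates to the on-disk log, then checkpoint
            # one portion (the sketch checkpoints on every flush for brevity).
            self.log.extend(("upd", l, p) for l, p in self.nvram)
            self.nvram.clear()
            self._checkpoint_one_portion()

        def _checkpoint_one_portion(self):
            lo = self.next_portion * len(self.map) // self.M
            hi = (self.next_portion + 1) * len(self.map) // self.M
            self.recent_ckpts.append(self.base + len(self.log))
            self.log.append(("ckpt", lo, self.map[lo:hi]))
            self.next_portion = (self.next_portion + 1) % self.M
            if len(self.recent_ckpts) > self.M:
                # A full round of M checkpoints is complete: every entry older
                # than the oldest of the M most recent checkpoints is obsolete,
                # so the circular log may wrap over it.
                self.recent_ckpts.pop(0)
                oldest = self.recent_ckpts[0]
                del self.log[:oldest - self.base]
                self.base = oldest

        def recover(self):
            # Read the entire log region in order, letting checkpoints
            # overwrite whole portions, then replay the NVRAM buffer.
            restored = [None] * len(self.map)
            for entry in self.log:
                if entry[0] == "ckpt":
                    _, lo, portion = entry
                    restored[lo:lo + len(portion)] = portion
                else:
                    _, logical, physical = entry
                    restored[logical] = physical
            for logical, physical in self.nvram:
                restored[logical] = physical
            return restored

In the actual mechanism, the log occupies a fixed circular region on disk and the location of the youngest valid checkpoint is kept in NVRAM; the sketch retains only the ordering and reclamation logic.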
We note a number of desirable properties of the mechanism described above. The size and frequency of the checkpoints allow one to trade off the overhead during normal operation against the map recovery time. In particular, the checkpoints bound the size of the map log region, thus bounding the recovery time. Incremental checkpointing avoids the prolonged delays associated with checkpointing the entire map at once. It also allows the space occupied by obsolete map update entries to be reclaimed without expensive garbage collection of the log.

It is interesting to compare the mechanism employed in managing the logical-to-physical address map of an EW-Array against those employed in managing the data itself in related systems such as an NVRAM-backed LFS [1,21] and RAPID-Cache [13]. While we buffer and checkpoint metadata, these other systems buffer and reorganize data. The difference of three orders of magnitude in the number of bytes involved makes it easy to control the overhead of managing metadata in an EW-Array.

In practice, in a $D_m \times D_s \times D_d = 2 \times 9 \times 2$ EW-Array prototype (whose disks are 9 GB each), with 100 KB of NVRAM to buffer map entry updates, we have observed that the overhead of maintaining the map during normal operation under the TPC-C workload is below 1%, and that recovering the map takes 9.7 seconds. The recovery performance can be further improved by distributing the map log region across a number of disks so that the map can be read in parallel.
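As a quick check of the recovery path, a hypothetical driver for the sketch above pushes enough updates through it to trigger several flushes and checkpoints (including reclamation of freed log entries), then rebuilds the map as recovery would after a crash:

    maplog = MapLog(map_size=1024)
    for i in range(10 * MapLog.NVRAM_CAPACITY):
        maplog.update(i % 1024, 5000 + i)   # hypothetical physical addresses

    # After a crash the in-memory map is lost, but the log region and the
    # NVRAM buffer survive; replaying them restores the map exactly.
    assert maplog.recover() == maplog.map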