
Checkpointing

Finally, we consider the granularity of checkpointing: the smallest unit of storage written to non-volatile media to save recovery information that protects against failures and crashes.

Texas uses virtual memory protections to detect pages modified by the application between checkpoints; the default unit of checkpointing is therefore a virtual memory page. Texas employs a simple write-ahead logging scheme to support checkpointing and recovery: at checkpoint time, modified pages are written to a log on stable storage before the actual database is updated [17].
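
To make the page-grained mechanism concrete, here is a minimal C++ sketch of dirty-page detection and write-ahead logging using POSIX mprotect and a SIGSEGV handler. All names (dirty_pages, init_tracking, checkpoint) are illustrative assumptions, not the actual Texas interfaces, and the handler glosses over async-signal-safety concerns for clarity.

    #include <signal.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <set>

    static std::set<void*> dirty_pages;  // pages written since the last checkpoint
    static size_t page_size;

    // Write fault: record the faulting page, then unprotect it so the
    // application's write can proceed.  (A real implementation would avoid
    // non-async-signal-safe calls such as std::set::insert here.)
    static void on_write_fault(int, siginfo_t* info, void*) {
        void* page = (void*)((uintptr_t)info->si_addr & ~(page_size - 1));
        dirty_pages.insert(page);
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
    }

    // Install the handler; persistent pages start out read-only so that the
    // first write to each page between checkpoints traps.
    void init_tracking() {
        page_size = (size_t)sysconf(_SC_PAGESIZE);
        struct sigaction sa = {};
        sa.sa_sigaction = on_write_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, nullptr);
    }

    // Checkpoint: log each dirty page to stable storage before the database
    // itself is updated (the write-ahead rule), then re-protect it.
    void checkpoint(int log_fd) {
        for (void* page : dirty_pages) {
            write(log_fd, page, page_size);        // append page image to log
            mprotect(page, page_size, PROT_READ);  // trap writes next interval
        }
        fsync(log_fd);  // log must be stable before the database is updated
        dirty_pages.clear();
    }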

The granularity of checkpointing can be refined through sub-page logging. This approach relies on a page ``diffing'' technique that we originally proposed in [17]. The basic idea is to save a clean version of each page before the application modifies it; the original (clean) and modified (dirty) versions of the page can then be compared to detect the exact sub-page areas that were actually updated, and only those ``diffs'' are logged to stable storage. This reduces the amount of I/O at checkpoint time, subject to the application's locality characteristics. The granularity of checkpointing in this case is the size of the ``diffs'' saved to stable storage.7
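
The following is a minimal sketch of the basic diffing step, assuming word-aligned pages; DiffRecord and log_diff are hypothetical names for this example, not the Texas or QuickStore interfaces.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    struct DiffRecord {   // one contiguous modified run within a page
        size_t offset;    // byte offset of the run within the page
        size_t length;    // length of the run in bytes
    };                    // the log stores this header followed by the new data

    // Compare the saved clean copy against the dirty page word by word,
    // emitting one record per contiguous run of modified words.
    void diff_page(const uint8_t* clean, const uint8_t* dirty, size_t page_size,
                   void (*log_diff)(const DiffRecord&, const uint8_t* data)) {
        const size_t word = sizeof(uintptr_t);
        size_t i = 0;
        while (i < page_size) {
            if (memcmp(clean + i, dirty + i, word) == 0) { i += word; continue; }
            size_t start = i;  // start of a modified run
            while (i < page_size && memcmp(clean + i, dirty + i, word) != 0)
                i += word;
            DiffRecord rec = { start, i - start };
            log_diff(rec, dirty + start);  // log only the changed bytes
        }
    }

If the application's writes cluster within pages, the logged diffs are much smaller than full page images; with writes scattered across a whole page, logging the entire page may be cheaper, which is why the benefit depends on locality.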

Another enhancement to the checkpointing mechanism is to maintain the log in a compressed format. As checkpoint-related data is streamed to disk, it can be compressed inline using specialized algorithms tuned to heap data. We have initiated further research in this area; initial results indicate that the I/O cost can be reduced by about a factor of two, and that data can be compressed fast enough to double the effective disk bandwidth on current machines [25,9]. As CPU speeds continue to increase faster than disk speeds, the cost of compression shrinks exponentially relative to the cost of disk I/O. Further cost reductions are also possible with improved compression algorithms and adaptive techniques.
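
As an illustration of the idea only, the sketch below compresses each log record inline before it reaches the disk, using zlib's compress() purely as a stand-in for the specialized heap-data algorithms cited above [25,9]; write_compressed and the record framing are assumptions for this example.

    #include <stdint.h>
    #include <unistd.h>
    #include <vector>
    #include <zlib.h>

    // Compress one checkpoint log record and append it to the log; a small
    // length header lets recovery locate and decompress each record.
    bool write_compressed(int log_fd, const uint8_t* data, size_t len) {
        uLongf out_len = compressBound((uLong)len);
        std::vector<uint8_t> out(out_len);
        if (compress((Bytef*)out.data(), &out_len,
                     (const Bytef*)data, (uLong)len) != Z_OK)
            return false;
        uint32_t hdr = (uint32_t)out_len;
        write(log_fd, &hdr, sizeof hdr);     // record length, for recovery
        write(log_fd, out.data(), out_len);  // compressed payload
        return true;
    }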



Footnotes

7. The basic ``diffing'' technique has been implemented in the context of QuickStore [20]; preliminary results are encouraging, although more investigation is required.

