File System Checker Robustness

Om Rameshwar Gatla, New Mexico State University


File systems are built to store data efficiently, but after events such as system failures they may fail to preserve the integrity of that data. In such cases, most file systems rely on a checker program to recover from the corrupted state.

A file system checker usually scans the entire file system, checks it against a set of consistency rules, and modifies the metadata to bring the file system back to a consistent state. Depending on the corruption scenario, a checker may take a long time to fix inconsistencies. Checkers also update the on-disk metadata structures directly. Many modern file systems employ various techniques to mitigate the effects of system crashes while modifying their metadata, but no such protection is available while the checker is running. Therefore, any interruption to the repair procedure may further corrupt the file system.
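As a rough illustration of what checking a consistency rule and repairing metadata in place can look like, the sketch below uses a toy in-memory model. The rule (an inode's stored link count must equal the number of directory entries that reference it), the data layout, and the function names are assumptions made for this example and are not taken from any real checker.

```python
# Toy model, for illustration only:
#   inodes      maps inode number -> stored link count
#   dir_entries maps file name    -> inode number

def check_link_counts(inodes, dir_entries):
    """Return {inode: (stored, actual)} for every inode that violates the
    rule 'stored link count == number of directory entries pointing to it'."""
    actual = {ino: 0 for ino in inodes}
    for ino in dir_entries.values():
        if ino in actual:
            actual[ino] += 1
    return {ino: (inodes[ino], actual[ino])
            for ino in inodes if inodes[ino] != actual[ino]}

def repair_link_counts(inodes, dir_entries):
    """Fix each violation by overwriting the stored count in place,
    mirroring how a checker updates metadata directly."""
    for ino, (_, actual) in check_link_counts(inodes, dir_entries).items():
        inodes[ino] = actual

inodes = {1: 2, 2: 1, 3: 5}                       # inode 3 has a bad count
dir_entries = {"a": 1, "b": 1, "c": 2, "d": 3}

violations = check_link_counts(inodes, dir_entries)
print(violations)                                  # {3: (5, 1)}
repair_link_counts(inodes, dir_entries)
```

Because the repair overwrites the stored values directly, stopping it partway through a multi-step fix would leave the metadata in a state that neither the original nor the repaired file system would have produced.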

In our work, we study the behavior of checkers under interruption. We have observed that interrupting the repair procedure can cause irreparable damage to the file system. To mitigate this issue, we have designed and implemented rfsck-lib, a general write-ahead logging library with a simple interface. Building on the similarities in the implementation of existing checkers, rfsck-lib decouples logging from the repair procedure and provides an interface for logging repair writes at fine granularity.
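The general idea of write-ahead logging for repair writes can be sketched as follows. This is a minimal illustration under stated assumptions, not rfsck-lib's actual interface: the class `RepairLog`, its methods `log_write`, `commit`, and `replay`, and the JSON log format are all hypothetical. The checker records each intended write, makes the log durable, and only then touches the image; after an interruption, replaying the committed log completes the repair.

```python
import json
import os
import tempfile

class RepairLog:
    """Hypothetical redo log for checker writes (not rfsck-lib's real API)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.pending = []          # writes recorded but not yet durable

    def log_write(self, offset, data):
        """Record one repair write instead of applying it immediately."""
        self.pending.append({"off": offset, "hex": data.hex()})

    def commit(self):
        """Make the whole set of writes durable before the image is touched."""
        with open(self.log_path, "w") as f:
            json.dump({"writes": self.pending, "committed": True}, f)
            f.flush()
            os.fsync(f.fileno())

    def replay(self, image_path):
        """Apply a committed log to the image; safe to run more than once."""
        if not os.path.exists(self.log_path):
            return False
        try:
            with open(self.log_path) as f:
                record = json.load(f)
        except ValueError:
            return False           # torn log: ignore it, image is untouched
        if not record.get("committed"):
            return False
        with open(image_path, "r+b") as img:
            for w in record["writes"]:
                img.seek(w["off"])
                img.write(bytes.fromhex(w["hex"]))
            img.flush()
            os.fsync(img.fileno())
        os.remove(self.log_path)   # repair fully applied; drop the log
        return True

# Demo on a 16-byte stand-in for a disk image.
tmp = tempfile.mkdtemp()
image = os.path.join(tmp, "fs.img")
log_path = os.path.join(tmp, "repair.log")
with open(image, "wb") as f:
    f.write(bytes(16))

log = RepairLog(log_path)
log.log_write(0, b"\x01\x02")
log.log_write(8, b"\xff")
log.commit()

# Simulate an interruption: only the first write reaches the image.
with open(image, "r+b") as f:
    f.write(b"\x01\x02")

# On restart, a fresh instance replays the committed log.
recovered = RepairLog(log_path).replay(image)
with open(image, "rb") as f:
    final = f.read()
```

Replay rewrites every logged range with the same bytes, so applying it after a partial repair, or applying it twice, yields the same image. Keeping the log separate from the checker's own logic is what lets the same mechanism serve different checkers in this sketch.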

@conference {213034,
author = {Om Rameshwar Gatla},
title = {File System Checker Robustness},
year = {2018},
address = {Oakland, CA},
publisher = {USENIX Association},
month = feb