Reverse Debugging of Kernel Failures in Deployed Systems

Authors: 

Xinyang Ge, Microsoft Research; Ben Niu, Microsoft; Weidong Cui, Microsoft Research

Abstract: 

Post-mortem diagnosis of kernel failures is crucial for operating system vendors because kernel failures impact the reliability and security of the whole system. However, debugging kernel failures in deployed systems remains a challenge because developers have to speculate the conditions leading to the failure based on limited information such as memory dumps. In this paper, we present Kernel REPT, the first practical reverse debugging solution for kernel failures that is highly efficient, imposes small memory footprint and requires no extra software layer. To meet this goal, Kernel REPT employs efficient hardware tracing to record the kernel's control flow on each processor, recognizes the control flow of each software thread based on the context switch history, and recovers its data flow by emulating machine instructions and hardware events such as interrupts and exceptions. We design, implement, and deploy Kernel REPT on Microsoft Windows. We show that developers can use Kernel REPT to do interactive reverse debugging and find the root cause of real-world kernel failures. Kernel REPT also enables automatic root-cause analysis for certain kernel failures that were hard to debug even manually. Furthermore, Kernel REPT can proactively identify kernel bugs by checking the reconstructed execution history against a set of predetermined invariants.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {254376,
author = {Xinyang Ge and Ben Niu and Weidong Cui},
title = {Reverse Debugging of Kernel Failures in Deployed Systems},
booktitle = {2020 USENIX Annual Technical Conference (USENIX ATC 20)},
year = {2020},
isbn = {978-1-939133-14-4},
pages = {281--292},
url = {https://www.usenix.org/conference/atc20/presentation/ge},
publisher = {USENIX Association},
month = jul
}

Presentation Video