DNN-GP: Diagnosing and Mitigating Model's Faults Using Latent Concepts

Authors: 

Shuo Wang, Shanghai Jiao Tong University; Hongsheng Hu, CSIRO's Data61; Jiamin Chang, University of New South Wales and CSIRO's Data61; Benjamin Zi Hao Zhao, Macquarie University; Qi Alfred Chen, University of California, Irvine; Minhui Xue, CSIRO's Data61

Abstract: 

Despite the impressive capabilities of Deep Neural Networks (DNN), these systems remain fault-prone due to unresolved issues of robustness to perturbations and concept drift. Existing approaches to interpreting faults often provide only low-level abstractions, while struggling to extract meaningful concepts to understand the root cause. Furthermore, these prior methods lack integration and generalization across multiple types of faults. To address these limitations, we present a fault diagnosis tool (akin to a General Practitioner) DNN-GP, an integrated interpreter designed to diagnose various types of model faults through the interpretation of latent concepts. DNN-GP incorporates probing samples derived from adversarial attacks, semantic attacks, and samples exhibiting drifting issues to provide a comprehensible interpretation of a model's erroneous decisions. Armed with an awareness of the faults, DNN-GP derives countermeasures from the concept space to bolster the model's resilience. DNN-GP is trained once on a dataset and can be transferred to provide versatile, unsupervised diagnoses for other models, and is sufficiently general to effectively mitigate unseen attacks. DNN-GP is evaluated on three real-world datasets covering both attack and drift scenarios to demonstrate state-to the-art detection accuracy (near 100%) with low false positive rates (<5%).

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.