AIRS: Explanation for Deep Reinforcement Learning based Security Applications


Jiahao Yu, Northwestern University; Wenbo Guo, Purdue University; Qi Qin, ShanghaiTech University; Gang Wang, University of Illinois at Urbana-Champaign; Ting Wang, The Pennsylvania State University; Xinyu Xing, Northwestern University


Recently, we have witnessed the success of deep reinforcement learning (DRL) in many security applications, ranging from malware mutation to selfish blockchain mining. Like all other machine learning methods, the lack of explainability has been limiting its broad adoption as users have difficulty establishing trust in DRL models' decisions. Over the past years, different methods have been proposed to explain DRL models but unfortunately, they are often not suitable for security applications, in which explanation fidelity, efficiency, and the capability of model debugging are largely lacking.

In this work, we propose AIRS, a general framework to explain deep reinforcement learning-based security applications. Unlike previous works that pinpoint important features to the agent's current action, our explanation is at the step level. It models the relationship between the final reward and the key steps that a DRL agent takes, and thus outputs the steps that are most critical towards the final reward the agent has gathered. Using four representative security-critical applications, we evaluate AIRS from the perspectives of explainability, fidelity, stability, and efficiency. We show that AIRS could outperform alternative explainable DRL methods. We also showcase AIRS's utility, demonstrating that our explanation could facilitate the DRL model's failure offset, help users establish trust in a model decision, and even assist the identification of inappropriate reward designs.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

@inproceedings {287091,
author = {Jiahao Yu and Wenbo Guo and Qi Qin and Gang Wang and Ting Wang and Xinyu Xing},
title = {{AIRS}: Explanation for Deep Reinforcement Learning based Security Applications},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
isbn = {978-1-939133-37-3},
address = {Anaheim, CA},
pages = {7375--7392},
url = {},
publisher = {USENIX Association},
month = aug