Limin Yang, University of Illinois at Urbana-Champaign; Wenbo Guo, The Pennsylvania State University; Qingying Hao, University of Illinois at Urbana-Champaign; Arridhana Ciptadi and Ali Ahmadzadeh, Blue Hexagon; Xinyu Xing, The Pennsylvania State University; Gang Wang, University of Illinois at Urbana-Champaign
Concept drift poses a critical challenge to deploying machine learning models for practical security problems. Because attackers (and/or their benign counterparts) change their behavior dynamically, the testing data distribution often shifts away from the original training data over time, causing major failures in the deployed model.
To combat concept drift, we present CADE, a novel system that aims to 1) detect drifting samples that deviate from existing classes, and 2) provide explanations to reason about the detected drift. Unlike traditional approaches (which require a large number of new labels to determine concept drift statistically), we aim to identify individual drifting samples as they arrive. Recognizing the challenges introduced by the high-dimensional outlier space, we propose to map the data samples into a low-dimensional space and automatically learn a distance function to measure the dissimilarity between samples. Using contrastive learning, we can take full advantage of existing labels in the training dataset to learn how to compare and contrast pairs of samples. To reason about the meaning of the detected drift, we develop a distance-based explanation method. We show that, in this problem context, explaining a "distance" is much more effective than traditional methods that focus on explaining a "decision boundary." We evaluate CADE with two case studies: Android malware classification and network intrusion detection. We further work with a security company to test CADE on its malware database. Our results show that CADE can effectively detect drifting samples and provide semantically meaningful explanations.
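As a rough illustration of the approach described in the abstract (not the authors' released implementation), the sketch below assumes PyTorch; the encoder architecture, the contrastive margin, the MAD-normalized centroid distance used as a drift score, and the sparse-mask optimization used for the distance-based explanation are all illustrative assumptions made for this example.

# Hypothetical sketch of contrastive drift detection and distance-based
# explanation; architecture, hyperparameters, and helper names are
# illustrative assumptions, not taken from the paper's artifact.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveEncoder(nn.Module):
    """Maps high-dimensional feature vectors to a low-dimensional latent space."""
    def __init__(self, in_dim, latent_dim=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same_class, margin=10.0):
    """Pull latent pairs of the same class together; push pairs from
    different classes at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_class * d.pow(2)
            + (1 - same_class) * F.relu(margin - d).pow(2)).mean()

@torch.no_grad()
def drift_score(encoder, x_test, centroids, mads):
    """Distance from each test sample to its nearest class centroid in the
    latent space, normalized by that class's median absolute deviation (MAD).
    Samples whose minimum normalized distance exceeds a chosen threshold
    are flagged as drifting."""
    z = encoder(x_test)                                   # (N, latent_dim)
    per_class = [torch.norm(z - centroids[c], dim=-1) / mads[c]
                 for c in centroids]
    return torch.stack(per_class, dim=-1).min(dim=-1).values

def explain_drift(encoder, x_drift, x_ref, centroid, steps=200, lr=0.05, lam=1e-2):
    """Distance-based explanation sketch: optimize a sparse feature mask so
    that replacing the masked features of the drifting sample (shape (1, d))
    with values from a reference sample of the nearest class pulls it toward
    that class's centroid. Large mask values mark the features most
    responsible for the drift."""
    m = torch.zeros_like(x_drift, requires_grad=True)
    opt = torch.optim.Adam([m], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(m)
        x_mixed = (1 - mask) * x_drift + mask * x_ref
        dist = torch.norm(encoder(x_mixed) - centroid)
        loss = dist + lam * mask.sum()     # small distance + sparse mask
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(m).detach()       # per-feature importance scores

In this sketch, the contrastive loss is what makes the latent distance meaningful: training on pairs of labeled samples gives each known class a tight cluster, so an unusually large centroid distance becomes a usable drift signal without requiring new labels, and the explanation is phrased in terms of that same distance rather than a decision boundary.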
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.
@inproceedings{yang-cade-2021,
author = {Limin Yang and Wenbo Guo and Qingying Hao and Arridhana Ciptadi and Ali Ahmadzadeh and Xinyu Xing and Gang Wang},
title = {{CADE}: Detecting and Explaining Concept Drift Samples for Security Applications},
booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {2327--2344},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/yang-limin},
publisher = {USENIX Association},
month = aug
}