When the Magic Wears Off: Flaws in ML for Security Evaluations (and What to Do about It)

Wednesday, January 30, 2019 - 12:00 pm–12:30 pm

Lorenzo Cavallaro, King's College London


Academic research on machine learning-based malware classification appears to leave very little room for improvement, boasting F1 scores of up to 0.99. Is the problem solved? In this talk, we argue that there is an endemic issue of inflated results due to two pervasive sources of experimental bias: spatial bias, caused by distributions of training and testing data that are not representative of a real-world deployment, and temporal bias, caused by incorrect splits of training and testing sets (e.g., in cross-validation) that lead to impossible configurations. To overcome this issue, we propose a set of space and time constraints for experiment design. Furthermore, we introduce a new metric that summarizes the performance of a classifier over time, i.e., its expected robustness in a real-world setting. Finally, we present an algorithm to tune the performance of a given classifier. We have implemented our solutions in TESSERACT, an open-source evaluation framework that allows a fair comparison of malware classifiers in a realistic setting. We used TESSERACT to evaluate two well-known malware classifiers from the literature on a dataset of 129K applications, demonstrating the distortion of results due to experimental bias and showcasing significant improvements from tuning.
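
As a rough illustration of the temporal constraint and the time-aware metric mentioned in the abstract, the Python sketch below splits samples by date so that training data always precedes testing data, and then summarizes per-period scores in a single number. The function names, the `first_seen` field, and the sample data are hypothetical and are not TESSERACT's API. The metric is assumed here to be a normalized trapezoidal area under the per-period score curve, which the abstract does not specify.

```python
from datetime import date


def temporal_split(samples, split_date):
    """Train on samples seen before split_date, test on the rest.

    Assumption: each sample is a dict with a hypothetical 'first_seen' date.
    """
    train = [s for s in samples if s["first_seen"] < split_date]
    test = [s for s in samples if s["first_seen"] >= split_date]
    return train, test


def violates_temporal_constraint(train, test):
    """True if any training sample is newer than the oldest test sample."""
    if not train or not test:
        return False
    newest_train = max(s["first_seen"] for s in train)
    oldest_test = min(s["first_seen"] for s in test)
    return newest_train > oldest_test


def area_under_time(scores):
    """Normalized trapezoidal area under a per-period score curve.

    scores[k] is the score (e.g., F1) measured on the k-th test period.
    The result lies in [0, 1] when every score lies in [0, 1].
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two test periods")
    area = sum((scores[k] + scores[k + 1]) / 2 for k in range(n - 1))
    return area / (n - 1)


# Made-up samples: label 1 = malware, 0 = goodware.
samples = [
    {"name": "a", "first_seen": date(2014, 3, 1), "label": 0},
    {"name": "b", "first_seen": date(2014, 9, 1), "label": 1},
    {"name": "c", "first_seen": date(2015, 2, 1), "label": 0},
    {"name": "d", "first_seen": date(2015, 8, 1), "label": 1},
]

train, test = temporal_split(samples, date(2015, 1, 1))

# A time-oblivious split (e.g., a random cross-validation fold) can place
# newer samples in training than in testing, i.e., train on "the future".
shuffled_train = [samples[0], samples[3]]
shuffled_test = [samples[1], samples[2]]
```

With this data, `violates_temporal_constraint(train, test)` is false while `violates_temporal_constraint(shuffled_train, shuffled_test)` is true. A classifier whose made-up F1 decays as `[0.9, 0.7, 0.5]` over three test periods gets `area_under_time` of about 0.7, whereas one that holds 0.9 throughout gets 0.9, so the summary rewards robustness over time rather than a single peak score.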

Lorenzo Cavallaro is a Full Professor of Computer Science, Chair in Cybersecurity (Systems Security) in the Department of Informatics at King's College London, where he leads the Systems Security Research Lab. He received a combined BSc-MSc (summa cum laude) in Computer Science from the University of Milan in 2004 and a PhD in Computer Science from the same university in 2008. Prior to joining King's College London, Lorenzo worked in the Information Security Group at Royal Holloway, University of London (Assistant Professor, 2012; Associate Professor, 2016; Full Professor, 2018), and held Post-Doctoral and Visiting Scholar positions at Vrije Universiteit Amsterdam (2010–2011), UC Santa Barbara (2008–2009), and Stony Brook University (2006–2008). His research builds on program analysis and machine learning to address threats against the security of computing systems. Lorenzo is Principal Investigator in a number of research projects primarily funded by the UK EPSRC, the EU, Royal Holloway, and McAfee. He received the USENIX WOOT Best Paper Award 2017, and he publishes at and serves on the technical program committees of well-known international conferences, including USENIX Security, ACM CCS, NDSS, WWW, ACSAC, and RAID.

@conference {226323,
author = {Lorenzo Cavallaro},
title = {When the Magic Wears Off: Flaws in {ML} for Security Evaluations (and What to Do about It)},
year = {2019},
address = {Burlingame, CA},
publisher = {USENIX Association},
month = jan
}
