Spotting the Differences: Quirks of Machine Learning (in) Security

Fabrício Ceschin

Tuesday, January 24, 2023 - 5:10 pm–5:40 pm

Fabrício Ceschin, Federal University of Paraná (UFPR), Brazil

Machine Learning (ML) has been widely applied to cybersecurity and is currently considered state-of-the-art for solving many open issues in that field. However, it is challenging to evaluate how good the resulting solutions are, because security poses challenges that may not appear in other areas, and overlooking them can lead to solutions that are infeasible in real-world applications. For instance, a phishing detection model that does not account for a non-stationary distribution would not work, given that 68% of the phishing emails blocked by Gmail differ from day to day. In this talk, I will discuss some of the challenges of applying ML to cybersecurity, which include: (i) dataset problems, such as dataset definition, where choosing the right size is key to creating a model representative of the task being performed, and class imbalance, where the number of samples differs substantially between classes; (ii) adversarial machine learning and concept drift/evolution, where attackers constantly develop adversarial samples to avoid detection, which changes the underlying concept in the data and renders defense solutions obsolete due to the volatility of security data; and (iii) evaluation problems, such as delayed labels, where new data do not have ground-truth labels available right after collection, producing a gap between data collection, labeling, and model training/testing. For each challenge described, I will show how existing solutions may fail under certain circumstances and propose possible fixes when appropriate. My goal is to point out directions for future cybersecurity researchers and practitioners applying ML to their problems.
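The dataset and evaluation problems named in the abstract can be illustrated with a minimal sketch. It is not taken from the talk: the `Sample` type, the helper functions, the toy stream, and the three-day label delay are all hypothetical and chosen only for illustration. It shows a time-ordered train/test split (instead of a random one), a filter that keeps only samples whose labels have already arrived, and a simple class imbalance measure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one collected sample."""
    day: int          # day the sample was collected
    label: int        # ground truth: 1 = malicious, 0 = benign
    label_delay: int  # days until the ground-truth label becomes available


def temporal_split(samples: List[Sample], cutoff_day: int) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples collected before cutoff_day, test on the rest."""
    train = [s for s in samples if s.day < cutoff_day]
    test = [s for s in samples if s.day >= cutoff_day]
    return train, test


def labeled_by(samples: List[Sample], today: int) -> List[Sample]:
    """Keep only samples whose label has already arrived by `today`."""
    return [s for s in samples if s.day + s.label_delay <= today]


def imbalance_ratio(samples: List[Sample]) -> float:
    """Majority-class count divided by minority-class count."""
    positives = sum(s.label for s in samples)
    negatives = len(samples) - positives
    if min(positives, negatives) == 0:
        return float("inf")
    return max(positives, negatives) / min(positives, negatives)


# Toy stream (assumption): one sample per day for 10 days,
# malicious on days 0 and 5, labels arriving 3 days after collection.
stream = [Sample(day=d, label=int(d % 5 == 0), label_delay=3) for d in range(10)]

train, test = temporal_split(stream, cutoff_day=7)
usable_train = labeled_by(train, today=7)

print(len(train), len(test), len(usable_train), imbalance_ratio(stream))
```

In this toy setting, part of the training window cannot be used on the cutoff day because its labels have not arrived yet, which is the gap between collection, labeling, and training that the abstract describes.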

Fabrício is a Ph.D. student at the Federal University of Paraná, Brazil, where he also earned a master's degree in Computer Science (2017) and graduated as a Computer Scientist (2015). His research interests include machine learning, adversarial machine learning, and data streams applied to cybersecurity. Fabrício has published papers in top venues (IEEE Security & Privacy, IEEE TBIOM, ACM ESWA, and others) and has reviewed papers for multiple venues (USENIX Security & Privacy 2022, IEEE Security & Privacy, DIMVA, ARES, ECML, and others). He received a Google Research Award for Latin America in 2017. Fabrício also received the USENIX Enigma 2019 student travel grant and won the Machine Learning Security Evasion Competition (MLSEC) twice (2020 and 2021).

Connect:

@fabriciojoc

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX

@conference {285613,
author = {Fabr{\'\i}cio Ceschin},
title = {Spotting the Differences: Quirks of Machine Learning (in) Security},
year = {2023},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jan
}
