Xingan Gao, Xiaobing Sun, and Sicong Cao, Yangzhou University; Kaifeng Huang, Tongji University; Di Wu, University of Southern Queensland; Xiaolei Liu, China Academy of Engineering Physics; Xingwei Lin, Zhejiang University; Yang Xiang, Swinburne University of Technology
Malicious package detection has become a critical task in ensuring the security and stability of the PyPI community. Existing detection approaches have focused on advancing model selection, evolving from traditional machine learning (ML) models to large language models (LLMs). However, as model complexity increases, so does detection time, which raises the question: can a lightweight model achieve effective detection? Through empirical research, we demonstrate that collecting a sufficiently comprehensive feature set enables even traditional ML models to achieve outstanding performance. However, traditional ML models rely on manually pre-defined feature sets and cannot explain why a package is flagged as malicious. Therefore, we propose MalGuard, a novel approach based on social network graphs that detects malicious packages with five traditional ML models. To overcome these limitations, we leverage graph centrality analysis to extract sensitive APIs automatically, replacing hand-crafted features. To understand the sensitive APIs, we further refine the feature set using an LLM and integrate the LIME (Local Interpretable Model-agnostic Explanations) algorithm with the ML models to provide explanations for malicious packages. We evaluated MalGuard against five state-of-the-art (SOTA) baselines under the same settings. Experimental results show that MalGuard improves precision by 0.5%-33.2% and recall by 1.8%-22.1%. With MalGuard, we successfully identified 95 previously unknown malicious packages from a pool of 51,479 newly uploaded packages over a four-week period, and 73 of them have been removed by PyPI.
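As an illustration of the centrality-based feature extraction described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes an undirected co-occurrence graph (two APIs are linked when they appear in the same package) and uses degree centrality as a stand-in; the abstract does not specify how MalGuard builds its graph or which centrality measures it uses. All function names and the sample API lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_api_graph(packages):
    """Build an undirected co-occurrence graph (assumed construction):
    nodes are API names, and an edge links two APIs seen in one package."""
    graph = defaultdict(set)
    for apis in packages:
        unique = sorted(set(apis))
        for api in unique:
            graph[api]  # touch the key so isolated APIs still become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree centrality: a node's neighbor count divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def top_k_apis(centrality, k):
    """Return the k most central APIs, breaking ties alphabetically."""
    ranked = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return [api for api, _ in ranked[:k]]


def to_feature_vector(apis, selected):
    """Encode a package as 0/1 flags over the selected sensitive APIs."""
    present = set(apis)
    return [1 if api in present else 0 for api in selected]


# Hypothetical API lists extracted from three known-malicious packages.
malicious_packages = [
    ["os.system", "base64.b64decode", "urllib.request.urlopen"],
    ["os.system", "base64.b64decode", "socket.socket"],
    ["os.system", "subprocess.Popen"],
]

graph = build_api_graph(malicious_packages)
centrality = degree_centrality(graph)
selected = top_k_apis(centrality, k=2)

# Feature vector for a new, unlabeled package (also hypothetical).
vector = to_feature_vector(["os.system", "json.loads"], selected)
print(selected, vector)
```

The resulting binary vectors could then be fed to any traditional classifier. Ranking by centrality replaces a hand-written API list with one derived from the data, which is the motivation the abstract gives for this step.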
author = {Xingan Gao and Xiaobing Sun and Sicong Cao and Kaifeng Huang and Di Wu and Xiaolei Liu and Xingwei Lin and Yang Xiang},
title = {{MalGuard}: Towards {Real-Time}, Accurate, and Actionable Detection of Malicious Packages in {PyPI} Ecosystem},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {4741--4758},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/gao-xingan},
publisher = {USENIX Association},
month = aug
}
