Can a Deep Learning Model for One Architecture Be Used for Others? Retargeted-Architecture Binary Code Analysis

Junzhe Wang; Matthew Sharp; Chuxiong Wu; Qiang Zeng; Lannan Luo

Authors:

Junzhe Wang, George Mason University; Matthew Sharp, University of South Carolina; Chuxiong Wu, Qiang Zeng, and Lannan Luo, George Mason University

Abstract:

NLP-inspired deep learning for binary code analysis demonstrates notable performance. Considering the diverse Instruction Set Architectures (ISAs) on the market, it is important to be able to analyze code of various ISAs. However, training a deep learning model usually requires a large amount of data, which poses a challenge for certain ISAs such as PowerPC that suffer from the "data scarcity" issue. For instance, acquiring a large dataset of PowerPC malware proves to be challenging. Moreover, given a binary analysis task and multiple ISAs, it takes much time and effort (e.g., for data collection, labeling and cleaning, and parameter tuning) to train one model per ISA. We propose a new direction, retargeted-architecture binary code analysis, to handle the data scarcity issue and alleviate the per-ISA effort. Our idea is to transfer knowledge from one ISA to others; that is, a model trained with rich data and much time and effort for one ISA can perform prediction for others without any modification. We showcase the idea through two important tasks: malware detection and function similarity detection. An extensive evaluation involving four ISAs (x86, ARM, MIPS, and PowerPC) demonstrates the effectiveness of the approach, and the reasons for the high performance are interpreted.

BibTeX

@inproceedings {291215,
author = {Junzhe Wang and Matthew Sharp and Chuxiong Wu and Qiang Zeng and Lannan Luo},
title = {Can a Deep Learning Model for One Architecture Be Used for Others? {Retargeted-Architecture} Binary Code Analysis},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
isbn = {978-1-939133-37-3},
address = {Anaheim, CA},
pages = {7339--7356},
url = {https://www.usenix.org/conference/usenixsecurity23/presentation/wang-junzhe},
publisher = {USENIX Association},
month = aug
}
