AIFORE: Smart Fuzzing Based on Automatic Input Format Reverse Engineering

Authors: 

Ji Shi, {CAS-KLONAT, BKLONSPT}, Institute of Information Engineering, Chinese Academy of Sciences; Institute for Network Science and Cyberspace & BNRist, Tsinghua University; Zhongguancun Lab; Singular Security Lab, Huawei Technologies; School of Cyber Security, University of Chinese Academy of Sciences

Zhun Wang, Institute for Network Science and Cyberspace & BNRist, Tsinghua University; Zhongguancun Lab

Zhiyao Feng, Institute for Network Science and Cyberspace & BNRist, Tsinghua University; Zhongguancun Lab; EPFL

Yang Lan and Shisong Qin, Institute for Network Science and Cyberspace & BNRist, Tsinghua University; Zhongguancun Lab

Wei You, Renmin University of China

Wei Zou, {CAS-KLONAT, BKLONSPT}, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences

Mathias Payer, EPFL

Chao Zhang, Institute for Network Science and Cyberspace & BNRist, Tsinghua University; Zhongguancun Lab

Abstract: 

Knowledge of a program’s input format is essential for effective input generation in fuzzing. Automated input format reverse engineering represents an attractive but challenging approach to learning the format. In this paper, we address several challenges of automated input format reverse engineering and present a smart fuzzing solution, AIFORE, that makes full use of the reversed format and benefits from it. The structures and semantics of input fields are determined by the basic blocks (BBs) that process them rather than by the input specification. Therefore, we first utilize byte-level taint analysis to recognize the input bytes processed by each BB, then identify indivisible input fields that are always processed together with a minimum cluster algorithm, and learn their types with a neural network model that characterizes the behavior of BBs. Lastly, we design a new power scheduling algorithm based on the inferred format knowledge to guide smart fuzzing. We implement a prototype of AIFORE and evaluate both the accuracy of format inference and the performance of fuzzing against state-of-the-art (SOTA) format reversing solutions and fuzzers. AIFORE significantly outperforms SOTA baselines on the accuracy of field boundary and type recognition. With AIFORE, we uncovered 20 bugs in 15 programs that were missed by other fuzzers.
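
To illustrate the idea that bytes always processed together form one field, here is a minimal Python sketch. It is not the paper's minimum cluster algorithm, and it is only a simplification under stated assumptions: the taint analysis result is assumed to be available as a dictionary mapping a basic block identifier to the set of input byte offsets that block processes, and adjacent bytes are grouped into one field when exactly the same set of basic blocks touches them. The function name `infer_fields` and the basic block names are hypothetical.

```python
from collections import defaultdict


def infer_fields(bb_taint, input_len):
    """Group adjacent input bytes that are processed by the same basic blocks.

    bb_taint:  dict mapping a (hypothetical) basic block id to the set of
               input byte offsets it processes, as a taint analysis might report.
    input_len: total number of bytes in the input.
    Returns a list of half-open (start, end) byte ranges, one per inferred field.
    """
    # Signature of each byte: the set of basic blocks that process it.
    signature = defaultdict(set)
    for bb, offsets in bb_taint.items():
        for off in offsets:
            signature[off].add(bb)

    fields = []
    start = 0
    for off in range(1, input_len + 1):
        # Close the current field at the end of the input, or when the next
        # byte is handled by a different set of basic blocks.
        if off == input_len or signature[off] != signature[start]:
            fields.append((start, off))
            start = off
    return fields


# Illustrative taint result for a 10-byte input (all names are made up).
example_taint = {
    "bb_magic": {0, 1, 2, 3},
    "bb_len": {4, 5},
    "bb_check": {4, 5},
    "bb_body": {6, 7, 8, 9},
}
print(infer_fields(example_taint, 10))  # [(0, 4), (4, 6), (6, 10)]
```

One limitation of this simplification is that two adjacent fields handled by exactly the same basic blocks would be merged into one, so it should be read as an illustration of the grouping intuition rather than a substitute for the clustering described in the paper.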


BibTeX
@inproceedings {285497,
author = {Ji Shi and Zhun Wang and Zhiyao Feng and Yang Lan and Shisong Qin and Wei You and Wei Zou and Mathias Payer and Chao Zhang},
title = {{AIFORE}: Smart Fuzzing Based on Automatic Input Format Reverse Engineering},
booktitle = {32nd USENIX Security Symposium (USENIX Security 23)},
year = {2023},
isbn = {978-1-939133-37-3},
address = {Anaheim, CA},
pages = {4967--4984},
url = {https://www.usenix.org/conference/usenixsecurity23/presentation/shi-ji},
publisher = {USENIX Association},
month = aug
}
