Finding Bugs Using Your Own Code: Detecting Functionally-similar yet Inconsistent Code

Authors: 

Mansour Ahmadi, Reza Mirzazade Farkhani, Ryan Williams, and Long Lu, Northeastern University

Abstract: 

Probabilistic classification has shown success in detecting known types of software bugs. However, the works following this approach tend to require a large amount of specimens to train their models. We present a new machine learning-based bug detection technique that does not require any external code or samples for training. Instead, our technique learns from the very codebase on which the bug detection is performed, and therefore, obviates the need for the cumbersome task of gathering and cleansing training samples (e.g., buggy code of certain kinds). The key idea behind our technique is a novel two-step clustering process applied on a given codebase. This clustering process identifies code snippets in a project that are functionally-similar yet appear in inconsistent forms. Such inconsistencies are found to cause a wide range of bugs, anything from missing checks to unsafe type conversions. Unlike previous works, our technique is generic and not specific to one type of inconsistency or bug. We prototyped our technique and evaluated it using 5 popular open source software, including QEMU and OpenSSL. With a minimal amount of manual analysis on the inconsistencies detected by our tool, we discovered 22 new unique bugs, despite the fact that many of these programs are constantly undergoing bug scans and new bugs in them are believed to be rare.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@inproceedings {263838,
author = {Mansour Ahmadi and Reza Mirzazade farkhani and Ryan Williams and Long Lu},
title = {Finding Bugs Using Your Own Code: Detecting Functionally-similar yet Inconsistent Code},
booktitle = {30th {USENIX} Security Symposium ({USENIX} Security 21)},
year = {2021},
isbn = {978-1-939133-24-3},
pages = {2025--2040},
url = {https://www.usenix.org/conference/usenixsecurity21/presentation/ahmadi},
publisher = {{USENIX} Association},
month = aug,
}

Presentation Video