FeatureSmith: Learning to Detect Malware by Mining the Security Literature

Wednesday, February 1, 2017 - 1:30pm–2:00pm

Tudor Dumitraș, University of Maryland, College Park


Companies facing rampant attacks and data breaches have started turning to artificial intelligence techniques, such as machine learning, for security tasks. A machine learning classifier automatically learns models of malicious activity from a set of known-benign and known-malicious observations, without the need for a precise description of the activity prepared in advance. However, the effectiveness of these techniques primarily depends on the feature engineering process, which is usually a manual task based on human knowledge and intuition. Can we automate this process? Can we build an intelligent system that not only learns from examples, but can also help us build other intelligent systems?

We developed a system, called FeatureSmith, that engineers features for malware detectors by synthesizing the knowledge described in thousands of research papers. As a demonstration, we trained a machine learning classifier to detect Android malware using automatically engineered features, and it achieved performance comparable to that of a state-of-the-art Android malware detector that uses manually engineered features. In addition, FeatureSmith can suggest informative features that are absent from the manually engineered set, and it can link the generated features to human-understandable concepts that describe malware behaviors.

Tudor Dumitraș is an Assistant Professor in the Electrical and Computer Engineering Department at the University of Maryland, College Park. His research focuses on Big Data approaches to problems in system security and dependability. In his previous role at Symantec Research Labs, he built the Worldwide Intelligence Network Environment (WINE)—a platform for experimenting with Big Data techniques. He received an Honorable Mention in the NSA competition for the Best Scientific Cybersecurity Paper of 2012. He also received the 2011 A. G. Jordan Award from the ECE Department at Carnegie Mellon University, the 2009 John Vlissides Award from ACM SIGPLAN, and the Best Paper Award at ASP-DAC '03. Tudor holds a PhD degree from Carnegie Mellon University.

@conference {202485,
author = {Tudor Dumitras},
title = {{FeatureSmith}: Learning to Detect Malware by Mining the Security Literature},
year = {2017},
address = {Oakland, CA},
publisher = {USENIX Association},
month = jan
}
