TreeHuggr: Discovering Where Tree-based Classifiers are Vulnerable to Adversarial Attack

Monday, August 12, 2019 - 11:00 am–11:30 am

Bobby Filar, Endgame


Tree-based classifiers like gradient-boosted decision trees (GBDTs) and random forests provide state-of-the-art performance in many information security tasks such as malware detection. While adversarial methods for evading deep learning classifiers abound, little research has been carried out on attacking tree-based classifiers. This is mostly because tree-based models are non-differentiable, which significantly increases the cost of attacks. Research has shown that attack transferability may succeed at evading tree-based classifiers, but those techniques do little to illuminate where models are brittle or weak.

We present TreeHuggr, an algorithm designed to analyze split points of each tree in an ensemble classifier to learn where a model might be most susceptible to an evasion attack. By determining where in the feature space there exists insufficient or conflicting evidence for a class label or where a decision boundary is wrinkled, we can not only better understand the attack space, but we can also more intuitively understand a model’s blind spots and increase interpretability. The key differentiator of TreeHuggr is a focus on where a model is most susceptible, in contrast to the common approach of crafting an evasive variant by perturbing an adversary-selected starting point.
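
To make the idea of analyzing split points concrete, here is a minimal sketch in Python. It is not the TreeHuggr algorithm, which the abstract does not specify. It is a toy illustration that assumes a majority-vote ensemble of hand-built trees. The `Node` class, the helper names (`collect_splits`, `probe_points`, `contested_cells`), the unit feature range, and the three example trees are all hypothetical. The sketch gathers every split threshold per feature, probes one point inside each cell of the resulting grid, and ranks cells by how strongly the trees disagree there.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical tree node: internal nodes set feature/threshold, leaves set label."""
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when x[feature] <= threshold
    right: Optional["Node"] = None   # branch taken otherwise
    label: Optional[int] = None      # leaf only: 0 = benign, 1 = malicious


def predict(node: Node, x: Dict[int, float]) -> int:
    """Walk one tree from the root to a leaf and return the leaf label."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def collect_splits(node: Node, out: Dict[int, Set[float]]) -> None:
    """Record every split threshold in the tree, grouped by feature index."""
    if node.label is not None:
        return
    out.setdefault(node.feature, set()).add(node.threshold)
    collect_splits(node.left, out)
    collect_splits(node.right, out)


def probe_points(splits: Dict[int, Set[float]], lo: float, hi: float) -> List[Dict[int, float]]:
    """Return one probe point (interval midpoints) per cell of the split grid."""
    features = sorted(splits)
    axes = []
    for f in features:
        edges = [lo] + sorted(splits[f]) + [hi]
        axes.append([(a + b) / 2 for a, b in zip(edges, edges[1:])])
    return [dict(zip(features, combo)) for combo in product(*axes)]


def contested_cells(ensemble: List[Node], lo: float = 0.0, hi: float = 1.0
                    ) -> List[Tuple[int, int, Dict[int, float]]]:
    """Rank grid cells by disagreement = size of the minority vote (higher = more contested)."""
    splits: Dict[int, Set[float]] = {}
    for tree in ensemble:
        collect_splits(tree, splits)
    results = []
    for x in probe_points(splits, lo, hi):
        votes = sum(predict(tree, x) for tree in ensemble)  # trees voting "malicious"
        disagreement = min(votes, len(ensemble) - votes)
        results.append((disagreement, votes, x))
    return sorted(results, key=lambda r: r[0], reverse=True)


def stump(feature: int, threshold: float) -> Node:
    """Toy one-split tree: benign at or below the threshold, malicious above it."""
    return Node(feature, threshold, left=Node(label=0), right=Node(label=1))


# Hypothetical three-tree ensemble over two features scaled to [0, 1].
ensemble = [stump(0, 0.5), stump(1, 0.3), stump(0, 0.7)]
report = contested_cells(ensemble)

for disagreement, votes, x in report:
    print(f"disagreement={disagreement} malicious_votes={votes} probe={x}")
```

Cells listed first are those where the trees split their votes, a rough stand-in for the "insufficient or conflicting evidence" described above. The number of grid cells grows exponentially with the number of features, so a sketch like this only works on tiny examples; a model trained on a dataset as large as EMBER would need a far more selective search.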

This talk will provide an example-driven demonstration of TreeHuggr against the open-source EMBER dataset and malware model. We hope that TreeHuggr will highlight the potential defensive uses of adversarial research against tree-based classifiers and yield more insights into model interpretability and attack susceptibility.

Bobby Filar, Endgame

Bobby Filar is the Director of Data Science at Endgame, where he employs machine learning and natural language processing to drive cutting-edge detection and contextual understanding capabilities in Endgame’s endpoint protection platform. In the past year he has focused on applying machine learning to process event data to provide confidence and explainability metrics for malware alerts. Previously, Bobby worked on a variety of machine learning problems focused on natural language understanding, geospatial analysis, and adversarial tasks in the information security domain.

@conference {238501,
author = {Bobby Filar},
title = {{TreeHuggr}: Discovering Where Tree-based Classifiers are Vulnerable to Adversarial Attack},
year = {2019},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = aug