SoK: Automated TTP Extraction from CTI Reports – Are We There Yet?

Marvin Büchel; Tommaso Paladini; Stefano Longari; Michele Carminati; Stefano Zanero; Hodaya Binyamini; Gal Engelberg; Dan Klein; Giancarlo Guizzardi; Marco Caselli; Andrea Continella; Maarten van Steen; Andreas Peter; Thijs van Ede

Marvin Büchel, Carl von Ossietzky Universität Oldenburg; Tommaso Paladini, Politecnico di Milano, NEC Laboratories Europe GmbH; Stefano Longari, Michele Carminati, and Stefano Zanero, Politecnico di Milano; Hodaya Binyamini, Gal Engelberg, and Dan Klein, Accenture Labs; Giancarlo Guizzardi, University of Twente; Marco Caselli, Siemens AG; Andrea Continella and Maarten van Steen, University of Twente; Andreas Peter, Carl von Ossietzky Universität Oldenburg; Thijs van Ede, University of Twente

Cyber Threat Intelligence (CTI) plays a critical role in sharing knowledge about new and evolving threats. With the increased prevalence and sophistication of threat actors, intelligence has expanded from simple indicators of compromise to extensive CTI reports describing high-level attack steps known as Tactics, Techniques and Procedures (TTPs). Such TTPs, often classified into the ontology of the MITRE ATT&CK framework, make CTI significantly more valuable, but also harder to interpret and process automatically. Natural Language Processing (NLP) makes it possible to automate large parts of the knowledge extraction from CTI reports; over 40 papers discuss approaches, ranging from named entity recognition through embedder models to generative large language models. Unfortunately, existing solutions are largely incomparable, as they consider decidedly different and constrained settings, rely on custom TTP ontologies, and use a multitude of custom, inaccessible CTI datasets. We take stock, systematize the knowledge in the field, and empirically evaluate existing approaches in a unified setting for fair comparisons. We gain several fundamental insights, including (1) that there is a kind of performance limit that existing approaches seemingly cannot yet overcome, (2) that traditional NLP approaches (possibly counterintuitively) outperform modern embedder-based and generative approaches in realistic settings, and (3) that further research on understanding inherent ambiguities in TTP ontologies and on the creation of high-quality datasets is key to taking a leap forward in the field.

Category: Short Presentation

BibTeX

@inproceedings {309688,
author = {Marvin B{\"u}chel and Tommaso Paladini and Stefano Longari and Michele Carminati and Stefano Zanero and Hodaya Binyamini and Gal Engelberg and Dan Klein and Giancarlo Guizzardi and Marco Caselli and Andrea Continella and Maarten van Steen and Andreas Peter and Thijs van Ede},
title = {{SoK}: Automated {TTP} Extraction from {CTI} Reports {\textendash} Are We There Yet?},
booktitle = {34th USENIX Security Symposium (USENIX Security 25)},
year = {2025},
isbn = {978-1-939133-52-6},
address = {Seattle, WA},
pages = {4621--4641},
url = {https://www.usenix.org/conference/usenixsecurity25/presentation/buechel},
publisher = {USENIX Association},
month = aug
}
