Tudor Dumitras, University of Maryland, College Park
The rate at which software vulnerabilities are discovered is growing: the National Vulnerability Database includes over 100,000 vulnerabilities, and 10% of these entries were added in the last year. Very few of these vulnerabilities are exploited in real-world attacks, yet the exploits can compromise millions of hosts around the world and can disrupt businesses and critical services.
This talk will discuss what we have learned about vulnerability exploitation by analyzing data from 10 million hosts. These hosts, used by real people around the world and targeted by real attackers, give us an opportunity to quantify the impact of software vulnerabilities on a global scale. Our measurements also allow us to infer statistically which vulnerabilities are likely to be exploited in the wild—before finding the corresponding exploits.
We show that the growing rate of vulnerability discovery does not mean that software is becoming more insecure; in fact, the fraction of vulnerabilities that are exploited follows a decreasing trend. At the same time, popular vulnerability metrics, such as the CVSS score, have a low correlation with the vulnerabilities that are ultimately exploited in the real world. It is difficult to guess why hackers exploit some vulnerabilities and not others, because this decision is influenced by a variety of socio-technical factors. However, we can combine features derived from the technical characteristics of a vulnerability, such as its CVSS score, with features extracted from social media, which reflect how information about the vulnerability spreads among hackers, security researchers, and system administrators. Additionally, we can take into account variations in the rates at which vulnerable hosts are patched after a patch becomes available. By combining these factors into predictive models, we can determine which vulnerabilities present a higher risk of exploitation, and, for some vulnerabilities, we can infer the existence of zero-day exploits on the day of disclosure.
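The talk's actual models are more sophisticated, but the core idea of fusing CVSS-derived, social-media, and patching-rate features into a single exploitation-risk predictor can be sketched as a toy logistic regression. Everything below (the feature set, the scaling, and the training data) is hypothetical and for illustration only; it is not the speaker's dataset or model.

```python
import math

def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by plain gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the raw score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def exploit_risk(w, b, x):
    """Predicted probability that a vulnerability will be exploited."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical per-vulnerability features, each scaled to [0, 1]:
#   (normalized CVSS score, social-media buzz, early patching rate)
X = [
    (0.9, 0.8, 0.2),  # severe, widely discussed, slowly patched -> exploited
    (0.8, 0.9, 0.1),  # exploited
    (0.3, 0.1, 0.9),  # minor, quiet, quickly patched -> not exploited
    (0.2, 0.2, 0.8),  # not exploited
]
y = [1, 1, 0, 0]

w, b = train(X, y)
high = exploit_risk(w, b, (0.95, 0.9, 0.1))  # risky profile
low = exploit_risk(w, b, (0.2, 0.1, 0.9))    # safe profile
print(high > low)
```

The point of the sketch is the feature fusion, not the classifier: any model that scores a vulnerability from a joint feature vector can rank disclosures by exploitation risk, which is what supports triage decisions like "patch this one first."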
Our predictive models are the result of five years of academic research, and they represent a step toward answering the question "What are the odds that you will get hacked tomorrow?" Along with recent advances in predicting other types of security incidents, these techniques help us objectively assess the impact of various defensive technologies on security in the real world. Such predictive models allow companies to determine their biggest risks and the best mitigations by using data, rather than expert opinions. They also provide evidence for cyber policymaking, and they can be applied to risk modeling in cyber insurance.
Tudor Dumitraș is an Assistant Professor in the Electrical & Computer Engineering Department at the University of Maryland, College Park. His research focuses on data-driven security: he studies real-world adversaries empirically, builds machine learning systems for detecting attacks and predicting security incidents, and investigates the security of machine learning in adversarial environments. In his previous role at Symantec Research Labs, he built the Worldwide Intelligence Network Environment (WINE), a data analytics platform for security research. His work on the effectiveness of certificate revocations in the Web PKI was featured in the Research Highlights of the Communications of the ACM in 2018, and his measurement of the duration and prevalence of zero-day attacks received an Honorable Mention in the NSA competition for the Best Scientific Cybersecurity Paper of 2012. He also received the 2011 A. G. Jordan Award from the ECE Department at Carnegie Mellon University, the 2009 John Vlissides Award from ACM SIGPLAN, and the Best Paper Award at ASP-DAC'03. Tudor holds a Ph.D. degree from Carnegie Mellon University.