ScAINet '19 Conference Program

Monday, August 12

8:20 am–9:20 am

Continental Breakfast

Grand Ballroom Foyer

9:20 am–9:30 am

Opening Remarks

Program Co-Chairs: Rachel Greenstadt, New York University, and Aleatha Parker-Wood, Humu

9:30 am–10:30 am

Keynote Address

Recent Advances in Adversarial Machine Learning

Monday, 9:30 am–10:30 am

Nicholas Carlini, Research Scientist, Google Research

Adversarial machine learning has progressed rapidly over the past few years, with currently over 1,000 papers on this topic and growing at a rate of over a paper a day. In this talk, I survey some of the most interesting recent results, ranging from practical applications of adversarial machine learning to fundamental research investigating why adversarial examples exist in the first place. I conclude with a selection of future research directions that would advance the body of knowledge in this important field.

Nicholas Carlini is a research scientist at Google Brain. He analyzes the security and privacy of machine learning, for which he has received best paper awards at IEEE S&P and ICML. He graduated with his PhD from the University of California, Berkeley, in 2018.

10:30 am–11:00 am

Break with Refreshments

Grand Ballroom Foyer

11:00 am–12:30 pm

Adversaries of All Sorts

TreeHuggr: Discovering Where Tree-based Classifiers are Vulnerable to Adversarial Attack

Monday, 11:00 am–11:30 am

Bobby Filar, Endgame

Tree-based classifiers like gradient-boosted decision trees (GBDTs) and random forests provide state-of-the-art performance in many information security tasks such as malware detection. While adversarial methods for evading deep learning classifiers abound, little research has been carried out on attacking tree-based classifiers. This is mostly because tree-based models are non-differentiable, which significantly increases the cost of attacks. Research has shown that attack transferability may succeed in evading tree-based classifiers, but those techniques do little to illuminate where models are brittle or weak.

We present TreeHuggr, an algorithm designed to analyze split points of each tree in an ensemble classifier to learn where a model might be most susceptible to an evasion attack. By determining where in the feature space there exists insufficient or conflicting evidence for a class label or where a decision boundary is wrinkled, we can not only better understand the attack space, but we can also more intuitively understand a model’s blind spots and increase interpretability. The key differentiator of TreeHuggr is a focus on where a model is most susceptible, in contrast to the common approach of crafting an evasive variant by perturbing an adversary-selected starting point.

This talk will provide an example-driven demonstration of TreeHuggr against the open-source EMBER dataset and malware model. We hope that TreeHuggr will highlight the potential defensive uses of adversarial research against tree-based classifiers and yield more insights into model interpretability and attack susceptibility.
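As a rough illustration of what "analyzing split points" can mean, the sketch below walks a toy ensemble and collects the thresholds each feature is split on. This is not the TreeHuggr algorithm, whose details the abstract does not give. The nested-dict tree layout, the feature names, and the thresholds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tree layout (an assumption for illustration): an internal
# node has "feature", "threshold", "left", "right"; a leaf has "value".


def collect_split_points(tree, out=None):
    """Recursively gather the thresholds used by one tree, keyed by feature."""
    if out is None:
        out = defaultdict(list)
    if "feature" in tree:  # internal node; leaves carry no split
        out[tree["feature"]].append(tree["threshold"])
        collect_split_points(tree["left"], out)
        collect_split_points(tree["right"], out)
    return out


def ensemble_split_points(trees):
    """Merge split thresholds across every tree in the ensemble."""
    merged = defaultdict(list)
    for tree in trees:
        for feature, thresholds in collect_split_points(tree).items():
            merged[feature].extend(thresholds)
    return {feature: sorted(ts) for feature, ts in merged.items()}


# Made-up two-tree ensemble over made-up features.
TOY_ENSEMBLE = [
    {"feature": "file_size", "threshold": 4096,
     "left": {"value": 0.1},
     "right": {"feature": "entropy", "threshold": 7.2,
               "left": {"value": 0.3}, "right": {"value": 0.9}}},
    {"feature": "entropy", "threshold": 6.8,
     "left": {"value": 0.2}, "right": {"value": 0.8}},
]

SPLITS = ensemble_split_points(TOY_ENSEMBLE)
```

With the thresholds gathered per feature, one could then look for regions where many trees split close together, which is one plausible place to start when asking where a small perturbation might change the ensemble's vote.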

Bobby Filar is the Director of Data Science at Endgame where he employs machine learning and natural language processing to drive cutting-edge detection and contextual understanding capabilities in Endgame’s endpoint protection platform. In the past year he has focused on applying machine learning against process event data to provide confidence and explainability metrics for malware alerts. Previously, Bobby has worked on a variety of machine learning problems focused on natural language understanding, geospatial analysis, and adversarial tasks in the information security domain.

Connect:

@filar

Verifiably Robust Machine Learning for Security

Monday, 11:30 am–12:00 pm

Yizheng Chen, Columbia University

Machine learning has shown impressive results in detecting security events such as malware, spam, phishing, and many types of online fraud. Though almost perfect accuracies are demonstrated in many research works, machine learning models are highly vulnerable to poisoning and evasion attacks. Such weaknesses severely limit the reliable application of machine learning in security-relevant applications.

Building robust machine learning models has always been a cat-and-mouse game, with new attacks constantly devised to defeat the defenses. Recently, a new paradigm has emerged to train verifiably robust machine learning models for image classification tasks. To end the cat-and-mouse game, verifiably robust training provides the ML model with robustness properties that can be formally verified against any possible bounded attacker.

Verifiably robust training minimizes an over-estimated attack success rate, computed using a sound over-approximation method. Due to fundamental differences between ML models and traditional software, new sound over-approximation methods have been proposed to provide proofs for the robustness properties. In particular, soundness means that if the analysis finds no successful attack, then none exists. If we can apply the training technique to security-relevant classifiers, we can train ML models with robustness properties on the worst-case behavior, even if the adversaries adapt their attacks after learning of the defense.

In this talk, I will discuss the following:

What is verifiably robust training?
What are the main challenges in applying verifiably robust training techniques to security applications?
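To make the soundness idea concrete, here is a minimal sketch for the simplest possible case: a linear scorer facing an attacker who may change each input feature by at most eps. It is not the training method discussed in the talk, and the weights, inputs, and function names are hypothetical. For a score w·x + b, the reachable scores lie within eps times the sum of |w_i| of the unperturbed score, so if that interval excludes zero, no bounded attack can flip the label.

```python
def score_bounds(weights, bias, x, eps):
    """Lower and upper bounds on w.x + b when each x_i may move by up to eps."""
    center = sum(w * xi for w, xi in zip(weights, x)) + bias
    radius = eps * sum(abs(w) for w in weights)  # worst-case shift of the score
    return center - radius, center + radius


def is_verified_robust(weights, bias, x, eps):
    """True if the predicted label (sign of the score) cannot be flipped."""
    lower, upper = score_bounds(weights, bias, x, eps)
    return lower > 0 or upper < 0


# Hypothetical two-feature model and input, chosen only for illustration.
WEIGHTS = [2.0, -1.0]
BIAS = 0.5
X = [1.0, 0.5]

SMALL_EPS_BOUNDS = score_bounds(WEIGHTS, BIAS, X, eps=0.5)
ROBUST_AT_SMALL_EPS = is_verified_robust(WEIGHTS, BIAS, X, eps=0.5)
ROBUST_AT_LARGE_EPS = is_verified_robust(WEIGHTS, BIAS, X, eps=1.0)
```

For a linear model these bounds are exact. For deeper networks, bounds propagated layer by layer are generally looser than the true range, which is why the analysis over-estimates attack success: it may fail to certify a point that is in fact robust, but a point it does certify is robust.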

Yizheng Chen is a Postdoctoral Researcher at Columbia University. She received her Ph.D. degree in Computer Science from Georgia Institute of Technology. She is interested in designing and implementing secure machine learning systems, and applying machine learning and graphical models to solve security problems.

Automatically Learning How to Evade Censorship

Monday, 12:00 pm–12:30 pm

Dave Levin, University of Maryland

Researchers and censoring regimes have long engaged in a cat-and-mouse game, leading to increasingly sophisticated Internet-scale censorship techniques and methods to evade them. This talk will introduce a drastic departure from the previously manual evade-detect cycle: applying artificial intelligence techniques to automate the discovery of censorship evasion strategies. We will demonstrate that, by training AI against live censors, one can glean new insights into how censorship works around the world, and how to circumvent it. After a brief demonstration of a proof of concept involving genetic algorithms, the bulk of the talk will focus on future directions and open questions, including: Does automating the evade/detect cycle ultimately benefit the censor? What protocols can be automatically learned? And, can training be collected from many users and vantage points?

Dave Levin is an Assistant Professor of Computer Science at the University of Maryland. His research centers on network security, measurement, and building secure systems. He has received multiple best paper awards, the IRTF Applied Networking Research Prize, the IEEE Cybersecurity Award for Innovation, and a Microsoft Live Labs Fellowship. He is also Co-Chair of UMD’s CS Honors program, and the founder of Breakerspace, a research lab for undergraduate students.

Connect:

@DistributedDave

12:30 pm–2:00 pm

Monday Luncheon

Terra Courtyard

2:00 pm–3:30 pm

Privacy

Keynote Address: PETs, POTs, and Pitfalls: Rethinking the Protection of Users against Machine Learning

Monday, 2:00 pm–3:00 pm

Carmela Troncoso, EPFL

In a machine-learning-dominated world, users' digital interactions are monitored and scrutinized in order to enhance services. These enhancements, however, may not always have the benefit and preferences of the users as a primary goal. Machine learning, for instance, can be used to learn users' demographics and interests in order to fuel targeted advertisements, regardless of people's privacy rights; or to learn bank customers' behavioral patterns to optimize the monetary benefits of loans, with disregard for discrimination. In other words, machine learning models may be adversarial in their goals and operation. Therefore, adversarial machine learning techniques that are usually considered undesirable can be turned into robust protection mechanisms for users. In this talk we discuss two protective uses of adversarial machine learning, and challenges for protection arising from the biases implicit in many machine learning models.

Carmela Troncoso is an Assistant Professor at EPFL where she leads the Security and Privacy Engineering (SPRING) Laboratory. Her research focuses on privacy protection, with particular focus on developing systematic means to build privacy-preserving systems and evaluate these systems' information leakage.

Panel: Privacy as a Top-level ML System Concern

Monday, 3:00 pm–3:30 pm

3:30 pm–4:00 pm

Break with Refreshments

Grand Ballroom Foyer

4:00 pm–5:30 pm

DNS and the Industry/Academia Divide

Dns2Vec: Exploring Internet Domain Names through Deep Learning

Monday, 4:00 pm–4:30 pm

Amit Arora, Hughes Network Systems

The concept of vector space embeddings was first applied in the area of Natural Language Processing (NLP) but has since been applied to several domains wherever there is an aspect of semantic similarity. Here we apply vector space embeddings to Internet domain names. We call this Dns2Vec. A corpus of Domain Name System (DNS) queries was created from the traffic of a large Internet Service Provider (ISP). A skip-gram word2vec model was used to create embeddings for domain names. The objective was to find similar domains and examine whether domains in the same category (news, shopping, etc.) cluster together. The embeddings could then be used for several traffic engineering applications such as shaping, content filtering, and prioritization, and also for predicting browsing sequences and detecting anomalies. The results were confirmed by manually examining similar domains returned by the model, by visualizing clusters using t-SNE, and by using a third-party web categorization service (Symantec K9).
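A skip-gram model is trained on (target, context) pairs drawn from a sliding window over a sequence of tokens. The sketch below shows how such pairs could be produced when the tokens are domain names rather than words. The abstract does not say how queries were grouped into sequences, so the per-client ordered session, the window size, and the placeholder domain names are all assumptions.

```python
def skipgram_pairs(session, window=2):
    """Return (target, context) pairs for every domain in an ordered session."""
    pairs = []
    for i, target in enumerate(session):
        lo = max(0, i - window)
        hi = min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a domain is not its own context
                pairs.append((target, session[j]))
    return pairs


# Hypothetical session: domains queried by one client, in order.
SESSION = ["news.example", "cdn.example", "shop.example"]
PAIRS = skipgram_pairs(SESSION, window=1)
```

Domains that tend to be queried near the same neighbors end up with similar embeddings after training, which is the property the clustering by category relies on.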

Data scientist at Hughes Network Systems. Graduated from the M.S. in Data Science program at Georgetown University, December 2018. Love working with data, R and Python, Machine Learning, AutoML, Apache Spark, Flink, Deep Learning, NLP, Shiny, Elasticsearch, AWS, GCP, Databricks. Have a flair for teaching.

Before transitioning to a full-time data scientist role I had more than 18 years of work experience in the satellite networking domain. Have worked extensively on satellite systems, with direct work experience on satellite modems as well as hub-side gateways. Worked on several key technologies related to IPv6, IMS, traffic acceleration, traffic shaping, encryption, routing, layer 2 protocols, FIPS 140-2 certification, diagnostics, etc.

Connect:

@aarora79

DNS Homographs Detection in the Wild

Monday, 4:30 pm–5:00 pm

Femi Olumofin and Chhaya Choudhary, Infoblox

Since the early 2000s, when internationalized domain names (IDNs) gained traction, people have had more choices of characters to use when creating Internet domain names. Extending character choices beyond ASCII to Unicode provides the needed coverage for most of the world's writing systems. Unfortunately, the IDN mechanism also puts Internet users at risk of homograph attacks, as many Unicode characters look strikingly similar to ASCII characters. For example, through a clever choice of Unicode characters, anyone can create an "infoblox.com" domain that looks indistinguishable from the legitimate ASCII-only "infoblox.com". The former domain is called a homograph or homoglyph, and the latter a target. The homograph in the example uses the Cyrillic small letter "o" instead of the ASCII "o". Attackers exploit this visual ambiguity among many Unicode characters to create homographs that impersonate prized targets. Homograph domains damage the reputation of targets and pose a threat to users who visit them. Moreover, these attacks can be employed in various types of phishing scams to steal sensitive information or to gain access to protected resources.

In this talk, we will introduce DNS homograph attacks and provide some highlights of relevant background work to detect them. We will then describe how we trained and fielded a machine learning classifier for homograph detection, and share examples of homographs caught in the wild over several months of passive DNS data.
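The Cyrillic "o" example above can be reproduced with a simple rule-based check, sketched below. This is not the machine learning classifier described in the talk. It maps a handful of Cyrillic lookalikes to ASCII and compares the result against a known target. The lookalike table is a tiny hand-picked assumption, and the function names are hypothetical; a practical detector would need far broader coverage.

```python
# Tiny hand-picked lookalike table (an assumption for illustration only).
LOOKALIKES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u0445": "x",  # Cyrillic small letter ha
}


def skeleton(domain):
    """Replace known lookalike characters with their ASCII counterparts."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in domain.lower())


def is_homograph_of(candidate, target):
    """True if candidate differs from target but reduces to the same skeleton."""
    return candidate != target and skeleton(candidate) == skeleton(target)


TARGET = "infoblox.com"
SPOOF = "inf\u043eblox.com"  # Cyrillic o in place of the first ASCII o
```

Here `is_homograph_of(SPOOF, TARGET)` is true while `is_homograph_of(TARGET, TARGET)` is false. A fixed table like this only catches substitutions it already knows about, which is one reason a learned classifier over rendered or encoded domain names may generalize better.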

Femi Olumofin is currently a senior member of the data science and analytics team at Infoblox in the San Francisco Bay Area. He has made contributions to research and development in the areas of privacy enhancing technologies, security, applied cryptography, big data analytics, and machine learning. He holds a Ph.D. in Computer Science from the University of Waterloo in Canada.

Connect:

@femiolumofin

Chhaya Choudhary is currently working as a Data Scientist at Infoblox, Tacoma. She recently graduated with a Master's degree in Computer Science from the University of Washington. She has worked on solving challenging data problems involving malware detection and classification using Machine Learning and Deep Learning techniques. She has multiple accepted publications in the field of cybersecurity using AI/ML. Her master's thesis was about evaluating state-of-the-art DGA classifiers against adversarial examples using autoencoders and Generative Adversarial Networks.

Connect:

@chhaya_UW

Panel: Adversarial AI: Perspectives from Academia and Industry

Monday, 5:00 pm–5:30 pm

Moderator: Sadia Afroz, International Computer Science Institute (ICSI)

Panelists: Rajarshi Gupta, Avast Software; Carmela Troncoso, EPFL

Rajarshi Gupta is the Head of AI at Avast Software, one of the largest consumer security companies in the world. He has a Ph.D. in EECS from UC Berkeley and has built a unique expertise at the intersection of artificial intelligence, cybersecurity, and networking. Prior to joining Avast, Rajarshi worked for many years at Qualcomm Research, where he created 'Snapdragon Smart Protect', the first-ever product to achieve On-Device Machine Learning for Security. Rajarshi loves to solve innovative problems and has authored over 200 issued U.S. Patents.

Carmela Troncoso is an Assistant Professor at EPFL where she leads the Security and Privacy Engineering (SPRING) Laboratory. Her research focuses on privacy protection, with particular focus on developing systematic means to build privacy-preserving systems and evaluate these systems' information leakage.

5:30 pm–5:35 pm

Closing Remarks

Program Co-Chairs: Rachel Greenstadt, New York University, and Aleatha Parker-Wood, Humu

5:45 pm–6:45 pm

Monday Happy Hour

Terra Courtyard

Sponsored by Carnegie Mellon University Privacy Engineering
Mingle with other attendees while enjoying snacks and beverages. Attendees of all co-located events taking place on Monday are welcome.