LASER 2017 Workshop Program

Wednesday, October 18, 2017

08:30–09:00

Registration and Breakfast

09:00–09:30

Welcome and Introductions

09:30–11:00

Session I: Malware

Session Chair: Brendan Dolan-Gavitt, NYU

Understanding Malware’s Network Behaviors using Fantasm

Xiyue Deng, Hao Shi, and Jelena Mirkovic, USC/Information Sciences Institute

Available Media

Background: There is very little data about how often contemporary malware communicates with the Internet and how essential this communication is for malware’s functionality.

Aim: We aim to quantify what fraction of contemporary malware samples are environment-sensitive and will exhibit very few behaviors when analyzed under full containment. We then seek to understand why malware uses its communication channel and whether malware communication patterns could be used to infer its purpose.

Method. We analyze malware communication behavior by running contemporary malware samples on bare-metal machines in the DeterLab testbed, either in full containment or with some limited connectivity, and recording and analyzing all their network traffic. We carefully choose which communication to allow, and we monitor all connections that are let out to the Internet. This way we can guarantee safety to Internet hosts, while exposing interesting malware behaviors that do not show under full containment.

Results. We find that 58% of samples exhibit some network activity within the first five minutes of running. We further find that 78% of these samples exhibit more network behaviors when run under our limited containment than when run under full containment, which means that 78% of samples are environment-sensitive. The most common communication patterns involve DNS, ICMP ECHO and HTTP traffic toward mostly nonpublic destinations. The likely purpose of this traffic is botnet command and control. We further show that malware’s network behaviors can be used to determine its purpose with 85–89% accuracy.

Conclusions. The ability to communicate with outside hosts seems to be essential to contemporary malware. This calls for better design of malware analysis environments, which enable safe and controlled communication to expose more interesting malware behaviors.

Open-source Measurement of Fast-flux Networks While Considering Domain-name Parking

Leigh B. Metcalf, Dan Ruef, and Jonathan M. Spring, Carnegie Mellon University

Available Media

Background: Fast-flux is a technique malicious actors use for resilient malware communications. In this paper, domain parking is the practice of assigning a nonsense location to an unused fully-qualified domain name (FQDN) to keep it ready for “live” use. Many papers use “parking” to mean typosquatting for ad revenue. However, we use the original meaning, which is relevant because it is a potentially confounding behavior for detection of fast-flux. Internet-wide fast-flux networks and the extent to which domain parking confounds fast-flux detection have not been publicly measured at scale.

Aim: Demonstrate a repeatable method for open-source measurement of fast-flux and domain parking, and measure representative trends over 5 years.

Method: Our data source is a large passive-DNS collection. We use an open-source implementation that identifies suspicious associations between FQDNs, IP addresses, and ASNs as graphs. We detect parking via a simple time-series of whether an FQDN advertises itself on IETF-reserved private IP space and public IP space alternately. Whitelisting domains that use private IP space for encoding non-DNS responses (e.g. blacklist distributors) is necessary.
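The alternation test described in the method can be illustrated with a minimal sketch. This is not the authors' tool: the function names, the `min_switches` threshold, and the sample observations are hypothetical, and Python's `ipaddress.is_private` is used as a stand-in for "IETF-reserved private IP space" (it also covers other reserved ranges such as loopback).

```python
import ipaddress


def count_space_switches(observations):
    """Count private<->public transitions for one FQDN.

    observations: iterable of (timestamp, ip_string) pairs, in any order.
    """
    ordered = sorted(observations, key=lambda obs: obs[0])
    # True where the answer falls in reserved/private space (assumed proxy).
    flags = [ipaddress.ip_address(ip).is_private for _, ip in ordered]
    return sum(1 for prev, cur in zip(flags, flags[1:]) if prev != cur)


def looks_parked(observations, min_switches=2):
    """Flag an FQDN that alternates between private and public space.

    min_switches is an illustrative threshold, not a value from the paper.
    """
    return count_space_switches(observations) >= min_switches


# Hypothetical passive-DNS observations for two FQDNs.
parked = [(1, "10.0.0.1"), (2, "93.184.216.34"), (3, "10.0.0.1")]
steady = [(1, "8.8.8.8"), (2, "8.8.4.4"), (3, "93.184.216.34")]

print(looks_parked(parked))  # True: private -> public -> private
print(looks_parked(steady))  # False: never leaves public space
```

A real deployment would also need the whitelist mentioned in the method, since some domains use private addresses to encode non-DNS responses.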

Results: Fast-flux is common; usual daily values are 10M IP addresses and 20M FQDNs. Domain parking, in our sense, is uncommon (94,000 unique FQDNs total) and does not interfere with fast-flux detection. Our open-source tool works well at Internet scale.

Discussion: Real-time detection of fast-flux networks could help defenders better interrupt them. With our implementation, a resolver could potentially block name resolutions that would add to a known flux network if completed, preventing even the first connection. Parking is a poor indicator of malicious activity.

11:00–11:30

Break

11:30–12:30

12:30–13:30

Lunch

13:30–15:00

Panel Discussion

Methodological Issues with IoT Experimentation

Moderator: Fanny Lalonde Lévesque, Ecole Polytechnique de Montreal

Panelists: Gabriela Ciocarlie, SRI International; Ryan Goodfellow, USC/ISI; Tim Polk, National Institute of Standards and Technology (NIST)

15:00–15:30

Break

15:30–17:00

Session II: Passwords

Session Chair: David Balenson, SRI International

Lessons Learned from Evaluating Eight Password Nudges in the Wild

Karen Renaud, Abertay University; Verena Zimmerman, Technische Universität Darmstadt; Joseph Maguire and Steve Draper, University of Glasgow

Available Media

Background. The tension between security and convenience, when creating passwords, is well established. It is a tension that often leads users to create poor passwords. For security designers, three mitigation strategies exist: issuing passwords, mandating minimum strength levels or encouraging better passwords. The first strategy prompts recording, the second reuse, but the third merits further investigation. It seemed promising to explore whether users could be subtly nudged towards stronger passwords.

Aim. The aim of the study was to investigate the influence of visual nudges on self-chosen password length and/or strength.

Method. A university application, enabling students to check course dates and review grades, was used to support two consecutive empirical studies over the course of two academic years. In total, 497 and 776 participants, respectively, were randomly assigned either to a control or an experimental group. Whereas the control group received no intervention, the experimental groups were presented with different visual nudges on the registration page of the web application whenever passwords were created. The experimental groups’ password strengths and lengths were then compared to those of the control group.

Results. No impact of the visual nudges could be detected, either in terms of password strength or length. The ordinal score metric used to calculate password strength led to a decrease in variance and test power, so that the inability to detect an effect does not definitively indicate that such an effect does not exist.

Conclusion. We cannot conclude that the nudges had no effect on password strength. It might well be that an actual effect was not detected due to the experimental design choices. Another possible explanation for our result is that password choice is influenced by the user’s task, cognitive budget, goals and pre-existing routines. A simple visual nudge might not have the power to overcome these forces. Our lessons learned therefore recommend the use of a richer password strength quantification measure, and the acknowledgement of the user’s context, in future studies.

An Empirical Investigation of Security Fatigue: The Case of Password Choice after Solving a CAPTCHA

Kovila P.L. Coopamootoo and Thomas Groß, Newcastle University; M. Faizal R. Pratama, University of Derby

Available Media

Background. User fatigue or overwhelm in current security tasks has been called security fatigue by the research community. However, security fatigue can also impact subsequent tasks. For example, while the CAPTCHA is a widespread security measure that aims to separate humans from bots [26], it is also known to be difficult for humans. Yet, to date it is not known how solving a CAPTCHA influences other subsequent tasks.

Aim. We investigate users’ password choice after a CAPTCHA challenge.

Method. We conduct a between-subject lab experiment. Three groups of 66 participants were each asked to generate a password. Two groups were given a CAPTCHA to solve prior to password choice; the third group was not. Password strength was measured and compared across groups.

Results. We found a significant difference in password strength across conditions, with p = .002, corresponding to a large effect size of f = .42. We found that solving a text- or picture-CAPTCHA results in significantly poorer password choice than not solving a CAPTCHA.
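For readers unfamiliar with the effect size f, the sketch below shows how Cohen's f can be computed for a comparison across several groups, assuming a one-way ANOVA-style decomposition (the abstract does not state which test was used). The function name and the scores are hypothetical and are not data from the study.

```python
from math import sqrt


def cohens_f(groups):
    """Cohen's f = sqrt(eta^2 / (1 - eta^2)), with eta^2 = SS_between / SS_total."""
    all_scores = [x for group in groups for x in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    eta_squared = ss_between / ss_total
    return sqrt(eta_squared / (1 - eta_squared))


# Hypothetical password-strength scores for three conditions.
text_captcha = [1, 2, 3]
picture_captcha = [2, 3, 4]
no_captcha = [3, 4, 5]

print(cohens_f([text_captcha, picture_captcha, no_captcha]))  # 1.0
```

By Cohen's conventions, f of about .40 or more is considered large; f = .42 corresponds to an eta squared of roughly .15.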

Conclusions. We contribute a first known empirical study investigating the impact of a CAPTCHA on password choice and of designing security tasks in a sequence. It raises questions on the usability, security fatigue and overall system security achieved when password choice follows another effortful task or is paired with a security task.

17:00

Adjourn

18:00

Workshop Reception

Thursday, October 19, 2017

08:30–09:00

Registration and Breakfast

10:00–10:30

Break

10:30–12:00

Session III: Crypto

Session Chair: Kovila Coopamootoo, Newcastle University

Dead on Arrival: Recovering from Fatal Flaws in Email Encryption Tools

Juan Ramón Ponce Mauriés, University College London; Kat Krol, University of Cambridge; Simon Parkin, Ruba Abu-Salma, and M. Angela Sasse, University College London

Available Media

Background. Since Whitten and Tygar’s seminal study of PGP 5.0 in 1999, there have been continuing efforts to produce email encryption tools for adoption by a wider user base, where these efforts vary in how well they consider the usability and utility needs of prospective users.

Aim. We conducted a study aiming to assess the user experience of two open-source encryption software tools—Enigmail and Mailvelope.

Method. We carried out a three-part user study (installation, home use, and debrief) with two groups of users using either Enigmail or Mailvelope. Users had access to help during installation (installation guide and experimenter with domain-specific knowledge), and were set a primary task of organising a mock flash mob using encrypted emails in the course of a week.

Results. Participants struggled to install the tools—they would not have been able to complete installation without help. Even with help, setup time was around 40 minutes. Participants using Mailvelope failed to encrypt their initial emails due to usability problems. Participants said they were unlikely to continue using the tools after the study, indicating that their creators must also consider utility.

Conclusions. Through our mixed study approach, we conclude that Mailvelope and Enigmail had too many software quality and usability issues to be adopted by mainstream users. Methodologically, the study made us rethink the role of the experimenter as that of a helper assisting novice users with setting up a demanding technology.

The Impacts of Representational Fluency on Cognitive Processing of Cryptography Concepts

Joseph Beckman, Sumra Bari, Yingjie Chen, Melissa Dark, and Baijian Yang, Purdue University

Available Media

fMRI presents a new tool for the measurement of cognitive processing. fMRI analysis has been used in neuroscience to determine where cognitive processing takes place when people are exposed to environmental stimuli and has been used to determine where students and experts process basic mathematical functions. This research sought to understand where cryptography was processed in the brain, how representational translation impacts cognitive processing, and how instruction focused on teaching representational fluency in cryptography concepts impacts cognitive processing of cryptography. Subjects were given a multiple-choice pretest, instructed during the semester in the concepts of interest to this research, given a multiple-choice post-test, then subjected to the fMRI scan while prompted to process these concepts. Results of the study show that cryptography concepts are processed in areas indicative of the representational forms in which they were presented, as well as engaging the executive processing areas of the brain. For example, cryptography presented visually was processed in the brain in similar areas as other concepts presented visually, but also engaged the areas of the brain that organize and process complex concepts. However, the research team did not find significant results related to the cognitive processing of translating among representations, nor did we find significant changes in cognitive processing of cryptography for topics in which the focus of instruction was teaching representational fluency. Pre- and post-test results showed subjects performed better on concepts instructed using representational fluency than on concepts instructed without a focus on representational fluency, but the difference was not significant at α = .05.

12:00–13:15

Lunch

13:15–14:45

Session IV: Behavioral Security

Session Chair: Ryan Goodfellow, USC/ISI

Self-Protective Behaviors Over Public WiFi Networks

David Maimon, Michael Becker, Sushant Patil, and Jonathan Katz, University of Maryland

Available Media

The proliferation of public WiFi networks in small businesses, academic institutions, and municipalities allows users to access the Internet from various public locations. Unfortunately, the nature of these networks poses serious risks to users’ security and privacy. As a result, public WiFi users are encouraged to adopt a range of self-protective behaviors to prevent their potential online victimization. This paper explores the prevalence of one such behavior, avoidance of sensitive websites, among public WiFi network users. Moreover, we investigate whether computer users’ adoption of an online avoidance strategy depends on their level of uncertainty regarding the security practices of the WiFi network they log in to. To answer these questions, we analyze data collected using two phases of field observations: (1) baseline assessment and (2) introduction of a private (honeypot) WiFi network. Phase one baseline data were collected using packet-sniffing of 24 public WiFi networks in the DC metropolitan area. Phase two data were obtained through introducing a honeypot WiFi network to 109 locations around the DC metropolitan area and an implementation of a quasi-experimental one-group-post-test-only research design. Findings reveal that although most WiFi users avoid accessing banking websites using established public WiFi networks, they still use these networks to access social networks, email, and other websites that handle sensitive information. Nevertheless, when logged in to a WiFi network that has some uncertainty regarding the legitimacy and security practices of its operator, WiFi network users tend to avoid most websites that handle sensitive information.

Measuring the Success of Context-Aware Security Behaviour Surveys

Ingolf Becker, Simon Parkin, and M. Angela Sasse, University College London

Available Media

Background. We reflect on a methodology for developing scenario-based security behaviour surveys that evolved through deployment in two large partner organisations (A & B). In each organisation, scenarios are grounded in workplace tensions between security and employees’ productive tasks. These tensions are drawn from prior interviews in the organisation, rather than using established but generic questionnaires. Survey responses allow clustering of participants according to predefined groups.

Aim. We aim to establish the usefulness of framing survey questions around active security controls and problems experienced by employees, by assessing the validity of the clustering. We introduce measures for the appropriateness of the survey scenarios for each organisation and the quality of candidate answer options. We use these scores to articulate the methodological improvements between the two surveys.

Method. We develop a methodology to verify the clustering of participants, where 516 (A) and 195 (B) free-text responses are coded by two annotators. Inter-annotator metrics are adopted to identify agreement. Further, we analyse 5196 (A) and 1824 (B) appropriateness and severity scores to measure the appropriateness and quality of the questions.

Results. Participants rank questions in B as more appropriate than in A, although the variation in the severity of the answer options available to participants is higher in B than in A. We find that the scenarios presented in B are more recognisable to the participants, suggesting that the survey design has indeed improved. The annotators mostly agree strongly on their codings with Krippendorff’s α > 0.7. A number of clusterings should be questioned, although α improves for reliable questions by 0.15 from A to B.
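As an illustration of the agreement statistic reported above, the following is a minimal sketch of Krippendorff's α for the simplest case: two annotators, nominal codes, and no missing data. It is not the authors' analysis code; the function name and the example codings are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal data, no missing values.

    alpha = 1 - D_o / D_e, computed from the coincidence counts.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must label the same units")
    n = 2 * len(coder_a)  # total number of pairable values
    totals = Counter(coder_a) + Counter(coder_b)
    # Each disagreeing unit adds two off-diagonal coincidences: (a, b) and (b, a).
    observed = 2 * sum(1 for a, b in zip(coder_a, coder_b) if a != b)
    # Sum of n_c * n_k over all ordered pairs of distinct categories.
    expected = sum(totals[c] * totals[k] for c, k in permutations(totals, 2))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - (n - 1) * observed / expected


# Hypothetical codings of four free-text responses by two annotators.
annotator_1 = ["x", "x", "y", "y"]
annotator_2 = ["x", "x", "y", "x"]

print(krippendorff_alpha_nominal(annotator_1, annotator_2))  # 8/15, about 0.533
```

Values near 1 indicate strong agreement, while values near 0 indicate agreement no better than chance; ordinal or interval codes would need a different distance function than the nominal one assumed here.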

Conclusions. To be able to draw valid conclusions from survey responses, the train of analysis needs to be verifiable. Our approach allows us to further validate the clustering of responses by utilising free-text responses. Further, we establish the relevance and appropriateness of the scenarios for individual organisations. While much prior research draws on survey instruments from research before it, this is then often applied in a different context; in these cases adding metrics of appropriateness and severity to the survey design can ensure that results relate to the security experiences of employees.

14:45–15:15

Closing Remarks

15:15

Adjourn