Papers and Proceedings
The full Proceedings published by USENIX for the symposium are available for download below. Individual papers can also be downloaded from their respective presentation pages. Copyright to the individual works is retained by the author[s].
Proceedings Front Matter
Proceedings Cover |
Title Page and List of Organizers |
Message from the Program Co-Chairs |
Table of Contents
Full Proceedings PDFs
USENIX Security '24 Full Proceedings (PDF, 717.5 MB)
USENIX Security '24 Proceedings Interior (PDF, 714.3 MB, best for mobile devices)
USENIX Security '24 Errata Slip #1 (PDF)
Wednesday, August 14
7:00 am–8:45 am
Continental Breakfast
Grand Ballroom Foyer
8:45 am–9:15 am
Opening Remarks and Awards
Program Co-Chairs: Davide Balzarotti, Eurecom; Wenyuan Xu, Zhejiang University
Salon EFGH
9:15 am–10:15 am
Keynote Address
A World Where We Trust Hard-Won Lessons in Security Research, Technology, and People
David Brumley, ForAllSecure and CMU
Great ideas shouldn't remain confined to papers; they should transform the world. What does it take for our research to make a real-world impact? Are there guiding principles, and do they influence how we conduct fundamental research?
In this keynote, I will share my journey of understanding the principles that bridge the gap between fundamental research and the practical implementation of safer software and systems. Through real-world examples and case studies, I will discuss how I learned to replace "it's more secure" with compelling, actionable arguments. I will delve into adoption challenges that unveiled research gems and share candid moments when my academic hubris was dismantled by industry realities.
This journey has led me to identify four key principles that, I believe, are crucial for ensuring that innovative ideas transition successfully to the broader community and not get stuck as just a great research paper. Join me to explore these principles and how I believe they can help us all build a world with computers we trust.
David Brumley, ForAllSecure and CMU
David Brumley is the CEO of ForAllSecure and a full professor at Carnegie Mellon University. His research focuses on novel program analysis and verification techniques that prove the presence of bugs and vulnerabilities. He has published numerous academic papers, won several test-of-time and achievement awards, competed and won the DARPA Cyber Grand Challenge, and holds a black badge.
10:15 am–11:15 am
Coffee and Tea Break
Grand Ballroom Foyer
11:15 am–12:15 pm
User Studies I: Social Media Platforms
"I feel physically safe but not politically safe": Understanding the Digital Threats and Safety Practices of OnlyFans Creators
Ananta Soneji, Arizona State University; Vaughn Hamilton, Max Planck Institute for Software Systems; Adam Doupé, Arizona State University; Allison McDonald, Boston University; Elissa M. Redmiles, Georgetown University
OnlyFans is a subscription-based social media platform with over 1.5 million content creators and 150 million users worldwide. OnlyFans creators primarily produce intimate content for sale on the platform. As such, they are distinctly positioned as content creators and sex workers. Through a qualitative interview study with OnlyFans creators (n=43), building on an existing framework of online hate and harassment, we shed light on the nuanced threats they face and their safety practices. Additionally, we examine the impact of factors such as stigma, prominence, and platform policies on shaping the threat landscape for OnlyFans creators and detail the preemptive practices they undertake to protect themselves. Leveraging these results, we synthesize opportunities to address the challenges of sexual content creators.
"I chose to fight, be brave, and to deal with it": Threat Experiences and Security Practices of Pakistani Content Creators
Lea Gröber, CISPA Helmholtz Center for Information Security and Saarland University; Waleed Arshad and Shanza, Lahore University of Management Sciences; Angelica Goetzen, Max Planck Institute for Software Systems; Elissa M. Redmiles, Georgetown University; Maryam Mustafa, Lahore University of Management Sciences; Katharina Krombholz, CISPA Helmholtz Center for Information Security
Content creators are exposed to elevated risks compared to the general Internet user. This study explores the threat landscape that creators in Pakistan are exposed to, how they protect themselves, and which support structures they rely on. We conducted a semi-structured interview study with 23 creators from diverse backgrounds who create content on various topics. Our data suggests that online threats frequently spill over into the offline world, especially for gender minorities. Creating content on sensitive topics like politics, religion, and human rights is associated with elevated risks. We find that defensive mechanisms and external support structures are non-existent, lacking, or inadequately adjusted to the sociocultural context of Pakistan.
Disclaimer: This paper contains quotes describing harmful experiences relating to sexual and physical assault, eating disorders, and extreme threats of violence.
Investigating Moderation Challenges to Combating Hate and Harassment: The Case of Mod-Admin Power Dynamics and Feature Misuse on Reddit
Madiha Tabassum, Northeastern University; Alana Mackey, Wellesley College; Ashley Schuett, George Washington University; Ada Lerner, Northeastern University
Social media platforms often rely on volunteer moderators to combat hate and harassment and create safe online environments. In the face of challenges combating hate and harassment, moderators engage in mutual support with one another. We conducted a qualitative content analysis of 115 hate and harassment-related threads from r/ModSupport and r/modhelp, two major subreddit forums for this type of mutual support. We analyze the challenges moderators face; complex tradeoffs related to privacy, utility, and harassment; and major challenges in the relationship between moderators and platform admins. We also present the first systematization of how platform features (including especially security, privacy, and safety features) are misused for online abuse, and drawing on this systematization we articulate design themes for platforms that want to resist such misuse.
"Did They F***ing Consent to That?": Safer Digital Intimacy via Proactive Protection Against Image-Based Sexual Abuse
Lucy Qin, Georgetown University; Vaughn Hamilton, Max Planck Institute for Software Systems; Sharon Wang, University of Washington; Yigit Aydinalp and Marin Scarlett, European Sex Workers Rights Alliance; Elissa M. Redmiles, Georgetown University
As many as 8 in 10 adults share intimate content such as nude or lewd images. Sharing such content has significant benefits for relationship intimacy and body image, and can offer employment. However, stigmatizing attitudes and a lack of technological mitigations put those sharing such content at risk of sexual violence. An estimated 1 in 3 people have been subjected to image-based sexual abuse (IBSA), a spectrum of violence that includes the nonconsensual distribution or threat of distribution of consensually-created intimate content (also called NDII). In this work, we conducted a rigorous empirical interview study of 52 European creators of intimate content to examine the threats they face and how they defend against them, situated in the context of their different use cases for intimate content sharing and their choice of technologies for storing and sharing such content. Synthesizing our results with the limited body of prior work on technological prevention of NDII, we offer concrete next steps for both platforms and security & privacy researchers to work toward safer intimate content sharing through proactive protection.
Content Warning: This work discusses sexual violence, specifically, the harms of image-based sexual abuse (particularly in Sections 2 and 6).
Hardware Security I: Attacks and Defense
AttackGNN: Red-Teaming GNNs in Hardware Security Using Reinforcement Learning
Vasudev Gohil, Texas A&M University; Satwik Patnaik, University of Delaware; Dileep Kalathil and Jeyavijayan Rajendran, Texas A&M University
Machine learning has shown great promise in addressing several critical hardware security problems. In particular, researchers have developed novel graph neural network (GNN)-based techniques for detecting intellectual property (IP) piracy, detecting hardware Trojans (HTs), and reverse engineering circuits, to name a few. These techniques have demonstrated outstanding accuracy and have received much attention in the community. However, since these techniques are used for security applications, it is imperative to evaluate them thoroughly and ensure they are robust and do not compromise the security of integrated circuits.
In this work, we propose AttackGNN, the first red-team attack on GNN-based techniques in hardware security. To this end, we devise a novel reinforcement learning (RL) agent that generates adversarial examples, i.e., circuits, against the GNN-based techniques. We overcome three challenges related to effectiveness, scalability, and generality to devise a potent RL agent. We target five GNN-based techniques for four crucial classes of problems in hardware security: IP piracy, detecting/localizing HTs, reverse engineering, and hardware obfuscation. Through our approach, we craft circuits that fool all GNNs considered in this work. For instance, to evade IP piracy detection, we generate adversarial pirated circuits that fool the GNN-based defense into classifying our crafted circuits as not pirated. For attacking HT localization GNN, our attack generates HT-infested circuits that fool the defense on all tested circuits. We obtain a similar 100% success rate against GNNs for all classes of problems.
INSIGHT: Attacking Industry-Adopted Learning Resilient Logic Locking Techniques Using Explainable Graph Neural Network
Lakshmi Likhitha Mankali, New York University; Ozgur Sinanoglu, New York University Abu Dhabi; Satwik Patnaik, University of Delaware
Logic locking is a hardware-based solution that protects against hardware intellectual property (IP) piracy. With the advent of powerful machine learning (ML)-based attacks, in the last 5 years, researchers have developed several learning resilient locking techniques claiming superior security guarantees. However, these security guarantees are the result of evaluation against existing ML-based attacks having critical limitations, including (i) black-box operation, i.e., does not provide any explanations, (ii) are not practical, i.e., nonconsideration of approaches followed by the semiconductor industry, and (iii) are not broadly applicable, i.e., evaluate the security of a specific logic locking technique.
In this work, we question the security provided by learning resilient locking techniques by developing an attack (INSIGHT) using an explainable graph neural network (GNN). INSIGHT recovers the secret key without requiring scan-access, i.e., in an oracle-less setting for 7 unbroken learning resilient locking techniques, including 2 industry-adopted logic locking techniques. INSIGHT achieves an average key-prediction accuracy (KPA) of2.87×,1.75×,and1.67× higher than existing ML-based attacks. We demonstrate the efficacy of INSIGHT by evaluating locked designs ranging from widely used academic suites (ISCAS-85, ITC-99) to larger designs, such as MIPS, Google IBEX, and mor1kx processors. We perform 2 practical case studies: (i) recovering secret keys of locking techniques used in a widely used commercial EDA tool (Synopsys TestMAX) and (ii) showcasing the ramifications of leaking the secret key for an image processing application. We will open-source our artifacts to foster research on developing learning resilient locking techniques.
Eye of Sauron: Long-Range Hidden Spy Camera Detection and Positioning with Inbuilt Memory EM Radiation
Qibo Zhang and Daibo Liu, Hunan University; Xinyu Zhang, University of California San Diego; Zhichao Cao, Michigan State University; Fanzi Zeng, Hongbo Jiang, and Wenqiang Jin, Hunan University
In this paper, we present ESauron — the first proof-of-concept system that can detect diverse forms of spy cameras (i.e., wireless, wired and offline devices) and quickly pinpoint their locations. The key observation is that, for all spy cameras, the captured raw images must be first digested (e.g., encoding and compression) in the video-capture devices before transferring to target receiver or storage medium. This digestion process takes place in an inbuilt read-write memory whose operations cause electromagnetic radiation (EMR). Specifically, the memory clock drives a variable number of switching voltage regulator activities depending on the workloads, causing fluctuating currents injected into memory units, thus emitting EMR signals at the clock frequency. Whenever the visual scene changes, bursts of video data processing (e.g., video encoding) suddenly aggravate the memory workload, bringing responsive EMR patterns. ESauron can detect spy cameras by intentionally stimulating scene changes and then sensing the surge of EMRs even from a considerable distance. We implemented a proof-of-concept prototype of the ESauron by carefully designing techniques to sense and differentiate memory EMRs, assert the existence of spy cameras, and pinpoint their locations. Experiments with 50 camera products show that ESauron can detect all spy cameras with an accuracy of 100% after only 4 stimuli, the detection range can exceed 20 meters even in the presence of blockages, and all spy cameras can be accurately located.
Improving the Ability of Thermal Radiation Based Hardware Trojan Detection
Ting Su, Yaohua Wang, Shi Xu, Lusi Zhang, Simin Feng, Jialong Song, Yiming Liu, Yongkang Tang, Yang Zhang, Shaoqing Li, Yang Guo, and Hengzhu Liu, National University of Defense Technology
Hardware Trojans (HTs) pose a significant and growing threat to the field of hardware security. Several side-channel techniques, including power and electromagnetic radiation (EMR), have been proposed for HT detection, constrained by reliance on the golden chip or test vectors. In response, researchers advocate for the use of thermal radiation (TR) to identify HTs. However, existing TR-based methods are designed for the ideal HT that can fully occupy at least one pixel on the thermal radiation map (TRM). In reality, HTs may occupy multiple pixels, substantially diminishing occupancy in each pixel, thereby reducing the accuracy of existing detection methods. This challenge is exacerbated by the noise caused by the thermal camera. To this end, this paper introduces a countermeasure named noise based pixel occupation enhancement (NICE), aiming to improve the ability of TR-based HT detection. The key insight of NICE is that noise can vary the pixel occupation of HTs while disrupting HT detection. Consequently, the noise can be exploited to statistically find out the largest pixel occupation among the variations, thereby enhancing HT detection accuracy. Experimental results on a 0.13 μm Digital Signal Processing (DSP) show that the detection rate of NICE exceeds the existing TR-based method by more than 47%, reaching 91.81%, while maintaining a false alarm rate of less than 9%. Both metrics of NICE are comparable to the existing power-based and EMR-based methods, eliminating the need for the golden chip and test vectors.
System Security I: OS
Endokernel: A Thread Safe Monitor for Lightweight Subprocess Isolation
Fangfei Yang, Rice University; Bumjin Im, Amazon.com; Weijie Huang, Rice University; Kelly Kaoudis, Trail of Bits; Anjo Vahldiek-Oberwagner, Intel Labs; Chia-Che Tsai, Texas A&M University; Nathan Dautenhahn, Riverside Research
Compartmentalization decomposes applications into isolated components, effectively confining the scope of potential security breaches. Recent approaches nest the protection monitor within processes for efficient memory isolation at the cost of security. However, these systems lack solutions for efficient multithreaded safety and neglect kernel semantics that can be abused to bypass the monitor.
The Endokernel is an intra-process security monitor that isolates memory at subprocess granularity. It ensures backwards-compatible and secure emulation of system interfaces, a task uniquely challenging due to the need to analyze OS and hardware semantics beyond mere interface usability. We introduce an inside-out methodology where we identify core OS primitives that allow bypass and map that back to the interfaces that depend on them. This approach led to the identification of several missing policies as well as aided in developing a fine-grained locking approach to deal with complex thread safety when inserting a monitor between the OS and the application. Results indicate that we can achieve fast isolation while greatly enhancing security and maintaining backwards-compatibility, and also showing a new method for systematically finding gaps in policies.
HIVE: A Hardware-assisted Isolated Execution Environment for eBPF on AArch64
Peihua Zhang, SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Chenggang Wu, SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Zhongguancun Laboratory; Xiangyu Meng, Northwestern Polytechnical University; Yinqian Zhang, Southern University of Science and Technology; Mingfan Peng, Shiyang Zhang, and Bing Hu, SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Mengyao Xie, SKLP, Institute of Computing Technology, CAS; Yuanming Lai and Yan Kang, SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Zhe Wang, SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences; Zhongguancun Laboratory
eBPF has become a critical component in Linux. To ensure kernel security, BPF programs are statically verified before being loaded and executed in the kernel. However, the state-of-the-art eBPF verifier has both security and complexity issues. To this end, we choose to look at BPF programs from a new perspective and regard them as a new type of kernel-mode application, thus an isolation-based rather than a verificationbased approach is needed. In this paper, we propose HIVE, an isolation execution environment for BPF programs on AArch64. To provide the equivalent security guarantees, we systematize the security aims of the eBPF verifier and categorize two types of pointers in eBPF: the inclusive type pointer that points to BPF objects and the exclusive type pointer that points to kernel objects. For the former, HIVE compartmentalizes all BPF memory from the kernel and de-privileges the memory accesses in the BPF programs by leveraging the load/store unprivileged instructions; for the latter, HIVE utilizes the pointer authentication feature to enforce access controls of kernel objects. Evaluation results show that HIVE is not only efficient but also supports complex BPF programs.
BUDAlloc: Defeating Use-After-Free Bugs by Decoupling Virtual Address Management from Kernel
Junho Ahn, Jaehyeon Lee, Kanghyuk Lee, Wooseok Gwak, Minseong Hwang, and Youngjin Kwon, KAIST
Use-after-free bugs are an important class of vulnerabilities that often pose serious security threats. To prevent or detect use-after-free bugs, one-time allocators have recently gained attention for their better performance scalability and immediate detection of use-after-free bugs compared to garbage collection approaches. This paper introduces BUDAlloc, a one-time-allocator for detecting and protecting use-after-free bugs in unmodified binaries. The core idea is co-designing a user-level allocator and kernel by separating virtual and physical address management. The user-level allocator manages virtual address layout, eliminating the need for system calls when creating virtual alias, which is essential for reducing internal fragmentation caused by the one-time-allocator. BUDAlloc customizes the kernel page fault handler with eBPF for batching unmap requests when freeing objects. In SPEC CPU 2017, BUDAlloc achieves a 15% performance improvement over DangZero and reduces memory overhead by 61% compared to FFmalloc.
Page-Oriented Programming: Subverting Control-Flow Integrity of Commodity Operating System Kernels with Non-Writable Code Pages
Seunghun Han, The Affiliated Institute of ETRI, Chungnam National University; Seong-Joong Kim, Wook Shin, and Byung Joon Kim, The Affiliated Institute of ETRI; Jae-Cheol Ryou, Chungnam National University
This paper presents a novel attack technique called page-oriented programming, which reuses existing code gadgets by remapping physical pages to the virtual address space of a program at runtime. The page remapping vulnerabilities may lead to data breaches or may damage kernel integrity. Therefore, manufacturers have recently released products equipped with hardware-assisted guest kernel integrity enforcement. This paper extends the notion of the page remapping attack to another type of code-reuse attack, which can not only be used for altering or sniffing kernel data but also for building and executing malicious code at runtime. We demonstrate the effectiveness of this attack on state-of-the-art hardware and software, where control-flow integrity policies are enforced, thus highlighting its capability to render most legacy systems vulnerable.
Network Security I: DDoS
SmartCookie: Blocking Large-Scale SYN Floods with a Split-Proxy Defense on Programmable Data Planes
Sophia Yoo, Xiaoqi Chen, and Jennifer Rexford, Princeton University
Despite decades of mitigation efforts, SYN flooding attacks continue to increase in frequency and scale, and adaptive adversaries continue to evolve. Meanwhile, volumes of benign traffic in modern networks are also growing rampantly. As a result, network providers, which run thousands of servers and process 100s of Gbps of traffic, find themselves urgently requiring defenses that are secure against adaptive adversaries, scalable against large volumes of traffic, and highly performant for benign applications. Unfortunately, existing defenses local to a single device (e.g., purely software-based or hardware-based) are failing to keep up with growing attacks and struggle to provide performance, security, or both. In this paper, we present SmartCookie, the first system to run cryptographically secure SYN cookie checks on high-speed programmable switches, for both security and performance. Our novel split-proxy defense leverages emerging programmable switches to block 100% of SYN floods in the switch data plane and also uses state-of-the-art kernel technologies such as eBPF to enable scalability for serving benign traffic. SmartCookie defends against adaptive adversaries at two orders of magnitude greater attack traffic than traditional CPU-based software defenses, blocking attacks of 136.9 Mpps without packet loss. We also achieve 2x-6.5x lower end-to-end latency for benign traffic compared to existing switch-based hardware defenses.
Loopy Hell(ow): Infinite Traffic Loops at the Application Layer
Yepeng Pan, Anna Ascheman, and Christian Rossow, CISPA Helmholtz Center for Information Security
Denial-of-Service (DoS) attacks have long been a persistent threat to network infrastructures. Existing attack primitives require attackers to continuously send traffic, such as in SYN floods, amplification attacks, or application-layer DoS. In contrast, we study the threat of application-layer traffic loops, which are an almost cost-free attack primitive alternative. Such loops exist, e.g., if two servers consider messages sent to each other as malformed and respond with errors that again trigger error messages. Attackers can send a single IP-spoofed loop trigger packet to initiate an infinite loop among two servers. But despite the severity of traffic loops, to the best of our knowledge, they have never been studied in greater detail.
In this paper, we thus investigate the threat of application-layer traffic loops. To this end, we propose a systematic approach to identify loops among real servers. Our core idea is to learn the response functions of all servers of a given application-layer protocol, encode this knowledge into a loop graph, and finally, traverse the graph to spot looping server pairs. Using the proposed method, we examined traffic loops among servers running both popular (DNS, NTP, and TFTP) and legacy (Daytime, Time, Active Users, Chargen, QOTD, and Echo) UDP protocols and confirmed the prevalence of traffic loops. In total, we identified approximately 296k servers in IPv4 vulnerable to traffic loops, providing attackers the opportunity to abuse billions of loop pairs.
Zero-setup Intermediate-rate Communication Guarantees in a Global Internet
Marc Wyss and Adrian Perrig, ETH Zurich
Network-targeting volumetric DDoS attacks remain a major threat to Internet communication. Unfortunately, existing solutions fall short of providing forwarding guarantees to the important class of short-lived intermediate-rate communication such as web traffic in a secure, scalable, light-weight, low-cost, and incrementally deployable fashion. To overcome those limitations we design Z-Lane, a system achieving those objectives by ensuring bandwidth isolation among authenticated traffic from (groups of) autonomous systems, thus safeguarding intermediate-rate communication against even the largest volumetric DDoS attacks. Our evaluation on a global testbed and our high-speed implementation on commodity hardware demonstrate Z-Lane's effectiveness and scalability.
Towards an Effective Method of ReDoS Detection for Non-backtracking Engines
Weihao Su, Hong Huang, and Rongchen Li, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Haiming Chen, Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; Tingjian Ge, Miner School of Computer & Information Sciences, University of Massachusetts, Lowell
Regular expressions (regexes) are a fundamental concept across the fields of computer science. However, they can also induce the Regular expression Denial of Service (ReDoS) attacks, which are a class of denial of service attacks, caused by super-linear worst-case matching time. Due to the severity and prevalence of ReDoS attacks, the detection of ReDoS-vulnerable regexes in software is thus vital. Although various ReDoS detection approaches have been proposed, these methods have focused mainly on backtracking regex engines, leaving the problem of ReDoS vulnerability detection on non-backtracking regex engines largely open.
To address the above challenges, in this paper, we first systematically analyze the major causes that could contribute to ReDoS vulnerabilities on non-backtracking regex engines. We then propose a novel type of ReDoS attack strings that builds on the concept of simple strings. Next we propose EvilStrGen, a tool for generating attack strings for ReDoS-vulnerable regexes on non-backtracking engines. It is based on a novel incremental determinisation algorithm with heuristic strategies to lazily find the k-simple strings without explicit construction of finite automata. We evaluate EvilStrGen against six state-of-the-art approaches on a broad range of publicly available datasets containing 736,535 unique regexes. The results illustrate the significant efficacy of our tool. We also apply our tool to 85 intensively-tested projects, and have identified 34 unrevealed ReDoS vulnerabilities.
ML I: Federated Learning
FAMOS: Robust Privacy-Preserving Authentication on Payment Apps via Federated Multi-Modal Contrastive Learning
Yifeng Cai, Key Laboratory of High Confidence Software Technologies (PKU), Ministry of Education; School of Computer Science, Peking University; Ziqi Zhang, Department of Computer Science, University of Illinois Urbana-Champaign; Jiaping Gui, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Bingyan Liu, School of Computer Science, Beijing University of Posts and Telecommunications; Xiaoke Zhao, Ruoyu Li, and Zhe Li, Ant Group; Ding Li, Key Laboratory of High Confidence Software Technologies (PKU), Ministry of Education; School of Computer Science, Peking University
The rise of mobile payment apps necessitates robust user authentication to ensure legitimate user access. Traditional methods, like passwords and biometrics, are vulnerable once a device is compromised. To overcome these limitations, modern solutions utilize sensor data to achieve user-agnostic and scalable behavioral authentication. However, existing solutions face two problems when deployed to real-world applications. First, it is not robust to noisy background activities. Second, it faces the risks of privacy leakage as it relies on centralized training with users' sensor data.
In this paper, we introduce FAMOS, a novel authentication framework based on federated multi-modal contrastive learning. The intuition of FAMOS is to fuse multi-modal sensor data and cluster the representation of one user's data by the action category so that we can eliminate the influence of background noise and guarantee the user's privacy. Furthermore, we incorporate FAMOS with federated learning to enhance performance while protecting users' privacy. We comprehensively evaluate FAMOS using real-world datasets and devices. Experimental results show that FAMOS is efficient and accurate for real-world deployment. FAMOS has an F1-Score of 0.91 and an AUC of 0.97, which are 42.19% and 27.63% higher than the baselines, respectively.
Efficient Privacy Auditing in Federated Learning
Hongyan Chang, National University of Singapore; Brandon Edwards, Intel Corporation; Anindya S. Paul, University of Florida; Reza Shokri, National University of Singapore
We design a novel efficient membership inference attack to audit privacy risks in federated learning. Our approach involves computing the slope of specific model performance metrics (e.g., model's output and its loss) across FL rounds to differentiate members from non-members. Since these metrics are automatically computed during the FL process, our solution imposes negligible overhead and can be seamlessly integrated without disrupting training. We validate the effectiveness and superiority of our method over prior work across a wide range of FL settings and real-world datasets.
Defending Against Data Reconstruction Attacks in Federated Learning: An Information Theory Approach
Qi Tan, Department of Computer Science and Technology, Tsinghua University; Qi Li, Institute for Network Science and Cyberspace, Tsinghua University; Yi Zhao, School of Cyberspace Science and Technology, Beijing Institute of Technology; Zhuotao Liu, Institute for Network Science and Cyberspace, Tsinghua University; Xiaobing Guo, Lenovo Research; Ke Xu, Department of Computer Science and Technology, Tsinghua University
Federated Learning (FL) trains a black-box and high-dimensional model among different clients by exchanging parameters instead of direct data sharing, which mitigates the privacy leak incurred by machine learning. However, FL still suffers from membership inference attacks (MIA) or data reconstruction attacks (DRA). In particular, an attacker can extract the information from local datasets by constructing DRA, which cannot be effectively throttled by existing techniques, e.g., Differential Privacy (DP).
In this paper, we aim to ensure a strong privacy guarantee for FL under DRA. We prove that econstruction errors under DRA are constrained by the information acquired by an attacker, which means that constraining the transmitted information can effectively throttle DRA. To quantify the information leakage incurred by FL, we establish a channel model, which depends on the upper bound of joint mutual information between the local dataset and multiple transmitted parameters. Moreover, the channel model indicates that the transmitted information can be constrained through data space operation, which can improve training efficiency and the model accuracy under constrained information. According to the channel model, we propose algorithms to constrain the information transmitted in a single round of local training. With a limited number of training rounds, the algorithms ensure that the total amount of transmitted information is limited. Furthermore, our channel model can be applied to various privacy-enhancing techniques (such as DP) to enhance privacy guarantees against DRA. Extensive experiments with real-world datasets validate the effectiveness of our methods.
Lotto: Secure Participant Selection against Adversarial Servers in Federated Learning
Zhifeng Jiang and Peng Ye, Hong Kong University of Science and Technology; Shiqi He, University of Michigan; Wei Wang, Hong Kong University of Science and Technology; Ruichuan Chen, Nokia Bell Labs; Bo Li, Hong Kong University of Science and Technology
In Federated Learning (FL), common privacy-enhancing techniques, such as secure aggregation and distributed differential privacy, rely on the critical assumption of an honest majority among participants to withstand various attacks. In practice, however, servers are not always trusted, and an adversarial server can strategically select compromised clients to create a dishonest majority, thereby undermining the system's security guarantees. In this paper, we present Lotto, an FL system that addresses this fundamental, yet underexplored issue by providing secure participant selection against an adversarial server. Lotto supports two selection algorithms: random and informed. To ensure random selection without a trusted server, Lotto enables each client to autonomously determine their participation using verifiable randomness. For informed selection, which is more vulnerable to manipulation, Lotto approximates the algorithm by employing random selection within a refined client pool. Our theoretical analysis shows that Lotto effectively aligns the proportion of server-selected compromised participants with the base rate of dishonest clients in the population. Large-scale experiments further reveal that Lotto achieves time-to-accuracy performance comparable to that of insecure selection methods, indicating a low computational overhead for secure selection.
Security Analysis I: Source Code and Binary
Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation
Zion Leonahenahe Basque, Ati Priya Bajaj, Wil Gibbs, Jude O'Kain, Derron Miao, Tiffany Bao, Adam Doupé, Yan Shoshitaishvili, and Ruoyu Wang, Arizona State University
Contrary to prevailing wisdom, we argue that the measure of binary decompiler success is not to eliminate all gotos or reduce the complexity of the decompiled code but to get as close as possible to the original source code. Many gotos exist in the original source code (the Linux kernel version 6.1 contains 3,754) and, therefore, should be preserved during decompilation, and only spurious gotos should be removed.
Fundamentally, decompilers insert spurious gotos in decompilation because structuring algorithms fail to recover C-style structures from binary code. Through a quantitative study, we find that the root cause of spurious gotos is compiler-induced optimizations that occur at all optimization levels (17% in non-optimized compilation). Therefore, we believe that to achieve high-quality decompilation, decompilers must be compiler-aware to mirror (and remove) the goto-inducing optimizations.
In this paper, we present a novel structuring algorithm called SAILR that mirrors the compilation pipeline of GCC and precisely inverts goto-inducing transformations. We build an open-source decompiler on angr (the angr decompiler) and implement SAILR as well as otherwise-unavailable prior work (Phoenix, DREAM, and rev.ng's Combing) and evaluate them, using a new metric of how close the decompiled code structure is to the original source code, showing that SAILR markedly improves on prior work. In addition, we find that SAILR performs well on binaries compiled with non-GCC compilers, which suggests that compilers similarly implement goto-inducing transformations.
A Taxonomy of C Decompiler Fidelity Issues
Luke Dramko and Jeremy Lacomis, Carnegie Mellon University; Edward J. Schwartz, Carnegie Mellon University Software Engineering Institute; Bogdan Vasilescu and Claire Le Goues, Carnegie Mellon University
Decompilation is an important part of analyzing threats in computer security. Unfortunately, decompiled code contains less information than the corresponding original source code, which makes understanding it more difficult for the reverse engineers who manually perform threat analysis. Thus, the fidelity of decompiled code to the original source code matters, as it can influence reverse engineers' productivity. There is some existing work in predicting some of the missing information using statistical methods, but these focus largely on variable names and variable types. In this work, we more holistically evaluate decompiler output from C-language executables and use our findings to inform directions for future decompiler development. More specifically, we use open-coding techniques to identify defects in decompiled code beyond missing names and types. To ensure that our study is robust, we compare and evaluate four different decompilers. Using thematic analysis, we build a taxonomy of decompiler defects. Using this taxonomy to reason about classes of issues, we suggest specific approaches that can be used to mitigate fidelity issues in decompiled code.
D-Helix: A Generic Decompiler Testing Framework Using Symbolic Differentiation
Muqi Zou, Arslan Khan, Ruoyu Wu, Han Gao, Antonio Bianchi, and Dave (Jing) Tian, Purdue University
Decompilers, one of the widely used security tools, transform low-level binary programs back into their high-level source representations, such as C/C++. While state-of-the-art decompilers try to generate more human-readable outputs, for instance, by eliminating goto statements in their decompiled code, the correctness of a decompilation process is largely ignored due to the complexity of decompilers, e.g., involving hundreds of heuristic rules. As a result, outputs from decompilers are often not accurate, which affects the effectiveness of downstream security tasks.
In this paper, we propose D-HELIX, a generic decompiler testing framework that can automatically vet the decompilation correctness on the function level. D-HELIX uses RECOMPILER to compile the decompiled code at the functional level. It then uses SYMDIFF to compare the symbolic model of the original binary with the one of the decompiled code, detecting potential errors introduced by the decompilation process. D-HELIX further provides TUNER to help debug the incorrect decompilation via toggling decompilation heuristic rules automatically. We evaluated D-HELIX on Ghidra and angr using 2,004 binaries and object files ending up with 93K decompiled functions in total. D-HELIX detected 4,515 incorrectly decompiled functions, reproduced 8 known bugs, found 17 distinct previously unknown bugs within these two decompilers, and fixed 7 bugs automatically.
SymFit: Making the Common (Concrete) Case Fast for Binary-Code Concolic Execution
Zhenxiao Qi, Jie Hu, Zhaoqi Xiao, and Heng Yin, UC Riverside
Concolic execution is a powerful technique in software testing, as it can systematically explore the code paths and is capable of traversing complex branches. It combines concrete execution for environment modeling and symbolic execution for path exploration. While significant research efforts in concolic execution have been directed toward the improvement of symbolic execution and constraint solving, our study pivots toward the often overlooked yet most common aspect: concrete execution. Our analysis shows that state-of-the-art binary concolic executors have largely overlooked the overhead in the execution of concrete instructions. In light of this observation, we propose optimizations to make the common (concrete) case fast. To validate this idea, we develop the prototype, SymFit, and evaluate it on standard benchmarks and real-world applications. The results showed that the performance of pure concrete execution is much faster than the baseline SymQEMU, and is comparable to the vanilla QEMU. Moreover, we showed that the fast symbolic tracing capability of SymFit can significantly improve the efficiency of crash deduplication.
Crypto I: Secret Key Exchange
K-Waay: Fast and Deniable Post-Quantum X3DH without Ring Signatures
Daniel Collins and Loïs Huguenin-Dumittan, EPFL; Ngoc Khanh Nguyen, King’s College London; Nicolas Rolin, Spuerkeess; Serge Vaudenay, EPFL
The Signal protocol and its X3DH key exchange core are regularly used by billions of people in applications like WhatsApp but are unfortunately not quantum-secure. Thus, designing an efficient and post-quantum secure X3DH alternative is paramount. Notably, X3DH supports asynchronicity, as parties can immediately derive keys after uploading them to a central server, and deniability, allowing parties to plausibly deny having completed key exchange. To satisfy these constraints, existing post-quantum X3DH proposals use ring signatures (or equivalently a form of designated-verifier signatures) to provide authentication without compromising deniability as regular signatures would. Existing ring signature schemes, however, have some drawbacks. Notably, they are not generally proven secure in the quantum random oracle model (QROM) and so the quantum security of parameters that are proposed is unclear and likely weaker than claimed. In addition, they are generally slower than standard primitives like KEMs.
In this work, we propose an efficient, deniable and post-quantum X3DH-like protocol that we call K-Waay, that does not rely on ring signatures. At its core, K-Waay uses a split-KEM, a primitive introduced by Brendel et al. [SAC 2020], to provide Diffie-Hellman-like implicit authentication and secrecy guarantees. Along the way, we revisit the formalism of Brendel et al. and identify that additional security properties are required to prove a split-KEM-based protocol secure. We instantiate split-KEM by building a protocol based on the Frodo key exchange protocol relying on the plain LWE assumption: our proofs might be of independent interest as we show it satisfies our novel unforgeability and deniability security notions. Finally, we complement our theoretical results by thoroughly benchmarking both K-Waay and existing X3DH protocols. Our results show even when using plain LWE and a conservative choice of parameters that K-Waay is significantly faster than previous work.
Diffie-Hellman Picture Show: Key Exchange Stories from Commercial VoWiFi Deployments
Gabriel K. Gegenhuber and Florian Holzbauer, University of Vienna; Philipp É. Frenzel, SBA Research; Edgar Weippl, University of Vienna and Christian Doppler Laboratory for Security and Quality Improvement in the Production System Lifecycle (CDL-SQI); Adrian Dabrowski, CISPA Helmholtz Center for Information Security
Voice over Wi-Fi (VoWiFi) uses a series of IPsec tunnels to deliver IP-based telephony from the subscriber's phone (User Equipment, UE) into the Mobile Network Operator's (MNO) core network via an Internet-facing endpoint, the Evolved Packet Data Gateway (ePDG). IPsec tunnels are set up in phases. The first phase negotiates the cryptographic algorithm and parameters and performs a key exchange via the Internet Key Exchange protocol, while the second phase (protected by the above-established encryption) performs the authentication. An insecure key exchange would jeopardize the later stages and the data's security and confidentiality.
In this paper, we analyze the phase 1 settings and implementations as they are found in phones as well as in commercially deployed networks worldwide. On the UE side, we identified a recent 5G baseband chipset from a major manufacturer that allows for fallback to weak, unannounced modes and verified it experimentally. On the MNO side –among others– we identified 13 operators (totaling an estimated 140 million subscribers) on three continents that all use the same globally static set of ten private keys, serving them at random. Those not-so-private keys allow the decryption of the shared keys of every VoWiFi user of all those operators. All these operators deployed their core network from one common manufacturer.
Formal verification of the PQXDH Post-Quantum key agreement protocol for end-to-end secure messaging
Karthikeyan Bhargavan, Cryspen; Charlie Jacomme, Inria Nancy Grand-Est, Université de Lorraine, LORIA, France; Franziskus Kiefer, Cryspen; Rolfe Schmidt, Signal Messenger
The Signal Messenger recently introduced a new asynchronous key agreement protocol called PQXDH (PostQuantum Extended Diffie-Hellman) that seeks to provide post-quantum forward secrecy, in addition to the authentication and confidentiality guarantees already provided by the previous X3DH (Extended Diffie-Hellman) protocol. More precisely, PQXDH seeks to protect the confidentiality of messages against harvest-now-decrypt-later attacks.
In this work, we formally specify the PQXDH protocol and analyze its security using two formal verification tools, PROVERIF and CRYPTOVERIF. In particular, we ask whether PQXDH preserves the guarantees of X3DH, whether it provides post-quantum forward secrecy, and whether it can be securely deployed alongside X3DH. Our analysis identifies several flaws and potential vulnerabilities in the PQXDH specification, although these vulnerabilities are not exploitable in the Signal application, thanks to specific implementation choices which we describe in this paper. To prove the security of the current implementation, our analysis notably highlighted the need for an additional binding property of the KEM, which we formally define and prove for Kyber.
We collaborated with the protocol designers to develop an updated protocol specification based on our findings, where each change was formally verified and validated with a security proof. This work identifies some pitfalls that the community should be aware of when upgrading protocols to be post-quantum secure. It also demonstrates the utility of using formal verification hand-in-hand with protocol design.
SWOOSH: Efficient Lattice-Based Non-Interactive Key Exchange
Phillip Gajland, Max Planck Institute for Security and Privacy, Ruhr University Bochum; Bor de Kock, NTNU - Norwegian University of Science and Technology, Trondheim, Norway; Miguel Quaresma, Max Planck Institute for Security and Privacy; Giulio Malavolta, Bocconi University, Max Planck Institute for Security and Privacy; Peter Schwabe, Max Planck Institute for Security and Privacy, Radboud University
The advent of quantum computers has sparked significant interest in post-quantum cryptographic schemes, as a replacement for currently used cryptographic primitives. In this context, lattice-based cryptography has emerged as the leading paradigm to build post-quantum cryptography. However, all existing viable replacements of the classical Diffie-Hellman key exchange require additional rounds of interactions, thus failing to achieve all the benefits of this protocol. Although earlier work has shown that lattice-based Non-Interactive Key Exchange (NIKE) is theoretically possible, it has been considered too inefficient for real-life applications. In this work, we challenge this folklore belief and provide the first evidence against it. We construct an efficient lattice-based NIKE whose security is based on the standard module learning with errors (M-LWE) problem in the quantum random oracle model. Our scheme is obtained in two steps: (i) A passively-secure construction that achieves a strong notion of correctness, coupled with (ii) a generic compiler that turns any such scheme into an actively-secure one. To substantiate our efficiency claim, we provide an optimised implementation of our passively-secure construction in Rust and Jasmin. Our implementation demonstrates the scheme's applicability to real-world scenarios, yielding public keys of approximately 220 KBs. Moreover, the computation of shared keys takes fewer than 12 million cycles on an Intel Skylake CPU, offering a post-quantum security level exceeding 120 bits.
12:15 pm–1:45 pm
1:45 pm–2:45 pm
Social Issues I: Phishing and Password
PhishDecloaker: Detecting CAPTCHA-cloaked Phishing Websites via Hybrid Vision-based Interactive Models
Xiwen Teoh, Shanghai Jiao Tong University; National University of Singapore; Yun Lin, Shanghai Jiao Tong University; Ruofan Liu, Zhiyong Huang, and Jin Song Dong, National University of Singapore
Phishing is a cybersecurity attack based on social engineering that incurs significant financial losses and erodes societal trust. While phishing detection techniques are emerging, attackers continually strive to bypass state-of-the-arts. Recent phishing campaigns have shown that emerging phishing attacks adopt CAPTCHA-based cloaking techniques, marking a new round of cat-and-mouse game. Our study shows that phishing websites, hardened by CAPTCHA-cloaking, can compromise all known state-of-the-art industrial and academic detectors with almost zero cost.
In this work, we develop PhishDecloaker, an AI-powered solution to soften the shield of the CAPTCHA-cloaking used by phishing websites. PhishDecloaker is designed to mimic human behaviors to solve the CAPTCHAs, allowing modern security-crawlers to see the uncloaked phishing content. Technically, PhishDecloaker orchestrates five deep computer vision models to detect the existence of CAPTCHAs, analyze its type, and solve the challenge in an interactive manner. We conduct extensive experiments to evaluate PhishDecloaker in terms of its effectiveness, efficiency, and robustness against potential adversaries. The results show that PhishDecloaker (1) recovers the phishing detection rate of many state-of-theart phishing detectors from 0% to up to on average 74.25% on diverse CAPTCHA-cloaked phishing websites (2) generalizes to unseen CAPTCHA (with precision of 86% and recall of 69%), and (3) is robust against various adversaries such as FGSM, JSMA, PGD, DeepFool, and DPatch, which allows the existing phishing detectors to achieve new state-of-the-art performance on CAPTCHA-cloaked phishing webpages. Our field study over 30 days shows that PhishDecloaker can help us uniquely discover 7.6% more phishing websites cloaked by CAPTCHAs, raising alarm of the emergence of CAPTCHA-cloaked features in the modern phishing campaigns.
Less Defined Knowledge and More True Alarms: Reference-based Phishing Detection without a Pre-defined Reference List
Ruofan Liu, Shanghai Jiao Tong University/National University of Singapore; Yun Lin, Shanghai Jiao Tong University; Xiwen Teoh, National University of Singapore; Gongshen Liu, Shanghai Jiao Tong University; Zhiyong Huang and Jin Song Dong, National University of Singapore
Phishing, a pervasive form of social engineering attack that compromises user credentials, has led to significant financial losses and undermined public trust. Modern phishing detection has gravitated to reference-based methods for their explainability and robustness against zero-day phishing attacks. These methods maintain and update predefined reference lists to specify domain-brand relationships, alarming phishing websites by the inconsistencies between their domain (e.g., payp0l.com) and intended brand (e.g., PayPal). However, the curated lists are largely limited by their lack of comprehensiveness and high maintenance costs in practice.
In this work, we present PhishLLM as a novel reference-based phishing detector that operates without an explicit pre-defined reference list. Our rationale lies in that modern LLMs have encoded far more extensive brand-domain information than any predefined list. Further, the detection of many webpage semantics such as credential-taking intention analysis is more like a linguistic problem, but they are processed as a vision problem now. Thus, we design PhishLLM to decode (or retrieve) the domain-brand relationships from LLM and effectively parse the credential-taking intention of a webpage, without the cost of maintaining and updating an explicit reference list. Moreover, to control the hallucination of LLMs, we introduce a search-engine-based validation mechanism to remove the misinformation. Our extensive experiments show that PhishLLM significantly outperforms state-of-the-art solutions such as Phishpedia and PhishIntention, improving the recall by 21% to 66%, at the cost of negligible precision. Our field studies show that PhishLLM discovers (1) 6 times more zero-day phishing webpages compared to existing approaches such as PhishIntention and (2) close to 2 times more zero-day phishing webpages even if it is enhanced by DynaPhish. Our code is available at https://github.com/code-philia/PhishLLM/.
In Wallet We Trust: Bypassing the Digital Wallets Payment Security for Free Shopping
Raja Hasnain Anwar, University of Massachusetts Amherst; Syed Rafiul Hussain, Pennsylvania State University; Muhammad Taqi Raza, University of Massachusetts Amherst
Digital wallets are a new form of payment technology that provides a secure and convenient way of making contactless payments through smart devices. In this paper, we study the security of financial transactions made through digital wallets, focusing on the authentication, authorization, and access control security functions. We find that the digital payment ecosystem supports the decentralized authority delegation which is susceptible to a number of attacks. First, an attacker adds the victim's bank card into their (attacker's) wallet by exploiting the authentication method agreement procedure between the wallet and the bank. Second, they exploit the unconditional trust between the wallet and the bank, and bypass the payment authorization. Third, they create a trap door through different payment types and violate the access control policy for the payments. The implications of these attacks are of a serious nature where the attacker can make purchases of arbitrary amounts by using the victim's bank card, despite these cards being locked and reported to the bank as stolen by the victim. We validate these findings in practice over major US banks (notably Chase, AMEX, Bank of America, and others) and three digital wallet apps (ApplePay, GPay, and PayPal). We have disclosed our findings to all the concerned parties. Finally, we propose remedies for fixing the design flaws to avoid these and other similar attacks.
The Impact of Exposed Passwords on Honeyword Efficacy
Zonghao Huang, Duke University; Lujo Bauer, Carnegie Mellon University; Michael K. Reiter, Duke University
Honeywords are decoy passwords that can be added to a credential database; if a login attempt uses a honeyword, this indicates that the site's credential database has been leaked. In this paper we explore the basic requirements for honeywords to be effective, in a threat model where the attacker knows passwords for the same users at other sites. First, we show that for user-chosen (vs. algorithmically generated, i.e., by a password manager) passwords, existing honeyword-generation algorithms do not simultaneously achieve false-positive and false-negative rates near their ideals of ≈0 and ≈ 1/1+n, respectively, in this threat model, where n is the number of honeywords per account. Second, we show that for users leveraging algorithmically generated passwords, state-of-the-art methods for honeyword generation will produce honeywords that are not sufficiently deceptive, yielding many false negatives. Instead, we find that only a honeyword-generation algorithm that uses the same password generator as the user can provide deceptive honeywords in this case. However, when the defender's ability to infer the generator from the (one) account password is less accurate than the attacker's ability to infer the generator from potentially many, this deception can again wane. Taken together, our results provide a cautionary note for the state of honeyword research and pose new challenges to the field.
Side Channel I: Transient Execution
InSpectre Gadget: Inspecting the Residual Attack Surface of Cross-privilege Spectre v2
Sander Wiebing, Alvise de Faveri Tron, Herbert Bos, and Cristiano Giuffrida, Vrije Universiteit Amsterdam
Distinguished Paper Award Winner
Spectre v2 is one of the most severe transient execution vulnerabilities, as it allows an unprivileged attacker to lure a privileged (e.g., kernel) victim into speculatively jumping to a chosen gadget, which then leaks data back to the attacker. Spectre v2 is hard to eradicate. Even on last-generation Intel CPUs, security hinges on the unavailability of exploitable gadgets. Nonetheless, with (i) deployed mitigations—eIBRS, no-eBPF, (Fine)IBT—all aimed at hindering many usable gadgets, (ii) existing exploits relying on now-privileged features (eBPF), and (iii) recent Linux kernel gadget analysis studies reporting no exploitable gadgets, the common belief is that there is no residual attack surface of practical concern.
In this paper, we challenge this belief and uncover a significant residual attack surface for cross-privilege Spectre-v2 attacks. To this end, we present InSpectre Gadget, a new gadget analysis tool for in-depth inspection of Spectre gadgets. Unlike existing tools, ours performs generic constraint analysis and models knowledge of advanced exploitation techniques to accurately reason over gadget exploitability in an automated fashion. We show that our tool can not only uncover new (unconventionally) exploitable gadgets in the Linux kernel, but that those gadgets are sufficient to bypass all deployed Intel mitigations. As a demonstration, we present the first native Spectre-v2 exploit against the Linux kernel on last-generation Intel CPUs, based on the recent BHI variant and able to leak arbitrary kernel memory at 3.5 kB/sec. We also present a number of gadgets and exploitation techniques to bypass the recent FineIBT mitigation, along with a case study on a 13th Gen Intel CPU that can leak kernel memory at 18 bytes/sec.
Shesha: Multi-head Microarchitectural Leakage Discovery in new-generation Intel Processors
Anirban Chakraborty, Nimish Mishra, and Debdeep Mukhopadhyay, Indian Institute of Technology Kharagpur
Transient execution attacks have been one of the widely explored microarchitectural side channels since the discovery of Spectre and Meltdown. However, much of the research has been driven by manual discovery of new transient paths through well-known speculative events. Although a few attempts exist in literature on automating transient leakage discovery, such tools focus on finding variants of known transient attacks and explore a small subset of instruction set. Further, they take a random fuzzing approach that does not scale as the complexity of search space increases. In this work, we identify that the search space of bad speculation is disjointedly fragmented into equivalence classes and then use this observation to develop a framework named Shesha, inspired by Particle Swarm Optimization, which exhibits faster convergence rates than state-of-the-art fuzzing techniques for automatic discovery of transient execution attacks. We then use Shesha to explore the vast search space of extensions to the x86 Instruction Set Architecture (ISAs), thereby focusing on previously unexplored avenues of bad speculation. As such, we report five previously unreported transient execution paths in Instruction Set Extensions (ISEs) on new generation of Intel processors. We then perform extensive reverse engineering of each of the transient execution paths and provide root-cause analysis. Using the discovered transient execution paths, we develop attack building blocks to exhibit exploitable transient windows. Finally, we demonstrate data leakage from Fused Multiply-Add instructions through SIMD buffer and extract victim data from various cryptographic implementations.
BeeBox: Hardening BPF against Transient Execution Attacks
Di Jin, Alexander J. Gaidis, and Vasileios P. Kemerlis, Brown University
The Berkeley Packet Filter (BPF) has emerged as the de-facto standard for carrying out safe and performant, user-specified computation(s) in kernel space. However, BPF also increases the attack surface of the OS kernel disproportionately, especially under the presence of transient execution vulnerabilities. In this work, we present BeeBox: a new security architecture that hardens BPF against transient execution attacks, allowing the OS kernel to expose eBPF functionality to unprivileged users and applications. At a high level, BeeBox sandboxes the BPF runtime against speculative code execution in an SFI-like manner. Moreover, by using a combination of static analyses and domain-specific properties, BeeBox selectively elides enforcement checks, improving performance without sacrificing security. We implemented a prototype of BeeBox for the Linux kernel that supports popular features of eBPF (e.g., BPF maps and helper functions), and evaluated it both in terms of effectiveness and performance, demonstrating resilience against prevalent transient execution attacks (i.e., Spectre-PHT and Spectre-STL) with low overhead. On average, BeeBox incurs 20% overhead in the Katran benchmark, while the current mitigations of Linux incur 112% overhead. Lastly, BeeBox exhibits less than 1% throughput degradation in end-to-end, real-world settings that include seccomp-BPF
and packet filtering.
SpecLFB: Eliminating Cache Side Channels in Speculative Executions
Xiaoyu Cheng, School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu, China; Jiangsu Province Engineering Research Center of Security for Ubiquitous Network, China; Fei Tong, School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu, China; Jiangsu Province Engineering Research Center of Security for Ubiquitous Network, China; Purple Mountain Laboratories, Nanjing, Jiangsu, China; Hongyu Wang, State Key Laboratory of Power Equipment Technology, School of Electrical Engineering, Chongqing University, China; Wiscom System Co., LTD, Nanjing, China; Zhe Zhou and Fang Jiang, School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu, China; Jiangsu Province Engineering Research Center of Security for Ubiquitous Network, China; Yuxing Mao, State Key Laboratory of Power Equipment Technology, School of Electrical Engineering, Chongqing University, China
Cache side-channel attacks based on speculative executions are powerful and difficult to mitigate. Existing hardware defense schemes often require additional hardware data structures, data movement operations and/or complex logical computations, resulting in excessive overhead of both processor performance and hardware resources. To this end, this paper proposes SpecLFB, which utilizes the microarchitecture component, Line-Fill-Buffer, integrated with a proposed mechanism for load security check to prevent the establishment of cache side channels in speculative executions. To ensure the correctness and immediacy of load security check, a structure called ROB unsafe mask is designed for SpecLFB to track instruction state. To further reduce processor performance overhead, SpecLFB narrows down the protection scope of unsafe speculative loads and determines the time at which they can be deprotected as early as possible. SpecLFB has been implemented in the open-source RISC-V core, SonicBOOM, as well as in Gem5. For the enhanced SonicBOOM, its register-transfer-level (RTL) code is generated, and an FPGA hardware prototype burned with the core and running a Linux-kernel-based operating system is developed. Based on the evaluations in terms of security guarantee, performance overhead, and hardware resource overhead through RTL simulation, FPGA prototype experiment, and Gem5 simulation, it shows that SpecLFB effectively defends against attacks. It leads to a hardware resource overhead of only 0.6% and the performance overhead of only 1.85% and 3.20% in the FPGA prototype experiment and Gem5 simulation, respectively.
Mobile Security I
Towards Privacy-Preserving Social-Media SDKs on Android
Haoran Lu, Yichen Liu, Xiaojing Liao, and Luyi Xing, Indiana University Bloomington
Integration of third-party SDKs are essential in the development of mobile apps. However, the rise of in-app privacy threat against mobile SDKs— called cross-library data harvesting (XLDH), targets social media/platform SDKs (called social SDKs) that handles rich user data. Given the widespread integration of social SDKs in mobile apps, XLDH presents a significant privacy risk, as well as raising pressing concerns regarding legal compliance for app developers, social media/platform stakeholders, and policymakers. The emerging XLDH threat, coupled with the increasing demand for privacy and compliance in line with societal expectations, introduces unique challenges that cannot be addressed by existing protection methods against privacy threats or malicious code on mobile platforms. In response to the XLDH threats, in our study, we generalize and define the concept of privacy-preserving social SDKs and their in-app usage, characterize fundamental challenges for combating the XLDH threat and ensuring privacy in design and utilizaiton of social SDKs. We introduce a practical, clean-slate design and end-to-end systems, called PESP, to facilitate privacy-preserving social SDKs. Our thorough evaluation demonstrates its satisfactory effectiveness, performance overhead and practicability for widespread adoption.
UIHash: Detecting Similar Android UIs through Grid-Based Visual Appearance Representation
Jiawei Li, Beihang University; National University of Singapore; Jian Mao, Beihang University; Tianmushan Laboratory; Hangzhou Innovation Institute, Beihang University; Jun Zeng, National University of Singapore; Qixiao Lin and Shaowen Feng, Beihang University; Zhenkai Liang, National University of Singapore
User interfaces (UIs) is the main channel for users to interact with mobile apps. As such, attackers often create similar-looking UIs to deceive users, causing various security problems, such as spoofing and phishing. Prior studies identify these similar UIs based on their layout trees or screenshot images. These techniques, however, are susceptible to being evaded. Guided by how users perceive UIs and the features they prioritize, we design a novel grid-based UI representation to capture UI visual appearance while maintaining robustness against evasion. We develop an approach, UIHash, to detect similar Android UIs by comparing their visual appearance. It divides the UI into a #-shaped grid and abstracts UI controls across screen regions, then calculates UI similarity through a neural network architecture that includes a convolutional neural network and a Siamese network. Our evaluation shows that UIHash achieves an F1-score of 0.984 in detection, outperforming existing tree-based methods and image-based methods. Moreover, we have discovered evasion techniques that circumvent existing detection approaches.
Racing for TLS Certificate Validation: A Hijacker's Guide to the Android TLS Galaxy
Sajjad Pourali and Xiufen Yu, Concordia University; Lianying Zhao, Carleton University; Mohammad Mannan and Amr Youssef, Concordia University
Besides developers' code, current Android apps usually integrate code from third-party libraries, all of which may include code for TLS validation. We analyze well-known improper TLS certificate validation issues in popular Android apps, and attribute the validation issues to the offending code/party in a fine-grained manner, unlike existing work labelling an entire app for validation failures. Surprisingly, we discovered a widely used practice of overriding the global default validation functions with improper validation logic, or simply performing no validation at all, affecting the entire app's TLS connections, which we call validation hijacking. We design and implement an automated dynamic analysis tool called Marvin to identify TLS validation failures, including validation hijacking, and the responsible parties behind such dangerous practice. We use Marvin to analyze 6315 apps from a Chinese app store and Google Play, and find many occurrences of insecure TLS certificate validation instances (55.7% of the Chinese apps and 4.6% of the Google Play apps). Validation hijacking happens in 34.3% of the insecure apps from the Chinese app store and 20.0% of insecure Google Play apps. A network attacker can exploit these insecure connections in various ways, e.g., to compromise PII, app login and SSO credentials, to launch phishing and other content modification attacks, including code injection. We found that most of these vulnerabilities are related to third-party libraries used by the apps, not the app code created by app developers. The technical root cause enabling validation hijacking appears to be the specific modifications made by Google in the OkHttp library integrated with the Android OS, which is used by many developers by default, without being aware of its potential dangers. Overall, our findings provide valuable insights into the responsible parties for TLS validation issues in Android, including the validation hijacking problem.
DVa: Extracting Victims and Abuse Vectors from Android Accessibility Malware
Haichuan Xu, Mingxuan Yao, and Runze Zhang, Georgia Institute of Technology; Mohamed Moustafa Dawoud, German International University; Jeman Park, Kyung Hee University; Brendan Saltaformaggio, Georgia Institute of Technology
The Android accessibility (a11y) service is widely abused by malware to conduct on-device monetization fraud. Existing mitigation techniques focus on malware detection but overlook providing users evidence of abuses that have already occurred and notifying victims to facilitate defenses. We developed DVa, a malware analysis pipeline based on dynamic victim-guided execution and abuse-vector-guided symbolic analysis, to help investigators uncover a11y malware's targeted victims, victim-specific abuse vectors, and persistence mechanisms. We deployed DVa to investigate Android devices infected with 9,850 a11y malware. From the extractions, DVa uncovered 215 unique victims targeted with an average of 13.9 abuse routines. DVa also extracted six persistence mechanisms empowered by the a11y service.
Web Security I
SoK: State of the Krawlers – Evaluating the Effectiveness of Crawling Algorithms for Web Security Measurements
Aleksei Stafeev and Giancarlo Pellegrino, CISPA Helmholtz Center for Information Security
Web crawlers are tools widely used in web security measurements whose performance and impact have been limitedly studied so far. In this paper, we bridge this gap. Starting from the past 12 years of the top security, web measurement, and software engineering literature, we categorize and decompose in building blocks crawling techniques and methodologic choices. We then reimplement and patch crawling techniques and integrate them into Arachnarium, a framework for comparative evaluations, which we use to run one of the most comprehensive experimental evaluations against nine real and two benchmark web applications and top 10K CrUX websites to assess the performance and adequacy of algorithms across three metrics (code, link, and JavaScript source coverage). Finally, we distill 14 insights and lessons learned. Our results show that despite a lack of clear and homogeneous descriptions hindering reimplementations, proposed and commonly used crawling algorithms offer a lower coverage than randomized ones, indicating room for improvement. Also, our results show a complex relationship between experiment parameters, the study's domain, and the available computing resources, where no single best-performing crawler configuration exists. We hope our results will guide future researchers when setting up their studies.
Vulnerability-oriented Testing for RESTful APIs
Wenlong Du and Jian Li, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Yanhao Wang, Independent Researcher; Libo Chen, Ruijie Zhao, and Junmin Zhu, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Zhengguang Han, QI-ANXIN Technology Group; Yijun Wang and Zhi Xue, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
With the increasing popularity of APIs, ensuring their security has become a crucial concern. However, existing security testing methods for RESTful APIs usually lack targeted approaches to identify and detect security vulnerabilities. In this paper, we propose VOAPI2, a vulnerability-oriented API inspection framework designed to directly expose vulnerabilities in RESTful APIs, based on our observation that the type of vulnerability hidden in an API interface is strongly associated with its functionality. By leveraging this insight, we first track commonly used strings as keywords to identify APIs' functionality. Then, we generate a stateful and suitable request sequence to inspect the candidate API function within a targeted payload. Finally, we verify whether vulnerabilities exist or not through feedback-based testing. Our experiments on real-world APIs demonstrate the effectiveness of our approach, with significant improvements in vulnerability detection compared to state-of-the-art methods. VOAPI2 discovered 7 zero-day and 19 disclosed bugs on seven real-world RESTful APIs, and 23 of them have been assigned CVE IDs. Our findings highlight the importance of considering APIs' functionality when discovering their bugs, and our method provides a practical and efficient solution for securing RESTful APIs.
Web Platform Threats: Automated Detection of Web Security Issues With WPT
Pedro Bernardo and Lorenzo Veronese, TU Wien; Valentino Dalla Valle and Stefano Calzavara, Università Ca' Foscari Venezia; Marco Squarcina, TU Wien; Pedro Adão, Instituto Superior Técnico, Universidade de Lisboa, and Instituto de Telecomunicações; Matteo Maffei, TU Wien
Client-side security mechanisms implemented by Web browsers, such as cookie security attributes and the Mixed Content policy, are of paramount importance to protect Web applications. Unfortunately, the design and implementation of such mechanisms are complicated and error-prone, potentially exposing Web applications to security vulnerabilities. In this paper, we present a practical framework to formally and automatically detect security flaws in client-side security mechanisms. In particular, we leverage Web Platform Tests (WPT), a popular cross-browser test suite, to automatically collect browser execution traces and match them against Web invariants, i.e., intended security properties of Web mechanisms expressed in first-order logic. We demonstrate the effectiveness of our approach by validating 9 invariants against the WPT test suite, discovering violations with clear security implications in 104 tests for Firefox, Chromium and Safari. We disclosed the root causes of these violations to browser vendors and standard bodies, which resulted in 8 individual reports and one CVE on Safari.
Rise of Inspectron: Automated Black-box Auditing of Cross-platform Electron Apps
Mir Masood Ali, Mohammad Ghasemisharif, Chris Kanich, and Jason Polakis, University of Illinois Chicago
Browser-based cross-platform applications have become increasingly popular as they allow software vendors to sidestep two major issues in the app ecosystem. First, web apps can be impacted by the performance deterioration affecting browsers, as the continuous adoption of diverse and complex features has led to bloating. Second, re-developing or porting apps to different operating systems and execution environments is a costly, error-prone process. Instead, frameworks like Electron allow the creation of standalone apps for different platforms using JavaScript code (e.g., reused from an existing web app) and by incorporating a stripped down and configurable browser engine. Despite the aforementioned advantages, these apps face significant security and privacy threats that are either non-applicable to traditional web apps (due to the lack of access to certain system-facing APIs) or ineffective against them (due to countermeasures already baked into browsers). In this paper we present Inspectron, an automated dynamic analysis framework that audits packaged Electron apps for potential security vulnerabilities stemming from developers' deviation from recommended security practices. Our study reveals a multitude of insecure practices and problematic trends in the Electron app ecosystem, highlighting the gap filled by Inspectron as it provides extensive and comprehensive auditing capabilities for developers and researchers.
LLM for Security
KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection
Yuexin Li, Chengyu Huang, and Shumin Deng, National University of Singapore; Mei Lin Lock, NCS Cyber Special Ops-R&D; Tri Cao, National University of Singapore; Nay Oo and Hoon Wei Lim, NCS Cyber Special Ops-R&D; Bryan Hooi, National University of Singapore
Phishing attacks have inflicted substantial losses on individuals and businesses alike, necessitating the development of robust and efficient automated phishing detection approaches. Reference-based phishing detectors (RBPDs), which compare the logos on a target webpage to a known set of logos, have emerged as the state-of-the-art approach. However, a major limitation of existing RBPDs is that they rely on a manually constructed brand knowledge base, making it infeasible to scale to a large number of brands, which results in false negative errors due to the insufficient brand coverage of the knowledge base. To address this issue, we propose an automated knowledge collection pipeline, using which we collect a large-scale multimodal brand knowledge base, KnowPhish, containing 20k brands with rich information about each brand. KnowPhish can be used to boost the performance of existing RBPDs in a plug-and-play manner. A second limitation of existing RBPDs is that they solely rely on the image modality, ignoring useful textual information present in the webpage HTML. To utilize this textual information, we propose a Large Language Model (LLM)-based approach to extract brand information of webpages from text. Our resulting multimodal phishing detection approach, KnowPhish Detector (KPD), can detect phishing webpages with or without logos. We evaluate KnowPhish and KPD on a manually validated dataset, and a field study under Singapore's local context, showing substantial improvements in effectiveness and efficiency compared to state-of-the-art baselines.
Exploring ChatGPT's Capabilities on Vulnerability Management
Peiyu Liu and Junming Liu, Zhejiang University NGICS Platform; Lirong Fu, Hangzhou Dianzi University; Kangjie Lu, University of Minnesota; Yifan Xia, Zhejiang University NGICS Platform; Xuhong Zhang, Zhejiang University and Jianghuai Advance Technology Center; Wenzhi Chen, Zhejiang University; Haiqin Weng, Ant Group; Shouling Ji, Zhejiang University; Wenhai Wang, Zhejiang University NGICS Platform
Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulnerability management tasks, such as the prediction of security relevance and patch correctness, which require an all-encompassing understanding of various aspects, including code syntax, program semantics, and related manual comments.
In this paper, we explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples. For each task, we compare ChatGPT against SOTA approaches, investigate the impact of different prompts, and explore the difficulties. The results suggest promising potential in leveraging ChatGPT to assist vulnerability management. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Furthermore, our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions. For instance, directly providing random demonstration examples in the prompt cannot consistently guarantee good performance in vulnerability management. By contrast, leveraging ChatGPT in a self-heuristic way—extracting expertise from demonstration examples itself and integrating the extracted expertise in the prompt is a promising research direction. Besides, ChatGPT may misunderstand and misuse the information in the prompt. Consequently, effectively guiding ChatGPT to focus on helpful information rather than the irrelevant content is still an open problem.
Large Language Models for Code Analysis: Do LLMs Really Do Their Job?
Chongzhou Fang, Ning Miao, and Shaurya Srivastav, University of California, Davis; Jialin Liu, Temple University; Ruoyu Zhang, Ruijie Fang, Asmita, Ryan Tsang, and Najmeh Nazari, University of California, Davis; Han Wang, Temple University; Houman Homayoun, University of California, Davis
Large language models (LLMs) have demonstrated significant potential in the realm of natural language understanding and programming code processing tasks. Their capacity to comprehend and generate human-like code has spurred research into harnessing LLMs for code analysis purposes. However, the existing body of literature falls short in delivering a systematic evaluation and assessment of LLMs' effectiveness in code analysis, particularly in the context of obfuscated code.
This paper seeks to bridge this gap by offering a comprehensive evaluation of LLMs' capabilities in performing code analysis tasks. Additionally, it presents real-world case studies that employ LLMs for code analysis. Our findings indicate that LLMs can indeed serve as valuable tools for automating code analysis, albeit with certain limitations. Through meticulous exploration, this research contributes to a deeper understanding of the potential and constraints associated with utilizing LLMs in code analysis, paving the way for enhanced applications in this critical domain.
PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing
Gelei Deng and Yi Liu, Nanyang Technological University; Víctor Mayoral-Vilches, Alias Robotics and Alpen-Adria-Universität Klagenfurt; Peng Liu, Institute for Infocomm Research (I2R), A*STAR, Singapore; Yuekang Li, University of New South Wales; Yuan Xu, Tianwei Zhang, and Yang Liu, Nanyang Technological University; Martin Pinzger, Alpen-Adria-Universität Klagenfurt; Stefan Rass, Johannes Kepler University Linz
Distinguished Artifact Award Winner
Penetration testing, a crucial industrial practice for ensuring system security, has traditionally resisted automation due to the extensive expertise required by human professionals. Large Language Models (LLMs) have shown significant advancements in various domains, and their emergent abilities suggest their potential to revolutionize industries. In this work, we establish a comprehensive benchmark using real-world penetration testing targets and further use it to explore the capabilities of LLMs in this domain. Our findings reveal that while LLMs demonstrate proficiency in specific sub-tasks within the penetration testing process, such as using testing tools, interpreting outputs, and proposing subsequent actions, they also encounter difficulties maintaining a whole context of the overall testing scenario.
Based on these insights, we introduce PENTESTGPT, an LLM-empowered automated penetration testing framework that leverages the abundant domain knowledge inherent in LLMs. PENTESTGPT is meticulously designed with three self-interacting modules, each addressing individual sub-tasks of penetration testing, to mitigate the challenges related to context loss. Our evaluation shows that PENTESTGPT not only outperforms LLMs with a task-completion increase of 228.6% compared to the GPT-3.5 model among the benchmark targets, but also proves effective in tackling real-world penetration testing targets and CTF challenges. Having been open-sourced on GitHub, PENTESTGPT has garnered over 6,500 stars in 12 months and fostered active community engagement, attesting to its value and impact in both the academic and industrial spheres.
Fuzzing I: Software
OptFuzz: Optimization Path Guided Fuzzing for JavaScript JIT Compilers
Jiming Wang and Yan Kang, SKLP, Institute of Computing Technology, CAS & University of Chinese Academy of Sciences; Chenggang Wu, SKLP, Institute of Computing Technology, CAS & University of Chinese Academy of Sciences & Zhongguancun Laboratory; Yuhao Hu, Yue Sun, and Jikai Ren, SKLP, Institute of Computing Technology, CAS & University of Chinese Academy of Sciences; Yuanming Lai and Mengyao Xie, SKLP, Institute of Computing Technology, CAS; Charles Zhang, Tsinghua University; Tao Li, Nankai University; Zhe Wang, SKLP, Institute of Computing Technology, CAS & University of Chinese Academy of Sciences & Zhongguancun Laboratory
Just-In-Time (JIT) compiler is a core component of JavaScript engines, which takes a snippet of JavaScript code as input and applies a series of optimization passes on it and then transforms it to machine code. The optimization passes often have some assumptions (e.g., variable types) on the target JavaScript code, and therefore will yield vulnerabilities if the assumptions do not hold. To discover such bugs, it is essential to thoroughly test different optimization passes, but previous work fails to do so and mainly focused on exploring code coverage. In this paper, we present the first optimization path guided fuzzing solution for JavaScript JIT compilers, namely OptFuzz, which focuses on exploring optimization path coverage. Specifically, we utilize an optimization trunk path metric to approximate the optimization path coverage, and use it as a feedback to guide seed preservation and seed scheduling of the fuzzing process. We have implemented a prototype of OptFuzz and evaluated it on 4 mainstream JavaScript engines. On earlier versions of JavaScript engines, OptFuzz found several times more bugs than baseline solutions. On the latest JavaScript engines, OptFuzz discovered 36 unknown bugs, while baseline solutions found none.
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing
Asmita, University of California, Davis; Yaroslav Oliinyk and Michael Scott, NetRise; Ryan Tsang, Chongzhou Fang, and Houman Homayoun, University of California, Davis
BusyBox, an open-source software bundling over 300 essential Linux commands into a single executable, is ubiquitous in Linux-based embedded devices. Vulnerabilities in BusyBox can have far-reaching consequences, affecting a wide array of devices. This research, driven by the extensive use of BusyBox, delved into its analysis. The study revealed the prevalence of older BusyBox versions in real-world embedded products, prompting us to conduct fuzz testing on BusyBox. Fuzzing, a pivotal software testing method, aims to induce crashes that are subsequently scrutinized to uncover vulnerabilities. Within this study, we introduce two techniques to fortify software testing. The first technique enhances fuzzing by leveraging Large Language Models (LLM) to generate target-specific initial seeds. Our study showed a substantial increase in crashes when using LLM-generated initial seeds, highlighting the potential of LLM to efficiently tackle the typically labor-intensive task of generating target-specific initial seeds. The second technique involves repurposing previously acquired crash data from similar fuzzed targets before initiating fuzzing on a new target. This approach streamlines the time-consuming fuzz testing process by providing crash data directly to the new target before commencing fuzzing. We successfully identified crashes in the latest BusyBox target without conducting traditional fuzzing, emphasizing the effectiveness of LLM and crash reuse techniques in enhancing software testing and improving vulnerability detection in embedded systems. Additionally, manual triaging was performed to identify the nature of crashes in the latest BusyBox.
Towards Generic Database Management System Fuzzing
Yupeng Yang and Yongheng Chen, Georgia Institute of Technology; Rui Zhong, Palo Alto Networks; Jizhou Chen and Wenke Lee, Georgia Institute of Technology
Database Management Systems play an indispensable role in modern cyberspace. While multiple fuzzing frameworks have been proposed in recent years to test relational (SQL) DBMSs to improve their security, non-relational (NoSQL) DBMSs have yet to experience the same scrutiny and lack an effective testing solution in general. In this work, we identify three limitations of existing approaches when extended to fuzz the DBMSs effectively in general: being non-generic, using static constraints, and generating loose data dependencies. Then, we propose effective solutions to address these limitations. We implement our solutions into an end-to-end fuzzing framework, BUZZBEE, which can effectively fuzz both relational and non-relational DBMSs. BUZZBEE successfully discovered 40 vulnerabilities in eight DBMSs of four different data models, of which 25 have been fixed with 4 new CVEs assigned. In our evaluation, BUZZBEE outperforms state-of-the-art generic fuzzers by up to 177% in terms of code coverage and discovers 30x more bugs than the second-best fuzzer for non-relational DBMSs, while achieving comparable results with specialized SQL fuzzers for the relational counterpart.
HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface
Alexander Bulekov, EPFL, Boston University, and Amazon; Qiang Liu, EPFL and Zhejiang University; Manuel Egele, Boston University; Mathias Payer, EPFL
Distinguished Paper Award Winner
The security guarantees of cloud computing depend on the isolation guarantees of the underlying hypervisors. Prior works have presented effective methods for automatically identifying vulnerabilities in hypervisors. However, these approaches are limited in scope. For instance, their implementation is typically hypervisor-specific and limited by requirements for detailed grammars, access to source-code, and assumptions about hypervisor behaviors. In practice, complex closed-source and recent open-source hypervisors are often not suitable for off-the-shelf fuzzing techniques.
HYPERPILL introduces a generic approach for fuzzing arbitrary hypervisors. HYPERPILL leverages the insight that although hypervisor implementations are diverse, all hypervisors rely on the identical underlying hardware-virtualization interface to manage virtual-machines. To take advantage of the hardware-virtualization interface, HYPERPILL makes a snapshot of the hypervisor, inspects the snapshotted hardware state to enumerate the hypervisor's input-spaces, and leverages feedback-guided snapshot-fuzzing within an emulated environment to identify vulnerabilities in arbitrary hypervisors. In our evaluation, we found that beyond being the first hypervisor-fuzzer capable of identifying vulnerabilities in arbitrary hypervisors across all major attack-surfaces (i.e., PIO/MMIO/Hypercalls/DMA), HYPERPILL also outperforms state-of-the-art approaches that rely on access to source-code, due to the granularity of feedback provided by HYPERPILL's emulation-based approach. In terms of coverage, HYPERPILL outperformed past fuzzers for 10/12 QEMU devices, without the API hooking or source-code instrumentation techniques required by prior works. HYPERPILL identified 26 new bugs in recent versions of QEMU, Hyper-V, and macOS Virtualization Framework across four device-categories
Differential Privacy I
Less is More: Revisiting the Gaussian Mechanism for Differential Privacy
Tianxi Ji, Texas Tech University; Pan Li, Case Western Reserve University
Differential privacy (DP) via output perturbation has been a de facto standard for releasing query or computation results on sensitive data. Different variants of the classic Gaussian mechanism have been developed to reduce the magnitude of the noise and improve the utility of sanitized query results. However, we identify that all existing Gaussian mechanisms suffer from the curse of full-rank covariance matrices, and hence the expected accuracy losses of these mechanisms equal the trace of the covariance matrix of the noise. Particularly, for query results with multiple entries, in order to achieve DP, the expected accuracy loss of the classic Gaussian mechanism, that of the analytic Gaussian mechanism, and that of the Matrix-Variate Gaussian (MVG) mechanism are lower bounded by terms that scales linearly with the number of entries.
To lift this curse, we design a Rank-1 Singular Multivariate Gaussian (R1SMG) mechanism. It achieves DP on high dimension query results by perturbing the results with noise following a singular multivariate Gaussian distribution, whose covariance matrix is a randomly generated rank-1 positive semi-definite matrix. In contrast, the classic Gaussian mechanism and its variants all consider deterministic full-rank covariance matrices. Our idea is motivated by a clue from Dwork et al.'s seminal work on the classic Gaussian mechanism that has been ignored in the literature: when projecting multivariate Gaussian noise with a full-rank covariance matrix onto a set of orthonormal basis, only the coefficient of a single basis can contribute to the privacy guarantee.
This paper makes the following technical contributions.
(i) The R1SMG mechanisms achieves DP guarantee on high dimension query results in, while its expected accuracy loss is lower bounded by a term that is on a lower order of magnitude by at least the dimension of query results compared with that of the classic Gaussian mechanism, of the analytic Gaussian mechanism, and of the MVG mechanism.
(ii) Compared with other mechanisms, the R1SMG mechanism is more stable and less likely to generate noise with large magnitude that overwhelms the query results, because the kurtosis and skewness of the nondeterministic accuracy loss introduced by this mechanism is larger than that introduced by other mechanisms.
Relation Mining Under Local Differential Privacy
Kai Dong, Zheng Zhang, Chuang Jia, Zhen Ling, Ming Yang, and Junzhou Luo, Southeast University; Xinwen Fu, University of Massachusetts Lowell
Existing local differential privacy (LDP) techniques enable untrustworthy aggregators to perform only very simple data mining tasks on distributed private data, including statistical estimation and frequent item mining. There is currently no general LDP method that discovers relations between items. The main challenge lies in the curse of dimensionality, as the quantity of values to be estimated in mining relations is the square of the quantity of values to be estimated in mining item-level knowledge, leading to a considerable decrease in the final estimation accuracy. We propose LDP-RM, the first relation mining method under LDP. It represents items and relations in a matrix and utilizes singular value decomposition and low rank approximation to reduce the number of values to estimate from O(k2) to O(r), where k is the number of all considered items, and r < k is a parameter determined by the aggregator, signifying the rank of the approximation. LDP-RM serves as a fundamental privacy-preserving method for enabling various complex data mining tasks.
Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD
Anvith Thudi and Hengrui Jia, University of Toronto and Vector Institute; Casey Meehan, University of California, San Diego; Ilia Shumailov, University of Oxford; Nicolas Papernot, University of Toronto and Vector Institute
Differentially private stochastic gradient descent (DP-SGD) is the canonical approach to private deep learning. While the current privacy analysis of DP-SGD is known to be tight in some settings, several empirical results suggest that models trained on common benchmark datasets leak significantly less privacy for many datapoints. Yet, despite past attempts, a rigorous explanation for why this is the case has not been reached. Is it because there exist tighter privacy upper bounds when restricted to these dataset settings, or are our attacks not strong enough for certain datapoints? In this paper, we provide the first per-instance (i.e., "data-dependent") DP analysis of DP-SGD. Our analysis captures the intuition that points with similar neighbors in the dataset enjoy better data-dependent privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints (when trained on common benchmarks) than the current data-independent guarantee. This implies privacy attacks will necessarily fail against many datapoints if the adversary does not have sufficient control over the possible training datasets.
DPAdapter: Improving Differentially Private Deep Learning through Noise Tolerance Pre-training
Zihao Wang, Rui Zhu, and Dongruo Zhou, Indiana University Bloomington; Zhikun Zhang, Zhejiang University; John Mitchell, Stanford University; Haixu Tang and XiaoFeng Wang, Indiana University Bloomington
Recent developments have underscored the critical role of differential privacy (DP) in safeguarding individual data for training machine learning models. However, integrating DP oftentimes incurs significant model performance degradation due to the perturbation introduced into the training process, presenting a formidable challenge in the differentially private machine learning (DPML) field. To this end, several mitigative efforts have been proposed, typically revolving around formulating new DPML algorithms or relaxing DP definitions to harmonize with distinct contexts. In spite of these initiatives, the diminishment induced by DP on models, particularly large-scale models, remains substantial and thus, necessitates an innovative solution that adeptly circumnavigates the consequential impairment of model utility.
In response, we introduce DPAdapter, a pioneering technique designed to amplify the model performance of DPML algorithms by enhancing parameter robustness. The fundamental intuition behind this strategy is that models with robust parameters are inherently more resistant to the noise introduced by DP, thereby retaining better performance despite the perturbations. DPAdapter modifies and enhances the sharpness-aware minimization (SAM) technique, utilizing a two-batch strategy to provide a more accurate perturbation estimate and an efficient gradient descent, thereby improving parameter robustness against noise. Notably, DPAdapter can act as a plug-and-play component and be combined with existing DPML algorithms to further improve their performance. Our experiments show that DPAdapter vastly enhances state-of-the-art DPML algorithms, increasing average accuracy from 72.92% to 77.09% with a privacy budget of ϵ = 4.
2:45 pm–3:15 pm
Coffee and Tea Break
Grand Ballroom Foyer
3:15 pm–4:15 pm
Deepfake and Synthesis
Double Face: Leveraging User Intelligence to Characterize and Recognize AI-synthesized Faces
Matthew Joslin, Xian Wang, and Shuang Hao, University of Texas at Dallas
Artificial Intelligence (AI) techniques have advanced to generate face images of nonexistent yet photorealistic persons. Despite positive applications, AI-synthesized faces have been increasingly abused to deceive users and manipulate opinions, such as AI-generated profile photos for fake accounts. Deception using generated realistic-appearing images raises severe trust and security concerns. So far, techniques to analyze and recognize AI-synthesized face images are limited, mainly relying on off-the-shelf classification methods or heuristics of researchers' individual perceptions.
As a complement to existing analysis techniques, we develop a novel approach that leverages crowdsourcing annotations to analyze and defend against AI-synthesized face images. We aggregate and characterize AI-synthesis artifacts annotated by multiple users (instead of by individual researchers or automated systems). Our quantitative findings systematically identify where the synthesis artifacts are likely to be located and what characteristics the synthesis patterns have. We further incorporate user annotated regions into an attention learning approach to detect AI-synthesized faces. Our work sheds light on involving human factors to enhance defense against AI-synthesized face images.
SoK: The Good, The Bad, and The Unbalanced: Measuring Structural Limitations of Deepfake Media Datasets
Seth Layton, Tyler Tucker, Daniel Olszewski, Kevin Warren, Kevin Butler, and Patrick Traynor, University of Florida
Deepfake media represents an important and growing threat not only to computing systems but to society at large. Datasets of image, video, and voice deepfakes are being created to assist researchers in building strong defenses against these emerging threats. However, despite the growing number of datasets and the relative diversity of their samples, little guidance exists to help researchers select datasets and then meaningfully contrast their results against prior efforts. To assist in this process, this paper presents the first systematization of deepfake media. Using traditional anomaly detection datasets as a baseline, we characterize the metrics, generation techniques, and class distributions of existing datasets. Through this process, we discover significant problems impacting the comparability of systems using these datasets, including unaccounted-for heavy class imbalance and reliance upon limited metrics. These observations have a potentially profound impact should such systems be transitioned to practice - as an example, we demonstrate that the widely-viewed best detector applied to a typical call center scenario would result in only 1 out of 333 flagged results being a true positive. To improve reproducibility and future comparisons, we provide a template for reporting results in this space and advocate for the release of model score files such that a wider range of statistics can easily be found and/or calculated. Through this, and our recommendations for improving dataset construction, we provide important steps to move this community forward.
Can I Hear Your Face? Pervasive Attack on Voice Authentication Systems with a Single Face Image
Nan Jiang, Bangjie Sun, and Terence Sim, National University of Singapore; Jun Han, KAIST
We present Foice, a novel deepfake attack against voice authentication systems. Foice generates a synthetic voice of the victim from just a single image of the victim's face, without requiring any voice sample. This synthetic voice is realistic enough to fool commercial authentication systems. Since face images are generally easier to obtain than voice samples, Foice effectively makes it easier for an attacker to mount large-scale attacks. The key idea lies in learning the partial correlation between face and voice features and adding to that a face-independent voice feature sampled from a Gaussian distribution. We demonstrate the effectiveness of Foice with a comprehensive set of real-world experiments involving ten offline participants and an online dataset of 1029 unique individuals. By evaluating eight state-of-the-art systems, including WeChat's Voiceprint and Microsoft Azure, we show that all these systems are vulnerable to Foice attack.
dp-promise: Differentially Private Diffusion Probabilistic Models for Image Synthesis
Haichen Wang and Shuchao Pang, Nanjing University of Science and Technology; Zhigang Lu, James Cook University; Yihang Rao and Yongbin Zhou, Nanjing University of Science and Technology; Minhui Xue, CSIRO's Data61
Utilizing sensitive images (e.g., human faces) for training DL models raises privacy concerns. One straightforward solution is to replace the private images with synthetic ones generated by deep generative models. Among all image synthesis methods, diffusion models (DMs) yield impressive performance. Unfortunately, recent studies have revealed that DMs incur privacy challenges due to the memorization of the training instances. To preserve the existence of a single private sample of DMs, many works have explored to apply DP on DMs from different perspectives. However, existing works on differentially private DMs only consider DMs as regular deep models, such that they inject unnecessary DP noise in addition to the forward process noise in DMs, damaging the model utility. To address the issue, this paper proposes Differentially Private Diffusion Probabilistic Models for Image Synthesis, dp-promise, which theoretically guarantees approximate DP by leveraging the DM noise during the forward process. Extensive experiments demonstrate that, given the same privacy budget, dp-promise outperforms the state-of-the-art on the image quality of differentially private image synthesis across the standard metrics and datasets.
Hardware Security II: Architecture and Microarchitecture
DMAAUTH: A Lightweight Pointer Integrity-based Secure Architecture to Defeat DMA Attacks
Xingkai Wang, Wenbo Shen, Yujie Bu, Jinmeng Zhou, and Yajin Zhou, Zhejiang University
IOMMU has been introduced to thwart DMA attacks. However, the performance degradation prevents it from being enabled on most systems. Even worse, recent studies show that IOMMU is still vulnerable to sub-page and deferred invalidation attacks, posing threats to systems with IOMMU enabled.
This paper aims to provide a lightweight and secure solution to defend against DMA attacks. Based on our measurement and characterizing of DMA behavior, we propose DMAAUTH, a lightweight pointer integrity-based hardware-software co-design architecture. DMAAUTH utilizes a novel technique named Arithmetic-capable Pointer AuthentiCation (APAC), which protects the DMA pointer integrity while supporting pointer arithmetic. It also places a dedicated hardware named Authenticator on the bus to authenticate all the DMA transactions. Combining APAC, per-mapping metadata, and the Authenticator, DMAAUTH achieves strict byte-grained spatial protection and temporal protection.
We implement DMAAUTH on a real FPGA hardware board. Specifically, we first realize a PCIe-customizable SoC on real FPGA, based on which we implement hardware version DMAAUTH and conduct a thorough evaluation. We also implement DMAAUTH on both ARM and RISC-V emulators to demonstrate its cross-architecture capability. Our evaluation shows that DMAAUTH is faster and safer than IOMMU while being transparent to devices, drivers, and IOMMU.
Bending microarchitectural weird machines towards practicality
Ping-Lun Wang, Riccardo Paccagnella, Riad S. Wahby, and Fraser Brown, Carnegie Mellon University
A large body of work has demonstrated attacks that rely on the difference between CPUs' nominal instruction set architectures and their actual (microarchitectural) implementations. Most of these attacks, like Spectre, bypass the CPU's data-protection boundaries. A recent line of work considers a different primitive, called a microarchitectural weird machine (µWM), that can execute computations almost entirely using microarchitectural side effects. While µWMs would seem to be an extremely powerful tool, e.g., for obfuscating malware, thus far they have seen very limited application. This is because prior µWMs must be hand-crafted by experts, and even then have trouble reliably executing complex computations.
In this work, we show that µWMs are a practical, near-term threat. First, we design a new µWM architecture, Flexo, that improves performance by 1–2 orders of magnitude and reduces circuit size by 75–87%, dramatically improving the applicability of µWMs to complex computation. Second, we build the first compiler from a high-level language to µWMs, letting experts craft automatic optimizations and non-experts construct state-of-the-art obfuscated computations. Finally, we demonstrate the practicality of our approach by extending the popular UPX packer to encrypt its payload and use a µWM for decryption, frustrating malware analysis.
GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers
Boru Chen, University of Illinois Urbana-Champaign; Yingchen Wang, University of Texas at Austin; Pradyumna Shome, Georgia Institute of Technology; Christopher Fletcher, University of California, Berkeley; David Kohlbrenner, University of Washington; Riccardo Paccagnella, Carnegie Mellon University; Daniel Genkin, Georgia Institute of Technology
Microarchitectural side-channel attacks have shaken the foundations of modern processor design. The cornerstone defense against these attacks has been to ensure that security-critical programs do not use secret-dependent data as addresses. Put simply: do not pass secrets as addresses to, e.g., data memory instructions. Yet, the discovery of data memory-dependent prefetchers (DMPs)—which turn program data into addresses directly from within the memory system—calls into question whether this approach will continue to remain secure.
This paper shows that the security threat from DMPs is significantly worse than previously thought and demonstrates the first end-to-end attacks on security-critical software using the Apple m-series DMP. Undergirding our attacks is a new understanding of how DMPs behave which shows, among other things, that the Apple DMP will activate on behalf of any victim program and attempt to "leak" any cached data that resembles a pointer. From this understanding, we design a new type of chosen-input attack that uses the DMP to perform end-to-end key extraction on popular constant-time implementations of classical (OpenSSL Diffie-Hellman Key Exchange, Go RSA decryption) and post-quantum cryptography (CRYSTALS-Kyber and CRYSTALS-Dilithium).
CacheWarp: Software-based Fault Injection using Selective State Reset
Ruiyi Zhang, Lukas Gerlach, Daniel Weber, and Lorenz Hetterich, CISPA Helmholtz Center for Information Security; Youheng Lü, Independent; Andreas Kogler, Graz University of Technology; Michael Schwarz, CISPA Helmholtz Center for Information Security
AMD SEV is a trusted-execution environment (TEE), providing confidentiality and integrity for virtual machines (VMs). With AMD SEV, it is possible to securely run VMs on an untrusted hypervisor. While previous attacks demonstrated architectural shortcomings of earlier SEV versions, AMD claims that SEV-SNP prevents all attacks on the integrity.
In this paper, we introduce CacheWarp, a new software-based fault attack on AMD SEV-ES and SEV-SNP, exploiting the possibility to architecturally revert modified cache lines of guest VMs to their previous (stale) state. Unlike previous attacks on the integrity, CacheWarp is not mitigated on the newest SEV-SNP implementation, and it does not rely on specifics of the guest VM. CacheWarp only has to interrupt the VM at an attacker-chosen point to invalidate modified cache lines without them being written back to memory. Consequently, the VM continues with architecturally stale data. In 3 case studies, we demonstrate an attack on RSA in the Intel IPP crypto library, recovering the entire private key, logging into an OpenSSH server without authentication, and escalating privileges to root via the sudo binary. While we implement a software-based mitigation proof-of-concept, we argue that mitigations are difficult, as the root cause is in the hardware.
System Security II: OS Kernel
MOAT: Towards Safe BPF Kernel Extension
Hongyi Lu, Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, and Hong Kong University of Science and Technology; Shuai Wang, Hong Kong University of Science and Technology; Yechang Wu and Wanning He, Southern University of Science and Technology; Fengwei Zhang, Southern University of Science and Technology and Research Institute of Trustworthy Autonomous Systems
The Linux kernel extensively uses the Berkeley Packet Filter (BPF) to allow user-written BPF applications to execute in the kernel space. The BPF employs a verifier to check the security of user-supplied BPF code statically. Recent attacks show that BPF programs can evade security checks and gain unauthorized access to kernel memory, indicating that the verification process is not flawless. In this paper, we present MOAT, a system that isolates potentially malicious BPF programs using Intel Memory Protection Keys (MPK). Enforcing BPF program isolation with MPK is not straightforward; MOAT is designed to alleviate technical obstacles, such as limited hardware keys and the need to protect a wide variety of BPF helper functions. We implement MOAT on Linux (ver. 6.1.38), and our evaluation shows that MOAT delivers low-cost isolation of BPF programs under mainstream use cases, such as isolating a BPF packet filter with only 3% throughput loss.
SeaK: Rethinking the Design of a Secure Allocator for OS Kernel
Zicheng Wang, University of Colorado Boulder & Nanjing University; Yicheng Guang, Nanjing University; Yueqi Chen, University of Colorado Boulder; Zhenpeng Lin, Northwestern University; Michael Le, IBM Research; Dang K Le, Northwestern University; Dan Williams, Virginia Tech; Xinyu Xing, Northwestern University; Zhongshu Gu and Hani Jamjoom, IBM Research
In recent years, heap-based exploitation has become the most dominant attack against the Linux kernel. Securing the kernel heap is of vital importance for kernel protection. Though the Linux kernel allocator has some security designs in place to counter exploitation, our analytical experiments reveal that they can barely provide the expected results. This shortfall is rooted in the current strategy of designing secure kernel allocators which insists on protecting every object all the time. Such strategy inherently conflicts with the kernel nature. To this end, we advocate for rethinking the design of secure kernel allocator. In this work, we explore a new strategy which centers around the "atomic alleviation" concept, featuring flexibility and efficiency in design and deployment. Recent advancements in kernel design and research outcomes on exploitation techniques enable us to prototype this strategy in a tool named SeaK. We used real-world cases to thoroughly evaluate SeaK. The results validate that SeaK substantially strengthens heap security, outperforming all existing features, without incurring noticeable performance and memory cost. Besides, SeaK shows excellent scalability and stability in the production scenario.
Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation
Ziyi Guo, Dang K Le, and Zhenpeng Lin, Northwestern University; Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, and Adam Doupé, Arizona State University; Xinyu Xing, Northwestern University
Recently, a novel method known as Page Spray emerges, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the DirtyPage model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux Kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and making improvements to community.
SafeFetch: Practical Double-Fetch Protection with Kernel-Fetch Caching
Victor Duta, Mitchel Josephus Aloserij, and Cristiano Giuffrida, Vrije Universiteit Amsterdam
Distinguished Artifact Award Winner
Double-fetch bugs (or vulnerabilities) stem from in-kernel system call execution fetching the same user data twice without proper data (re)sanitization, enabling TOCTTOU attacks and posing a major threat to operating systems security. Existing double-fetch protection systems rely on the MMU to trap on writes to syscall-accessed user pages and provide the kernel with a consistent snapshot of user memory. While this strategy can hinder attacks, it also introduces nontrivial runtime performance overhead due to the cost of trapping/remapping and the coarse (page-granular) write interposition mechanism.
In this paper, we propose SafeFetch, a practical solution to protect the kernel from double-fetch bugs. The key intuition is that most system calls fetch small amounts of user data (if at all), hence caching this data in the kernel can be done at a small performance cost. To this end, SafeFetch creates per-syscall caches to persist fetched user data and replay them when they are fetched again within the same syscall. This strategy neutralizes all double-fetch bugs, while eliminating trapping/remapping overheads and relying on efficient byte-granular interposition. Our Linux prototype evaluation shows SafeFetch can provide comprehensive protection with low performance overheads (e.g., 4.4% geomean on LMBench), significantly outperforming state-of-the-art solutions.
Network Security II: Attacks
LanDscAPe: Exploring LDAP Weaknesses and Data Leaks at Internet Scale
Jonas Kaspereit and Gurur Öndarö, Münster University of Applied Sciences; Gustavo Luvizotto Cesar, University of Twente; Simon Ebbers, Münster University of Applied Sciences; Fabian Ising, Fraunhofer SIT and National Research Center for Applied Cybersecurity ATHENE; Christoph Saatjohann, Münster University of Applied Sciences, Fraunhofer SIT, and National Research Center for Applied Cybersecurity ATHENE; Mattijs Jonker, University of Twente; Ralph Holz, University of Twente and University of Münster; Sebastian Schinzel, Münster University of Applied Sciences, Fraunhofer SIT, and National Research Center for Applied Cybersecurity ATHENE
The Lightweight Directory Access Protocol (LDAP) is the standard technology to query information stored in directories. These directories can contain sensitive personal data such as usernames, email addresses, and passwords. LDAP is also used as a central, organization-wide storage of configuration data for other services. Hence, it is important to the security posture of many organizations, not least because it is also at the core of Microsoft's Active Directory, and other identity management and authentication services.
We report on a large-scale security analysis of deployed LDAP servers on the Internet. We developed LanDscAPe, a scanning tool that analyzes security-relevant misconfigurations of LDAP servers and the security of their TLS configurations. Our Internet-wide analysis revealed more than 10k servers that appear susceptible to a range of threats, including insecure configurations, deprecated software with known vulnerabilities, and insecure TLS setups. 4.9k LDAP servers host personal data, and 1.8k even leak passwords. We document, classify, and discuss these and briefly describe our notification campaign to address these concerning issues.
FakeBehalf: Imperceptible Email Spoofing Attacks against the Delegation Mechanism in Email Systems
Jinrui Ma, Lutong Chen, and Kaiping Xue, University of Science and Technology of China; Bo Luo, The University of Kansas; Xuanbo Huang, Mingrui Ai, and Huanjie Zhang, University of Science and Technology of China; David S.L. Wei, Fordham University; Yan Zhuang, University of Science and Technology of China
Email has become an essential service for global communication.In email protocols, a Delegation Mechanism allows emails to be sent by other entities on behalf of the email author. Specifically, the Sender field indicates the agent for email delivery (i.e., the Delegate). Despite well-implemented security extensions (e.g., DKIM, DMARC) that validate the authenticity of email authors, vulnerabilities in the Delegation Mechanism can still be exploited to bypass these security measures with well-crafted spoofing emails.
This paper systematically analyzes the security vulnerabilities within the Delegation Mechanism. Due to the absence of validation for the Sender field, adversaries can arbitrarily fabricate this field, thus spoofing the Delegate presented to email recipients. Our observations reveal that emails with a spoofed Sender field can pass authentications and reach the inboxes of all target providers. We also conduct a user study with 50 participants to assess the recipients' comprehension of spoofed Delegates, finding that 50% are susceptible to deceiving Delegate information. Furthermore, we propose novel email spoofing attacks where adversaries can impersonate arbitrary entities as email authors to craft highly deceptive emails while passing security extensions. We assess their impact across 16 service providers and 20 clients, observing that half of the providers and all clients are vulnerable to the discovered attacks. To mitigate the threats within the Delegation Mechanism, we propose a validation scheme to verify the authenticity of the Sender field, along with design suggestions to enhance the security of email clients.
Rethinking the Security Threats of Stale DNS Glue Records
Yunyi Zhang, National University of Defense Technology and Tsinghua University; Baojun Liu, Tsinghua University; Haixin Duan, Tsinghua University, Zhongguancun Laboratory, and Quan Cheng Laboratory; Min Zhang, National University of Defense Technology; Xiang Li, Tsinghua University; Fan Shi and Chengxi Xu, National University of Defense Technology; Eihal Alowaisheq, King Saud University
The Domain Name System (DNS) fundamentally relies on glue records to provide authoritative nameserver IP addresses, enabling essential in-domain delegation. While previous studies have identified potential security risks associated with glue records, the exploitation of these records, especially in the context of out-domain delegation, remains unclear due to their inherently low trust level and the diverse ways in which resolvers handle them. This paper undertakes the first systematic exploration of the potential threats posed by DNS glue records, uncovering significant real-world security risks. We empirically identify that 23.18% of glue records across 1,096 TLDs are outdated yet still served in practice. More concerningly, through reverse engineering 9 mainstream DNS implementations (e.g., BIND 9 and Microsoft DNS), we reveal manipulable behaviors associated with glue records. The convergence of these systemic issues allows us to propose the novel threat model that could enable large-scale domain hijacking and denial-of-service attacks. Furthermore, our analysis determines over 193,558 exploitable records exist, placing more than 6 million domains at risk. Additional measurement studies on global open resolvers demonstrate that 90% of them use unvalidated and outdated glue records, including OpenDNS and AliDNS. Our responsible disclosure has already prompted mitigation efforts by affected stakeholders. Microsoft DNS, PowerDNS, OpenDNS, and Alibaba Cloud DNS have acknowledged our reported vulnerability. In summary, this work highlights that glue records constitute a forgotten foundation of DNS architecture requiring renewed security prioritization
EVOKE: Efficient Revocation of Verifiable Credentials in IoT Networks
Carlo Mazzocca, University of Bologna; Abbas Acar and Selcuk Uluagac, Cyber-Physical Systems Security Lab, Florida International University; Rebecca Montanari, University of Bologna
The lack of trust is one of the major factors that hinder collaboration among Internet of Things (IoT) devices and harness the usage of the vast amount of data generated. Traditional methods rely on Public Key Infrastructure (PKI), managed by centralized certification authorities (CAs), which suffer from scalability issues, single points of failure, and limited interoperability. To address these concerns, Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) have been proposed by the World Wide Web Consortium (W3C) and the European Union as viable solutions for promoting decentralization and "electronic IDentification, Authentication, and trust Services" (eIDAS). Nevertheless, at the state-of-the-art, there are no efficient revocation mechanisms for VCs specifically tailored for IoT devices, which are characterized by limited connectivity, storage, and computational power.
This paper presents EVOKE, an efficient revocation mechanism of VCs in IoT networks. EVOKE leverages an ECC-based accumulator to manage VCs with minimal computing and storage overhead while offering additional features like mass and offline revocation. We designed, implemented, and evaluated a prototype of EVOKE across various deployment scenarios. Our experiments on commodity IoT devices demonstrate that each device only requires minimal storage (i.e., approximately 1.5 KB) to maintain verification information, and most notably half the storage required by the most efficient PKI certificates. Moreover, our experiments on hybrid networks, representing typical IoT protocols (e.g., Zigbee), also show minimal latency in the order of milliseconds. Finally, our large-scale analysis demonstrates that even when 50% of devices missed updates, approximately 96% of devices in the entire network were updated within the first hour, proving the scalability of EVOKE in offline updates.
ML II: Fault Injection and Robustness
DNN-GP: Diagnosing and Mitigating Model's Faults Using Latent Concepts
Shuo Wang, Shanghai Jiao Tong University; Hongsheng Hu, CSIRO's Data61; Jiamin Chang, University of New South Wales and CSIRO's Data61; Benjamin Zi Hao Zhao, Macquarie University; Qi Alfred Chen, University of California, Irvine; Minhui Xue, CSIRO's Data61
Despite the impressive capabilities of Deep Neural Networks (DNN), these systems remain fault-prone due to unresolved issues of robustness to perturbations and concept drift. Existing approaches to interpreting faults often provide only low-level abstractions, while struggling to extract meaningful concepts to understand the root cause. Furthermore, these prior methods lack integration and generalization across multiple types of faults. To address these limitations, we present a fault diagnosis tool (akin to a General Practitioner) DNN-GP, an integrated interpreter designed to diagnose various types of model faults through the interpretation of latent concepts. DNN-GP incorporates probing samples derived from adversarial attacks, semantic attacks, and samples exhibiting drifting issues to provide a comprehensible interpretation of a model's erroneous decisions. Armed with an awareness of the faults, DNN-GP derives countermeasures from the concept space to bolster the model's resilience. DNN-GP is trained once on a dataset and can be transferred to provide versatile, unsupervised diagnoses for other models, and is sufficiently general to effectively mitigate unseen attacks. DNN-GP is evaluated on three real-world datasets covering both attack and drift scenarios to demonstrate state-to the-art detection accuracy (near 100%) with low false positive rates (<5%).
Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection
Shaofeng Li, Peng Cheng Laboratory; Xinyu Wang, Shanghai Jiao Tong University; Minhui Xue, CSIRO's Data61; Haojin Zhu, Shanghai Jiao Tong University; Zhi Zhang, University of Western Australia; Yansong Gao, CSIRO's Data61; Wen Wu, Peng Cheng Laboratory; Xuemin (Sherman) Shen, University of Waterloo
Distinguished Paper Award Winner
We propose, FrameFlip, a novel attack for depleting DNN model inference with runtime code fault injections. Notably, Frameflip operates independently of the DNN models deployed and succeeds with only a single bit-flip injection. This fundamentally distinguishes it from the existing DNN inference depletion paradigm that requires injecting tens of deterministic faults concurrently. Since our attack performs at the universal code or library level, the mandatory code snippet can be perversely called by all mainstream machine learning frameworks, such as PyTorch and TensorFlow, dependent on the library code. Using DRAM Rowhammer to facilitate end-to-end fault injection, we implement Frameflip across diverse model architectures (LeNet, VGG-16, ResNet-34 and ResNet-50) with different datasets (FMNIST, CIFAR-10, GTSRB, and ImageNet). With a single bit fault injection, Frameflip achieves high depletion efficacy that consistently renders the model inference utility as no better than guessing. We also experimentally verify that identified vulnerable bits are almost equally effective at depleting different deployed models. In contrast, transferability is unattainable for all existing state-of-the-art model inference depletion attacks. Frameflip is shown to be evasive against all known defenses, generally due to the nature of current defenses operating at the model level (which is model-dependent) in lieu of the underlying code level.
Tossing in the Dark: Practical Bit-Flipping on Gray-box Deep Neural Networks for Runtime Trojan Injection
Zihao Wang, Di Tang, and XiaoFeng Wang, Indiana University Bloomington; Wei He, Zhaoyang Geng, and Wenhao Wang, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences
Although Trojan attacks on deep neural networks (DNNs) have been extensively studied, the threat of run-time Trojan injection has only recently been brought to attention. Unlike data poisoning attacks that target the training stage of a DNN model, a run-time attack executes an exploit such as Rowhammer on memory to flip the bits of the target model and thereby implant a Trojan. This threat is stealthier but more challenging, as it requires flipping a set of bits in the target model to introduce an effective Trojan without noticeably downgrading the model's accuracy. This has been achieved only under the less realistic assumption that the target model is fully shared with the adversary through memory, thus enabling them to flip bits across all model layers, including the last few layers.
For the first time, we have investigated run-time Trojan Injection under a more realistic gray-box scenario. In this scenario, a model is perceived in an encoder-decoder manner: the encoder is public and shared through memory, while the decoder is private and so considered to be black-box and inaccessible to unauthorized parties. To address the unique challenge posed by the black-box decoder to Trojan injection in this scenario, we developed a suite of innovative techniques. Using these techniques, we constructed our gray-box attack, Groan, which stands out as both effective and stealthy. Our experiments show that Groan is capable of injecting a highly effective Trojan into the target model, while also largely preserving its performance, even in the presence of state-of-theart memory protection.
Forget and Rewire: Enhancing the Resilience of Transformer-based Models against Bit-Flip Attacks
Najmeh Nazari, Hosein Mohammadi Makrani, and Chongzhou Fang, University of California, Davis; Hossein Sayadi, California State University, Long Beach; Setareh Rafatirad, University of California, Davis; Khaled N. Khasawneh, George Mason University; Houman Homayoun, University of California, Davis
Bit-Flip Attacks (BFAs) involve adversaries manipulating a model's parameter bits to undermine its accuracy significantly. They typically target the most vulnerable parameters, causing maximal damage with minimal bit-flips. While BFAs' impact on Deep Neural Networks (DNNs) is well-studied, their effects on Large Language Models (LLMs) and Vision Transformers (ViTs) have not received the same attention. Inspired by "brain rewiring," we explore enhancing Transformers' resilience against such attacks. This potential lies in the unique architecture of transformer-based models, particularly their Linear layers. Our novel approach, called Forget and Rewire (FaR), strategically applies rewiring to Linear layers to obfuscate neuron connections. By redistributing tasks from critical to non-essential neurons, we reduce the model's sensitivity to specific parameters while preserving its core functionality. This strategy thwarts adversaries' attempts to identify and target crucial parameters using gradient-based algorithms. Our approach conceals pivotal parameters and enhances robustness against random attacks. Comprehensive evaluations across widely used datasets and Transformer frameworks show that the FaR mechanism significantly reduces BFA success rates by 1.4 to 4.2 times with minimal accuracy loss (less than 2%).
Security Analysis II: Program Analysis
What IF Is Not Enough? Fixing Null Pointer Dereference With Contextual Check
Yunlong Xing, Shu Wang, Shiyu Sun, Xu He, and Kun Sun, George Mason University; Qi Li, Tsinghua University
Null pointer dereference (NPD) errors pose the risk of unexpected behavior and system instability, potentially leading to abrupt program termination due to exceptions or segmentation faults. When generating NPD fixes, all existing solutions are confined to the function level fixes and ignore the valuable intraprocedural and interprocedural contextual information, potentially resulting in incorrect patches. In this paper, we introduce CONCH, a novel approach that addresses the challenges of generating correct fixes for NPD issues by incorporating contextual checks. Our method first constructs an NPD context graph to maintain the semantics related to patch generation. Then we summarize distinct fixing position selection policies based on the distribution of the error positions, ensuring the resolution of bugs without introducing duplicate code. Next, the intraprocedural state retrogression builds the if condition, retrogresses the local resources, and constructs return statements as an initial patch. Finally, we conduct interprocedural state propagation to assess the correctness of the initial patch in the entire call chain. We evaluate the effectiveness of CONCH over two real-world datasets. The experimental results demonstrate that CONCH outperforms the SOTA methods and yields over 85% accurate patches.
Unleashing the Power of Type-Based Call Graph Construction by Using Regional Pointer Information
Yuandao Cai, Yibo Jin, and Charles Zhang, The Hong Kong University of Science and Technology
When dealing with millions of lines of C code, we still cannot have the cake and eat it: type analysis for call graph construction is scalable yet highly imprecise. We address this precision issue through a practical observation: many function pointers are simple; they are not referenced by other pointers, nor do they derive their values by dereferencing other pointers. As a result, simple function pointers can be resolved with precise and affordable pointer aliasing information. In this work, we advocate Kelp with two concerted stages. First, instead of directly using type analysis, Kelp performs regional pointer analysis along def-use chains to early and precisely resolve the indirect calls through simple function pointers. Second, Kelp then leverages type analysis to handle the remaining indirect calls. The first stage is efficient as Kelp selectively reasons about simple function pointers, thereby avoiding prohibitive performance penalties. The second stage is precise as the candidate address-taken functions for checking type compatibility are largely reduced thanks to the first stage. Our experiments on twenty large-scale and popular software programs show that, on average, Kelp can reduce spurious callees by 54.2% with only a negligible additional time cost of 8.5% (equivalent to 6.3 seconds) compared to the previous approach. More excitingly, when evaluating the call graphs through the lens of three various downstream clients (i.e., thread-sharing analysis, value-flow bug detection, and directed grey-box fuzzing), Kelp can significantly enhance their effectiveness for better vulnerability understanding, hunting, and reproduction.
Practical Data-Only Attack Generation
Brian Johannesmeyer, Asia Slowinska, Herbert Bos, and Cristiano Giuffrida, Vrije Universiteit Amsterdam
As control-flow hijacking is getting harder due to increasingly sophisticated CFI solutions, recent work has instead focused on automatically building data-only attacks, typically using symbolic execution, simplifying assumptions that do not always match the attacker's goals, manual gadget chaining, or all of the above. As a result, the practical adoption of such methods is minimal. In this work, we abstract away unnecessary complexities and instead use a lightweight approach that targets the vulnerabilities that are both the most tractable for analysis, and the most promising for an attacker.
In particular, we present Einstein, a data-only attack exploitation pipeline that uses dynamic taint analysis policies to: (i) scan for chains of vulnerable system calls (e.g., to execute code or corrupt the filesystem), and (ii) generate exploits for those that take unmodified attacker data as input. Einstein discovers thousands of vulnerable syscalls in common server applications—well beyond the reach of existing approaches. Moreover, using nginx as a case study, we use Einstein to generate 944 exploits, and we discuss two such exploits that bypass state-of-the-art mitigations.
Don't Waste My Efforts: Pruning Redundant Sanitizer Checks by Developer-Implemented Type Checks
Yizhuo Zhai, Zhiyun Qian, Chengyu Song, Manu Sridharan, and Trent Jaeger, University of California, Riverside; Paul Yu, U.S. Army Research Laboratory; Srikanth V. Krishnamurthy, University of California, Riverside
Type confusion occurs when C or C++ code accesses an object after casting it to an incompatible type. The security impacts of type confusion vulnerabilities are significant, potentially leading to system crashes or even arbitrary code execution. To mitigate these security threats, both static and dynamic approaches have been developed to detect type confusion bugs. However, static approaches can suffer from excessive false positives, while existing dynamic approaches track type information for each object to enable safety checking at each cast, introducing a high runtime overhead.
In this paper, we present a novel tool T-PRUNIFY to reduce the overhead of dynamic type confusion sanitizers. We observe that in large complex C++ projects, to prevent type confusion bugs, developers often add their own encoding of runtime type information (RTTI) into classes, to enable efficient runtime type checks before casts. T-PRUNIFY works by first identifying these custom RTTI in classes, automatically determining the relationship between field and method return values and the concrete types of corresponding objects. Based on these custom RTTI, T-PRUNIFY can identify cases where a cast is protected by developer-written type checks that guarantee the safety of the cast. Consequently, it can safely remove sanitizer instrumentation for such casts, reducing performance overhead. We evaluate T-PRUNIFY based on HexType, a state-of-the-art type confusion sanitizer that supports extensive C++ projects such as Google Chrome. Our findings demonstrate that our method significantly lowers HexType's average overhead by 25% to 75% in large C++ programs, marking a substantial enhancement in performance.
Zero-Knowledge Proof I
Two Shuffles Make a RAM: Improved Constant Overhead Zero Knowledge RAM
Yibin Yang, Georgia Institute of Technology; David Heath, University of Illinois Urbana-Champaign
We optimize Zero Knowledge (ZK) proofs of statements expressed as RAM programs over arithmetic values. Our arithmetic-circuit-based read/write memory uses only 4 input gates and 6 multiplication gates per memory access. This is an almost 3× total gate improvement over prior state of the art (Delpech de Saint Guilhem et al., SCN'22).
We implemented our memory in the context of ZK proofs based on vector oblivious linear evaluation (VOLE), and we further optimized based on techniques available in the VOLE setting. Our experiments show that (1) our total runtime improves over that of the prior best VOLE-ZK RAM (Franzese et al., CCS'21) by 2-20× and (2) on a typical hardware setup, we can achieve ≈ 600K RAM accesses per second.
We also develop improved read-only memory and set ZK data structures. These are used internally in our read/write memory and improve over prior work.
Notus: Dynamic Proofs of Liabilities from Zero-knowledge RSA Accumulators
Jiajun Xin, Arman Haghighi, Xiangan Tian, and Dimitrios Papadopoulos, The Hong Kong University of Science and Technology
Proofs of Liabilities (PoL) allow an untrusted prover to commit to its liabilities towards a set of users and then prove independent users' amounts or the total sum of liabilities, upon queries by users or third-party auditors. This application setting is highly dynamic. User liabilities may increase/decrease arbitrarily and the prover needs to update proofs in epoch increments (e.g., once a day for a crypto-asset exchange platform). However, prior works mostly focus on the static case and trivial extensions to the dynamic setting open the system to windows of opportunity for the prover to under-report its liabilities and rectify its books in time for the next check, unless all users check their liabilities at all epochs. In this work, we develop Notus, the first dynamic PoL system for general liability updates that avoids this issue. Moreover, it achieves O(1) query proof size, verification time, and auditor overhead-per-epoch. The core building blocks underlying Notus are a novel zero-knowledge (and SNARK-friendly) RSA accumulator and a corresponding zero-knowledge MultiSwap protocol, which may be of independent interest. We then propose optimizations to reduce the prover's update overhead and make Notus scale to large numbers of users (10^6 in our experiments). Our results are very encouraging, e.g., it takes less than 2ms to verify a user's liability and the proof size is 256 Bytes. On the prover side, deploying Notus on a cloud-based testbed with 256 cores and exploiting parallelism, it takes about 3 minutes to perform the complete epoch update, after which all proofs have already been computed.
Practical Security Analysis of Zero-Knowledge Proof Circuits
Hongbo Wen, University of California, Santa Barbara; Jon Stephens, The University of Texas at Austin and Veridise; Yanju Chen, University of California, Santa Barbara; Kostas Ferles, Veridise; Shankara Pailoor, The University of Texas at Austin and Veridise; Kyle Charbonnet, Ethereum Foundation; Isil Dillig, The University of Texas at Austin and Veridise; Yu Feng, University of California, Santa Barbara, and Veridise
As privacy-sensitive applications based on zero-knowledge proofs (ZKPs) gain increasing traction, there is a pressing need to detect vulnerabilities in ZKP circuits. This paper studies common vulnerabilities in Circom (the most popular domain-specific language for ZKP circuits) and describes a static analysis framework for detecting these vulnerabilities. Our technique operates over an abstraction called the circuit dependence graph (CDG) that captures key properties of the circuit and allows expressing semantic vulnerability patterns as queries over the CDG abstraction. We have implemented 9 different detectors using this framework and performed an experimental evaluation on over 258 circuits from popular Circom projects on GitHub. According to our evaluation, these detectors can identify vulnerabilities, including previously unknown ones, with high precision and recall.
Formalizing Soundness Proofs of Linear PCP SNARKs
Bolton Bailey and Andrew Miller, University of Illinois at Urbana-Champaign
Succinct Non-interactive Arguments of Knowledge (SNARKs) have seen interest and development from the cryptographic community over recent years, and there are now constructions with very small proof size designed to work well in practice. A SNARK protocol can only be widely accepted as secure, however, if a rigorous proof of its security properties has been vetted by the community. Even then, it is sometimes the case that these security proofs are flawed, and it is then necessary for further research to identify these flaws and correct the record.
To increase the rigor of these proofs, we create a formal framework in the Lean theorem prover for representing a widespread subclass of SNARKs based on linear PCPs. We then describe a decision procedure for checking the soundness of SNARKs in this class. We program this procedure and use it to formalize the soundness proof of several different SNARK constructions, including the well-known Groth '16.
4:15 pm–4:30 pm
Short Break
Grand Ballroom Foyer
4:30 pm–5:30 pm
Measurement I: Fraud and Malware and Spam
Guardians of the Galaxy: Content Moderation in the InterPlanetary File System
Saidu Sokoto, City, University of London; Leonhard Balduf, TU Darmstadt; Dennis Trautwein, University of Göttingen; Yiluo Wei and Gareth Tyson, Hong Kong Univ. of Science & Technology (GZ); Ignacio Castro, Queen Mary, University of London; Onur Ascigil, Lancaster University; George Pavlou, University College London; Maciej Korczyński, Univ. Grenoble Alpes; Björn Scheuermann, TU Darmstadt; Michał Król, City, University of London
The InterPlanetary File System (IPFS) is one of the largest platforms in the growing "Decentralized Web". The increasing popularity of IPFS has attracted large volumes of users and content. Unfortunately, some of this content could be considered "problematic". Content moderation is always hard. With a completely decentralized infrastructure and administration, content moderation in IPFS is even more difficult. In this paper, we examine this challenge. We identify, characterize, and measure the presence of problematic content in IPFS (e.g. subject to takedown notices). Our analysis covers 368,762 files. We analyze the complete content moderation process including how these files are flagged, who hosts and retrieves them. We also measure the efficacy of the process. We analyze content submitted to denylist, showing that notable volumes of problematic content are served, and the lack of a centralized approach facilitates its spread. While we identify fast reactions to takedown requests, we also test the resilience of multiple gateways and show that existing means to filter problematic content can be circumvented. We end by proposing improvements to content moderation that result in 227% increase in the detection of phishing content and reduce the average time to filter such content by 43%.
True Attacks, Attack Attempts, or Benign Triggers? An Empirical Measurement of Network Alerts in a Security Operations Center
Limin Yang, Zhi Chen, Chenkai Wang, Zhenning Zhang, and Sushruth Booma, University of Illinois at Urbana-Champaign; Phuong Cao, NCSA; Constantin Adam, IBM Research; Alexander Withers, NCSA; Zbigniew Kalbarczyk, Ravishankar K. Iyer, and Gang Wang, University of Illinois at Urbana-Champaign
Security Operations Centers (SOCs) face the key challenge of handling excessive security alerts. While existing works have studied this problem qualitatively via user studies, there is still a lack of quantitative understanding of the impact of excessive alerts and their effectiveness and limitations in capturing true attacks.
In this paper, we fill the gap by working with a real-world SOC and collecting and analyzing their network alert logs over 4 years (115 million alerts, from 2018 to 2022). To further understand how alerts are associated with true attacks, we also obtain the ground truth of 227 successful attacks in the past 20 years (11 during the overlapping period). Through analysis, we observe that SOC analysts are facing excessive alerts (24K–134K per day), but only a small percentage of the alerts (0.01%) are associated with true attacks. While the majority of true attacks can be detected within the same day, the post-attack investigation takes much longer time (53 days on average). Furthermore, we observe a significant portion of the alerts are related to "attack attempts'' (attacks that did not lead to true compromises, 27%), and "benign triggers'' (correctly matched security events but had business-justified explanations, 49%). Empirically, we show there are opportunities to use rare/abnormal alert patterns to help isolate signals related to true attacks. Given that enterprise SOCs rarely disclose internal data, this paper helps contextualize SOCs' pain points and refine existing problem definitions.
DARKFLEECE: Probing the Dark Side of Android Subscription Apps
Chang Yue, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Chen Zhong, University of Tampa, USA; Kai Chen and Zhiyu Zhang, Institute of Information Engineering, Chinese Academy of Sciences, China; School of Cyber Security, University of Chinese Academy of Sciences, China; Yeonjoon Lee, Hanyang University, Ansan, Republic of Korea
Fleeceware, a novel category of malicious subscription apps, is increasingly tricking users into expensive subscriptions, leading to substantial financial consequences. These apps' ambiguous nature, closely resembling legitimate subscription apps, complicates their detection in app markets. To address this, our study aims to devise an automated method, named DARKFLEECE, to identify fleeceware through their prevalent use of dark patterns. By recruiting domain experts, we curated the first-ever fleeceware feature library, based on dark patterns extracted from user interfaces (UI). A unique extraction method, which integrates UI elements, layout, and multifaceted extraction rules, has been developed. DARKFLEECE boasts a detection accuracy of 93.43% on our dataset and utilizes Explainable Artificial Intelligence (XAI) to present user-friendly alerts about potential fleeceware risks. When deployed to assess Google Play's app landscape, DARKFLEECE examined 13,597 apps and identified an alarming 75.21% of 589 subscription apps that displayed different levels of fleeceware, totaling around 5 billion downloads. Our results are consistent with user reviews on Google Play. Our detailed exploration into the implications of our results for ethical app developers, app users, and app market regulators provides crucial insights for different stakeholders. This underscores the need for proactive measures against the rise of fleeceware.
Into the Dark: Unveiling Internal Site Search Abused for Black Hat SEO
Yunyi Zhang, National University of Defense Technology; Tsinghua University; Mingxuan Liu, Zhongguancun Laboratory; Baojun Liu, Tsinghua University; Zhongguancun Laboratory; Yiming Zhang, Tsinghua University; Haixin Duan, Tsinghua University; Zhongguancun Laboratory; Min Zhang, National University of Defense Technology; Hui Jiang, Tsinghua University; Baidu Inc; Yanzhe Li, Baidu Inc; Fan Shi, National University of Defense Technology
Internal site Search Abuse Promotion (ISAP) is a prevalent Black Hat Search Engine Optimization (SEO) technique, which exploits the reputation of abused internal search websites with minimal effort. However, ISAP is underappreciated and not systematically understood by the security community. To shed light on ISAP risks, we established a collaboration with Baidu, a leading search engine in China. The key challenge of efficiently detecting ISAP risks stems from the sheer volume of daily search traffic, which involves billions of URLs. To address these efficiency bottlenecks, we introduced a first-of-its-kind lightweight detector utilizing a funnel-like approach, tailored to the unique characteristics of ISAP. This approach allows us to single out 3,222,864 ISAP URLs from 10,209 abused websites from Baidu's traffic data. We found that the businesses most likely to fall prey to this practice are porn and gambling, with two emerging areas: self-promotion for SEO and promotion for anonymous servers. By analyzing Baidu's search logs, we discovered that these malicious websites had reached millions of users in just 4 days. We further evaluated this threat on Google and Bing, thereby confirming the widespread presence of ISAP across various search engines. Moreover, we responsibly disclosed the issue to affected search engines and websites, and actively helped them fix it. In summary, our findings highlight the widespread impact and prevalence of ISAP, emphasizing the urgent need for the security community to prioritize and address such risks.
Side Channel II: RowHammer
ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation
Ataberk Olgun, Yahya Can Tugrul, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Steve Rhyner, Abdullah Giray Yaglikci, Geraldo F. Oliveira, and Onur Mutlu, ETH Zurich
We introduce ABACuS, a new low-cost hardware-counterbased RowHammer mitigation technique that performance-, energy-, and area-efficiently scales with worsening RowHammer vulnerability. We observe that both benign workloads and RowHammer attacks tend to access DRAM rows with the same row address in multiple DRAM banks at around the same time. Based on this observation, ABACuS's key idea is to use a single shared row activation counter to track activations to the rows with the same row address in all DRAM banks. Unlike state-of-the-art RowHammer mitigation mechanisms that implement a separate row activation counter for each DRAM bank, ABACuS implements fewer counters (e.g., only one) to track an equal number of aggressor rows.
Our comprehensive evaluations show that ABACuS securely prevents RowHammer bitflips at low performance/energy overhead and low area cost. We compare ABACuS to four state-of-the-art mitigation mechanisms. At a nearfuture RowHammer threshold of 1000, ABACuS incurs only 0.58% (0.77%) performance and 1.66% (2.12%) DRAM energy overheads, averaged across 62 single-core (8-core) workloads, requiring only 9.47 KiB of storage per DRAM rank. At the RowHammer threshold of 1000, the best prior lowarea-cost mitigation mechanism incurs 1.80% higher average performance overhead than ABACuS, while ABACuS requires 2.50× smaller chip area to implement. At a future RowHammer threshold of 125, ABACuS performs very similarly to (within 0.38% of the performance of) the best prior performance- and energy-efficient RowHammer mitigation mechanism while requiring 22.72× smaller chip area. We show that ABACuS's performance scales well with the number of DRAM banks. At the RowHammer threshold of 125, ABACuS incurs 1.58%, 1.50%, and 2.60% performance overheads for 16-, 32-, and 64-bank systems across all single-core workloads, respectively. ABACuS is freely and openly available at https://github.com/CMU-SAFARI/ABACuS.
SledgeHammer: Amplifying Rowhammer via Bank-level Parallelism
Ingab Kang, University of Michigan; Walter Wang and Jason Kim, Georgia Tech; Stephan van Schaik and Youssef Tobah, University of Michigan; Daniel Genkin, Georgia Tech; Andrew Kwong, UNC Chapel Hill; Yuval Yarom, Ruhr University Bochum
Rowhammer is a hardware vulnerability in DDR memory by which attackers can perform specific access patterns in their own memory to flip bits in adjacent, uncontrolled rows with- out accessing them. Since its discovery by Kim et. al. (ISCA 2014), Rowhammer attacks have emerged as an alarming threat to numerous security mechanisms.
In this paper, we show that Rowhammer attacks can in fact be more effective when combined with bank-level parallelism, a technique in which the attacker hammers multiple memory banks simultaneously. This allows us to increase the amount of Rowhammer-induced flips 7-fold and significantly speed up prior Rowhammer attacks relying on native code execution.
Furthermore, we tackle the task of mounting browser-based Rowhammer attacks. Here, we develop a self-evicting ver- sion of multi-bank hammering, allowing us to replace clflush instructions with cache evictions. We then develop a novel method for detecting contiguous physical addresses using memory access timings, thereby obviating the need for trans- parent huge pages. Finally, by combining both techniques, we are the first, to our knowledge, to obtain Rowhammer bit flips on DDR4 memory from the Chrome and Firefox browsers running on default Linux configurations, without enabling transparent huge pages.
ZenHammer: Rowhammer Attacks on AMD Zen-based Platforms
Patrick Jattke, Max Wipfli, Flavien Solt, Michele Marazzi, Matej Bölcskei, and Kaveh Razavi, ETH Zurich
AMD has gained a significant market share in recent years with the introduction of the Zen microarchitecture. While there are many recent Rowhammer attacks launched from Intel CPUs, they are completely absent on these newer AMD CPUs due to three non-trivial challenges: 1) reverse engineering the unknown DRAM addressing functions, 2) synchronizing with refresh commands for evading in-DRAM mitigations, and 3) achieving a sufficient row activation throughput. We address these challenges in the design of ZenHammer, the first Rowhammer attack on recent AMD CPUs. ZenHammer reverse engineers DRAM addressing functions despite their non-linear nature, uses specially crafted access patterns for proper synchronization, and carefully schedules flush and fence instructions within a pattern to increase the activation throughput while preserving the access order necessary to bypass in-DRAM mitigations. Our evaluation with ten DDR4 devices shows that ZenHammer finds bit flips on seven and six devices on AMD Zen 2 and Zen 3, respectively, enabling Rowhammer exploitation on current AMD platforms. Furthermore, ZenHammer triggers Rowhammer bit flips on a DDR5 device for the first time.
Go Go Gadget Hammer: Flipping Nested Pointers for Arbitrary Data Leakage
Youssef Tobah, University of Michigan; Andrew Kwong, UNC Chapel Hill; Ingab Kang, University of Michigan; Daniel Genkin, Georgia Tech; Kang G. Shin, University of Michigan
Rowhammer is an increasingly threatening vulnerability that grants an attacker the ability to flip bits in memory without directly accessing them. Despite efforts to mitigate Rowhammer via software and defenses built directly into DRAM modules, more recent generations of DRAM are actually more susceptible to malicious bit-flips than their predecessors. This phenomenon has spawned numerous exploits, showing how Rowhammer acts as the basis for various vulnerabilities that target sensitive structures, such as Page Table Entries (PTEs) or opcodes, to grant control over a victim machine.
However, in this paper, we consider Rowhammer as a more general vulnerability, presenting a novel exploit vector for Rowhammer that targets particular code patterns. We show that if victim code is designed to return benign data to an unprivileged user, and uses nested pointer dereferences, Rowhammer can flip these pointers to gain arbitrary read access in the victim's address space. Furthermore, we identify gadgets present in the Linux kernel, and demonstrate an end-to-end attack that precisely flips a targeted pointer. To do so we developed a number of improved Rowhammer primitives, including kernel memory massaging, Rowhammer synchronization, and testing for kernel flips, which may be of broader interest to the Rowhammer community. Compared to prior works' leakage rate of .3 bits/s, we show that such gadgets can be used to read out kernel data at a rate of 82.6 bits/s.
By targeting code gadgets, this work expands the scope and attack surface exposed by Rowhammer. It is no longer sufficient for software defenses to selectively pad previously exploited memory structures in flip-safe memory, as any victim code that follows the pattern in question must be protected.
Forensics
00SEVen – Re-enabling Virtual Machine Forensics: Introspecting Confidential VMs Using Privileged in-VM Agents
Fabian Schwarz and Christian Rossow, CISPA Helmholtz Center for Information Security
The security guarantees of confidential VMs (e.g., AMD's SEV) are a double-edged sword: Their protection against undesired VM inspection by malicious or compromised cloud operators inherently renders existing VM introspection (VMI) services infeasible. However, considering that these VMs particularly target sensitive workloads (e.g., finance), their customers demand secure forensic capabilities.
In this paper, we enable VM owners to remotely inspect their confidential VMs without weakening the VMs' protection against the cloud platform. In contrast to naïve in-VM memory aggregation tools, our approach (dubbed 00SEVen) is isolated from strong in-VM attackers and thus resistant against kernel-level attacks, and it provides VMI features beyond memory access. 00SEVen leverages the recent intra-VM privilege domains of AMD SEV-SNP—called VMPLs—and extends the QEMU/KVM hypervisor to provide VMPL-aware network I/O and VMI-assisting hypercalls. That way, we can serve VM owners with a protected in-VM forensic agent. The agent provides VM owners with attested remote memory and VM register introspection, secure pausing of the analysis target, and page access traps and function traps, all isolated from the cloud platform (incl. hypervisor) and in-VM rootkits.
WEBRR: A Forensic System for Replaying and Investigating Web-Based Attacks in The Modern Web
Joey Allen, Palo Alto Networks; Zheng Yang, Feng Xiao, and Matthew Landen, Georgia Institute of Technology; Roberto Perdisci, Georgia Institute of Technology and University of Georgia; Wenke Lee, Georgia Institute of Technology
After a sophisticated attack or data breach occurs at an organization, a postmortem forensic analysis must be conducted to reconstruct and understand the root causes of the attack. Unfortunately, the majority of proposed forensic analysis systems rely on system-level auditing, making it difficult to reconstruct and investigate web-based attacks, due to the semantic-gap between system- and web-level semantics. This limited visibility into web-based attacks has recently become increasingly concerning because web-based attacks are commonly employed by nation-state adversaries to penetrate and achieve the initial compromise of an enterprise network. To enable forensic analysts to replay and investigate web-based attacks, we propose WebRR, a novel OS- and device- independent record and replay (RR) forensic auditing system for Chromium-based web browsers. While there exist prior works that focus on web-based auditing, current systems are either record-only or suffer from critical limitations that prevent them from deterministically replaying attacks. WebRR addresses these limitation by introducing a novel design that allows it to record and deterministically replay modern web applications by leveraging JavaScript Execution Unit Partitioning.
Our evaluation demonstrates that WebRR is capable of replaying web-based attacks that fail to replay on prior state-of-the-art systems. Furthermore, we demonstrate that WebRR can replay highly-dynamic modern websites in a deterministic fashion with an average runtime overhead of only 3.44%
AI Psychiatry: Forensic Investigation of Deep Learning Networks in Memory Images
David Oygenblik, Georgia Institute of Technology; Carter Yagemann, Ohio State University; Joseph Zhang, University of Pennsylvania; Arianna Mastali, Georgia Institute of Technology; Jeman Park, Kyung Hee University; Brendan Saltaformaggio, Georgia Institute of Technology
Online learning is widely used in production to refine model parameters after initial deployment. This opens several vectors for covertly launching attacks against deployed models. To detect these attacks, prior work developed black-box and white-box testing methods. However, this has left prohibitive open challenge: how the investigator is supposed to recover the model (uniquely refined on an in-the-field device) for testing in the first place. We propose a novel memory forensic technique, named AiP, which automatically recovers the unique deployment model and rehosts it in a lab environment for investigation. AiP navigates through both main memory and GPU memory spaces to recover complex ML data structures, using recovered Python objects to guide the recovery of lower-level C objects, ultimately leading to the recovery of the uniquely refined model. AiP then rehosts the model within the investigator's device, where the investigator can apply various white-box testing methodologies. We have evaluated AiP using three versions of TensorFlow and PyTorch with the CIFAR-10, LISA, and IMDB datasets. AiP recovered 30 models from main memory and GPU memory with 100% accuracy and rehosted them into a live process successfully.
Cost-effective Attack Forensics by Recording and Correlating File System Changes
Le Yu, Yapeng Ye, Zhuo Zhang, and Xiangyu Zhang, Purdue University
Attack forensics is particularly challenging for systems with restrictive resource constraints, such as IoT systems, because most existing methods entail logging high frequency events in the temporal dimension, which is costly. We propose a novel and cost-effective forensics technique that records information in the spatial dimension. It takes regular file-system snapshots that only record deltas between two timestamps. It infers causality by analyzing and correlating file changes (e.g., through methods similar to information retrieval). We show that in practice the resulting provenance graphs are as informative as the traditional attack provenance graphs based on temporal event logging. In the context of IoT attacks, they are better than those by existing techniques. In addition, our runtime and space overheads are only 8.08% and 5.13% of those for the state-of-the-arts, respectively.
ML for Security
Automated Large-Scale Analysis of Cookie Notice Compliance
Ahmed Bouhoula, Karel Kubicek, Amit Zac, Carlos Cotrini, and David Basin, ETH Zurich
Privacy regulations such as the General Data Protection Regulation (GDPR) require websites to inform EU-based users about non-essential data collection and to request their consent to this practice. Previous studies have documented widespread violations of these regulations. However, these studies provide a limited view of the general compliance picture: they are either restricted to a subset of notice types, detect only simple violations using prescribed patterns, or analyze notices manually. Thus, they are restricted both in their scope and in their ability to analyze violations at scale.
We present the first general, automated, large-scale analysis of cookie notice compliance. Our method interacts with cookie notices, e.g., by navigating through their settings. It observes declared processing purposes and available consent options using Natural Language Processing and compares them to the actual use of cookies. By virtue of the generality and scale of our analysis, we correct for the selection bias present in previous studies focusing on specific Consent Management Platforms (CMP). We also provide a more general view of the overall compliance picture using a set of 97k websites popular in the EU. We report, in particular, that 65.4% of websites offering a cookie rejection option likely collect user data despite explicit negative consent.
Detecting and Mitigating Sampling Bias in Cybersecurity with Unlabeled Data
Saravanan Thirumuruganathan, Independent Researcher; Fatih Deniz, Issa Khalil, and Ting Yu, Qatar Computing Research Institute, HBKU; Mohamed Nabeel, Palo Alto Networks; Mourad Ouzzani, Qatar Computing Research Institute, HBKU
Machine Learning (ML) based systems have demonstrated remarkable success in addressing various challenges within the ever-evolving cybersecurity landscape, particularly in the domain of malware detection/classification. However, a notable performance gap becomes evident when such classifiers are deployed in production. This discrepancy, often observed between accuracy scores reported in research papers and their real-world deployments, can be largely attributed to sampling bias. Intuitively, the data distribution in the production differs from that of training resulting in reduced performance of the classifier. How to deal with such sampling bias is an important problem in cybersecurity practice. In this paper, we propose principled approaches to detect and mitigate the adverse effects of sampling bias. First, we propose two simple and intuitive algorithms based on domain discrimination and distribution of k-th nearest neighbor distance to detect discrepancies between training and production data distributions. Second, we propose two algorithms based on the self-training paradigm to alleviate the impact of sampling bias. Our approaches are inspired by domain adaptation and judiciously harness the unlabeled data for enhancing the generalizability of ML classifiers. Critically, our approach does not require any modifications to the classifiers themselves, thus ensuring seamless integration into existing deployments. We conducted extensive experiments on four diverse datasets from malware, web domains, and intrusion detection. In an adversarial setting with large sampling bias, our proposed algorithms can improve the F-score by as much as 10-16 percentage points. Concretely, the F-score of a malware classifier on AndroZoo dataset increases from 0.83 to 0.937.
Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection
Haojie He, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Xingwei Lin, Ant Group; Ziang Weng and Ruijie Zhao, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Shuitao Gan, Laboratory for Advanced Computing and Intelligence Engineering; Libo Chen, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Yuede Ji, University of North Texas; Jiashui Wang, Ant Group; Zhi Xue, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
Binary code similarity detection (BCSD) has garnered significant attention in recent years due to its crucial role in various binary code-related tasks, such as vulnerability search and software plagiarism detection. Currently, BCSD systems are typically based on either instruction streams or control flow graphs (CFGs). However, these approaches have limitations. Instruction stream-based approaches treat binary code as natural languages, overlooking well-defined semantic structures. CFG-based approaches exploit only the control flow structures, neglecting other essential aspects of code. Our key insight is that unlike natural languages, binary code has well-defined semantic structures, including intra-instruction structures, inter-instruction relations (e.g., def-use, branches), and implicit conventions (e.g. calling conventions). Motivated by that, we carefully examine the necessary relations and structures required to express the full semantics and expose them directly to the deep neural network through a novel semantics-oriented graph representation. Furthermore, we propose a lightweight multi-head softmax aggregator to effectively and efficiently fuse multiple aspects of the binary code. Extensive experiments show that our method significantly outperforms the state-of-the-art (e.g., in the x64-XC retrieval experiment with a pool size of 10000, our method achieves a recall score of 184%, 220%, and 153% over Trex, GMN, and jTrans, respectively).
VulSim: Leveraging Similarity of Multi-Dimensional Neighbor Embeddings for Vulnerability Detection
Samiha Shimmi, Ashiqur Rahman, and Mohan Gadde, Northern Illinois University; Hamed Okhravi, MIT Lincoln Laboratory; Mona Rahimi, Northern Illinois University
Despite decades of research in vulnerability detection, vulnerabilities in source code remain a growing problem, and more effective techniques are needed in this domain. To enhance software vulnerability detection, in this paper, we first show that various vulnerability classes in the C programming language share common characteristics, encompassing semantic, contextual, and syntactic properties. We then leverage this knowledge to enhance the learning process of Deep Learning (DL) models for vulnerability detection when only sparse data is available. To achieve this, we extract multiple dimensions of information from the available, albeit limited, data. We then consolidate this information into a unified space, allowing for the identification of similarities among vulnerabilities through nearest-neighbor embeddings. The combination of these steps allows us to improve the effectiveness and efficiency of vulnerability detection using DL models. Evaluation results demonstrate that our approach surpasses existing State-of-the-art (SOTA) models and exhibits strong performance on unseen data, thereby enhancing generalizability.
LLM I: Attack and Defense
An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection
Shenao Yan, University of Connecticut; Shen Wang and Yue Duan, Singapore Management University; Hanbin Hong, University of Connecticut; Kiho Lee and Doowon Kim, University of Tennessee, Knoxville; Yuan Hong, University of Connecticut
Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CodeBreaker, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CodeBreaker leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CodeBreaker stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CodeBreaker across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CodeBreaker challenges current security measures, underscoring the critical need for more robust defenses for code completion.
REMARK-LLM: A Robust and Efficient Watermarking Framework for Generative Large Language Models
Ruisi Zhang, Shehzeen Samarah Hussain, Paarth Neekhara, and Farinaz Koushanfar, University of California, San Diego
We present REMARK-LLM, a novel efficient, and robust watermarking framework designed for texts generated by large language models (LLMs). Synthesizing human-like content using LLMs necessitates vast computational resources and extensive datasets, encapsulating critical intellectual property (IP). However, the generated content is prone to malicious exploitation, including spamming and plagiarism. To address the challenges, REMARK-LLM proposes three new components: (i) a learning-based message encoding module to infuse binary signatures into LLM-generated texts; (ii) a reparameterization module to transform the dense distributions from the message encoding to the sparse distribution of the watermarked textual tokens; (iii) a decoding module dedicated for signature extraction; Besides, we introduce an optimized beam search algorithm to generate content with coherence and consistency. REMARK-LLM is rigorously trained to encourage the preservation of semantic integrity in watermarked content, while ensuring effective watermark retrieval. Extensive evaluations on multiple unseen datasets highlight REMARK-LLM's proficiency and transferability in inserting 2× more signature bits into the same texts when compared to prior art, all while maintaining semantic integrity. Furthermore, REMARK-LLM exhibits better resilience against a spectrum of watermark detection and removal attacks.
Formalizing and Benchmarking Prompt Injection Attacks and Defenses
Yupei Liu, The Pennsylvania State University; Yuqi Jia, Duke University; Runpeng Geng and Jinyuan Jia, The Pennsylvania State University; Neil Zhenqiang Gong, Duke University
A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.
Instruction Backdoor Attacks Against Customized LLMs
Rui Zhang and Hongwei Li, University of Electronic Science and Technology of China; Rui Wen, CISPA Helmholtz Center for Information Security; Wenbo Jiang and Yuan Zhang, University of Electronic Science and Technology of China; Michael Backes, CISPA Helmholtz Center for Information Security; Yun Shen, NetApp; Yang Zhang, CISPA Helmholtz Center for Information Security
The increasing demand for customized Large Language Models (LLMs) has led to the development of solutions like GPTs. These solutions facilitate tailored LLM creation via natural language prompts without coding. However, the trustworthiness of third-party custom versions of LLMs remains an essential concern. In this paper, we propose the first instruction backdoor attacks against applications integrated with untrusted customized LLMs (e.g., GPTs). Specifically, these attacks embed the backdoor into the custom version of LLMs by designing prompts with backdoor instructions, outputting the attacker's desired result when inputs contain the predefined triggers. Our attack includes 3 levels of attacks: word-level, syntax-level, and semantic-level, which adopt different types of triggers with progressive stealthiness. We stress that our attacks do not require fine-tuning or any modification to the backend LLMs, adhering strictly to GPTs development guidelines. We conduct extensive experiments on 6 prominent LLMs and 5 benchmark text classification datasets. The results show that our instruction backdoor attacks achieve the desired attack performance without compromising utility. Additionally, we propose two defense strategies and demonstrate their effectiveness in reducing such attacks. Our findings highlight the vulnerability and the potential risks of LLM customization such as GPTs.
Software Vulnerability Detection
FIRE: Combining Multi-Stage Filtering with Taint Analysis for Scalable Recurring Vulnerability Detection
Siyue Feng, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, Cluster and Grid Computing Lab; School of Cyber Science and Engineering, Huazhong University of Science and Technology; Yueming Wu, Nanyang Technological University; Wenjie Xue and Sikui Pan, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, Cluster and Grid Computing Lab; School of Cyber Science and Engineering, Huazhong University of Science and Technology; Deqing Zou, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, Cluster and Grid Computing Lab; School of Cyber Science and Engineering, Huazhong University of Science and Technology; Jinyinhu Laboratory; Yang Liu, Nanyang Technological University; Hai Jin, National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Hubei Key Laboratory of Distributed System Security, Hubei Engineering Research Center on Big Data Security, Cluster and Grid Computing Lab; School of Computer Science and Technology, Huazhong University of Science and Technology
With the continuous development of software open-sourcing, the reuse of open-source software has led to a significant increase in the occurrence of recurring vulnerabilities. These vulnerabilities often arise through the practice of copying and pasting existing vulnerabilities. Many methods have been proposed for detecting recurring vulnerabilities, but they often struggle to ensure both high efficiency and consideration of semantic information about vulnerabilities and patches. In this paper, we introduce FIRE, a scalable method for largescale recurring vulnerability detection. It utilizes multi-stage f iltering and differential taint paths to achieve precise clone vulnerability scanning at an extensive scale. In our evaluation across ten open-source software projects, FIRE demonstrates a precision of 90.0% in detecting 298 recurring vulnerabilities out of 385 ground truth instance. This surpasses the performance of existing advanced recurring vulnerability detection tools, detecting 31.4% more vulnerabilities than VUDDY and 47.0% more than MOVERY. When detecting vulnerabilities in large-scale software, FIRE outperforms MOVERY by saving about twice the time, enabling the scanning of recurring vulnerabilities on an ultra-large scale.
Inference of Error Specifications and Bug Detection Using Structural Similarities
Niels Dossche and Bart Coppens, Ghent University
Error-handling code is a crucial part of software to ensure stability and security. Failing to handle errors correctly can lead to security vulnerabilities such as DoS, privilege escalation, and data corruption. We propose a novel approach to automatically infer error specifications for system software without a priori domain knowledge, while still achieving a high recall and precision. The key insight behind our approach is that we can identify error-handling paths automatically based on structural similarities between error-handling code. We use the inferred error specification to detect three kinds of bugs: missing error checks, incorrect error checks, and error propagation bugs. Our technique uses a combination of path-sensitive, flow-sensitive and both intra-procedural and inter-procedural data-flow analysis to achieve high accuracy and great scalability. We implemented our technique in a tool called ESSS to demonstrate the effectiveness and efficiency of our approach on 7 well-tested, widely-used open-source software projects: OpenSSL, OpenSSH, PHP, zlib, libpng, freetype2, and libwebp. Our tool reported 827 potential bugs in total for all 7 projects combined. We manually categorised these 827 issues into 279 false positives and 541 true positives. Out of these 541 true positives, we sent bug reports and corresponding patches for 46 of them. All the patches were accepted and applied.
A Binary-level Thread Sanitizer or Why Sanitizing on the Binary Level is Hard
Joschua Schilling, CISPA Helmholtz Center for Information Security; Andreas Wendler, Friedrich-Alexander-Universität Erlangen-Nürnberg; Philipp Görz, Nils Bars, Moritz Schloegel, and Thorsten Holz, CISPA Helmholtz Center for Information Security
Dynamic software testing methods, such as fuzzing, have become a popular and effective method for detecting many types of faults in programs. While most research focuses on targets for which source code is available, much of the software used in practice is only available as closed source. Testing software without having access to source code forces a user to resort to binary-only testing methods, which are typically slower and lack support for crucial features, such as advanced bug oracles in the form of sanitizers, i.e., dynamic methods to detect faults based on undefined or suspicious behavior. Almost all existing sanitizers work by injecting instrumentation at compile time, requiring access to the target's source code. In this paper, we systematically identify the key challenges of applying sanitizers to binary-only targets. As a result of our analysis, we present the design and implementation of BINTSAN, an approach to realize the data race detector TSAN targeting binary-only Linux x86-64 targets. We systematically evaluate BINTSAN for correctness, effectiveness, and performance. We find that our approach has a runtime overhead of only 15% compared to source-based TSAN. Compared to existing binary solutions, our approach has better performance (up to 5.0× performance improvement) and precision, while preserving compatibility with the compiler-based TSAN.
ORANalyst: Systematic Testing Framework for Open RAN Implementations
Tianchang Yang, Syed Md Mukit Rashid, Ali Ranjbar, Gang Tan, and Syed Rafiul Hussain, The Pennsylvania State University
We develop ORANalyst, the first systematic testing framework tailored for analyzing the robustness and operational integrity of Open RAN (O-RAN) implementations. O-RAN systems are composed of numerous microservice-based components. ORANalyst initially gains insights into these complex component dependencies by combining efficient static analysis with dynamic tracing. Applying these insights, ORANalyst crafts test inputs that effectively navigate these dependencies and thoroughly test each target component. We evaluate ORANalyst on two O-RAN implementations, O-RAN-SC and SD-RAN, and identify 19 previously undiscovered vulnerabilities. If exploited, these vulnerabilities could lead to various denial-of-service attacks, resulting from component crashes and disruptions in communication channels.
Cryptographic Protocols I: Multi-Party Computation
Scalable Multi-Party Computation Protocols for Machine Learning in the Honest-Majority Setting
Fengrun Liu, University of Science and Technology of China & Shanghai Qi Zhi Institute; Xiang Xie, Shanghai Qi Zhi Institute & PADO Labs; Yu Yu, Shanghai Jiao Tong University & State Key Laboratory of Cryptology
In this paper, we present a novel and scalable multi-party computation (MPC) protocol tailored for privacy-preserving machine learning (PPML) with semi-honest security in the honest-majority setting. Our protocol utilizes the Damgaard-Nielsen (Crypto '07) protocol with Mersenne prime fields. By leveraging the special properties of Mersenne primes, we are able to design highly efficient protocols for securely computing operations such as truncation and comparison. Additionally, we extend the two-layer multiplication protocol in ATLAS (Crypto '21) to further reduce the round complexity of operations commonly used in neural networks.
Our protocol is very scalable in terms of the number of parties involved. For instance, our protocol completes the online oblivious inference of a 4-layer convolutional neural network with 63 parties in 0.1 seconds and 4.6 seconds in the LAN and WAN settings, respectively. To the best of our knowledge, this is the first fully implemented protocol in the field of PPML that can successfully run with such a large number of parties. Notably, even in the three-party case, the online phase of our protocol is more than 1.4x faster than the Falcon (PETS '21) protocol.
Lightweight Authentication of Web Data via Garble-Then-Prove
Xiang Xie, PADO Labs; Kang Yang, State Key Laboratory of Cryptology; Xiao Wang, Northwestern University; Yu Yu, Shanghai Jiao Tong University and Shanghai Qi Zhi Institute
Transport Layer Security (TLS) establishes an authenticated and confidential channel to deliver data for almost all Internet applications. A recent work (Zhang et al., CCS'20) proposed a protocol to prove the TLS payload to a third party, without any modification of TLS servers, while ensuring the privacy and originality of the data in the presence of malicious adversaries. However, it required maliciously secure Two-Party Computation (2PC) for generic circuits, leading to significant computational and communication overhead.
This paper proposes the garble-then-prove technique to achieve the same security requirement without using any heavy mechanism like generic malicious 2PC. Our end-to-end implementation shows 14x improvement in communication and an order of magnitude improvement in computation over the state-of-the-art protocol. We also show worldwide performance when using our protocol to authenticate payload data from Coinbase and Twitter APIs. Finally, we propose an efficient gadget to privately convert the above authenticated TLS payload to additively homomorphic commitments so that the properties of the payload can be proven efficiently using zkSNARKs.
Holding Secrets Accountable: Auditing Privacy-Preserving Machine Learning
Hidde Lycklama, ETH Zurich; Alexander Viand, Intel Labs; Nicolas Küchler, ETH Zurich; Christian Knabenhans, EPFL; Anwar Hithnawi, ETH Zurich
Recent advancements in privacy-preserving machine learning are paving the way to extend the benefits of ML to highly sensitive data that, until now, has been hard to utilize due to privacy concerns and regulatory constraints. Simultaneously, there is a growing emphasis on enhancing the transparency and accountability of ML, including the ability to audit deployments for aspects such as fairness, accuracy and compliance. Although ML auditing and privacy-preserving machine learning have been extensively researched, they have largely been studied in isolation. However, the integration of these two areas is becoming increasingly important. In this work, we introduce Arc, an MPC framework designed for auditing privacy-preserving machine learning. Arc cryptographically ties together the training, inference, and auditing phases to allow robust and private auditing. At the core of our framework is a new protocol for efficiently verifying inputs against succinct commitments. We evaluate the performance of our framework when instantiated with our consistency protocol and compare it to hashing-based and homomorphic-commitment-based approaches, demonstrating that it is up to 10^4× faster and up to 10^6× more concise.
Secure Account Recovery for a Privacy-Preserving Web Service
Ryan Little, Boston University; Lucy Qin, Georgetown University; Mayank Varia, Boston University
If a web service is so secure that it does not even know—and does not want to know—the identity and contact info of its users, can it still offer account recovery if a user forgets their password? This paper is the culmination of the authors' work to design a cryptographic protocol for account recovery for use by a prominent secure matching system: a web-based service that allows survivors of sexual misconduct to become aware of other survivors harmed by the same perpetrator. In such a system, the list of account-holders must be safeguarded, even against the service provider itself.
In this work, we design an account recovery system that, on the surface, appears to follow the typical workflow: the user types in their email address, receives an email containing a one-time link, and answers some security questions. Behind the scenes, the defining feature of our recovery system is that the service provider can perform email-based account validation without knowing, or being able to learn, a list of users' email addresses. Our construction uses standardized cryptography for most components, and it has been deployed in production at the secure matching system.
As a building block toward our main construction, we design a new cryptographic primitive that may be of independent interest: an oblivious pseudorandom function that can either have a fully-private input or a partially-public input, and that reaches the same output either way. This primitive allows us to perform online rate limiting for account recovery attempts, without imposing a bound on the creation of new accounts. We provide an open-source implementation of this primitive and provide evaluation results showing that the end-to-end interaction time takes 8.4-60.4 ms in fully-private input mode and 3.1-41.2 ms in partially-public input mode.
6:00 pm–7:30 pm
Symposium Reception
Franklin Hall
Thursday, August 15
8:00 am–9:00 am
Continental Breakfast
Grand Ballroom Foyer
9:00 am–10:15 am
User Studies II: At-Risk Users
Navigating Traumatic Stress Reactions During Computer Security Interventions
Lana Ramjit, Cornell Tech; Natalie Dolci, UW-Safe Campus; Francesca Rossi, Thriving Through; Ryan Garcia, UW-Safe Campus; Thomas Ristenpart, Cornell Tech; Dana Cuomo, Lafayette College
At-risk populations need direct support from computer security and privacy consultants, what we refer to as a security intervention. However, at-risk populations often face security threats while experiencing traumatic events and ensuing traumatic stress reactions. While existing security interventions follow broad principles for trauma-informed care, no prior work has studied the domain-specific effects of trauma on intervention efficacy, nor how to improve the ability of tech abuse specialists to navigate them.
We perform a multi-part study into traumatic stress in the context of digital security interventions. We first interview technology consultants from three computer security clinics that help intimate partner violence survivors with technology abuse. We identify four challenges reported by consultants emanating out of traumatic stress, some of which appear to be unique to the digital security context. To better understand these challenges, we analyze transcripts of sessions at one of the clinics, extracting five patterns of how stress reactions affect consultations. We use our findings to develop new recommended best practices, including a new intervention protocol design to help guide security interventions.
Exploring digital security and privacy in relative poverty in Germany through qualitative interviews
Anastassija Kostan and Sara Olschar, Paderborn University; Lucy Simko, The George Washington University; Yasemin Acar, Paderborn University & The George Washington University
When developing security and privacy policy, technical solutions, and research for end users, assumptions about end users' financial means and technology use situations often fail to take users' income status into account. This means that the status quo may marginalize those affected by poverty in security and privacy, and exacerbate inequalities. To enable more equitable security and privacy for all, it is crucial to understand the overall situation of low income users, their security and privacy concerns, perceptions, behaviors, and challenges. In this paper, we report on a semi-structured, in-depth interview study with low income users living in Germany (n=28) which we understand as a case study for the growing number of low income users in global north countries. We find that low income end users may be literate regarding technology use and possess solid basic knowledge about security and privacy, and generally show awareness of security and privacy threats and risks. Despite these resources, we also find that low income users are driven to poor security and privacy practices like using an untrusted cloud due to little storage space, and relying on old, broken, or used hardware. Additionally we find the mindset of a—potentially false—sense of security and privacy because through attacking them, there is "not much to get". Based on our findings, we discuss how the security and privacy community can expand comprehension about diverse end users, increase awareness and design for the specific situation of low income users, and should take more vulnerable groups into account.
"But they have overlooked a few things in Afghanistan:" An Analysis of the Integration of Biometric Voter Verification in the 2019 Afghan Presidential Elections
Kabir Panahi and Shawn Robertson, University of Kansas; Yasemin Acar, Paderborn University; Alexandru G. Bardas, University of Kansas; Tadayoshi Kohno, University of Washington; Lucy Simko, The George Washington University
Afghanistan deployed biometric voter verification (BVV) machines nationally for the first time in the critical 2019 presidential election. Through the leading authors' unique backgrounds and involvement in this election, which facilitated interviews with 18 Afghan nationals and international participants who had an active role in this Afghan election, we explore the gap between the expected outcomes of the electoral system, centered around BVVs, and the reality on election day and beyond. We find that BVVs supported and violated the electoral goals of voter enfranchisement, fraud prevention, enabling public trust, and created threats for voters, staff, and officials. We identify technical, usability, and bureaucratic underlying causes for these mismatches and discuss several vital factors that are part of an election.
Understanding How to Inform Blind and Low-Vision Users about Data Privacy through Privacy Question Answering Assistants
Yuanyuan Feng, University of Vermont; Abhilasha Ravichander, Allen Institute for Artificial Intelligence; Yaxing Yao, Virginia Tech; Shikun Zhang and Rex Chen, Carnegie Mellon University; Shomir Wilson, Pennsylvania State University; Norman Sadeh, Carnegie Mellon University
Understanding and managing data privacy in the digital world can be challenging for sighted users, let alone blind and low-vision (BLV) users. There is limited research on how BLV users, who have special accessibility needs, navigate data privacy, and how potential privacy tools could assist them. We conducted an in-depth qualitative study with 21 US BLV participants to understand their data privacy risk perception and mitigation, as well as their information behaviors related to data privacy. We also explored BLV users' attitudes towards potential privacy question answering (Q&A) assistants that enable them to better navigate data privacy information. We found that BLV users face heightened security and privacy risks, but their risk mitigation is often insufficient. They do not necessarily seek data privacy information but clearly recognize the benefits of a potential privacy Q&A assistant. They also expect privacy Q&A assistants to possess cross-platform compatibility, support multi-modality, and demonstrate robust functionality. Our study sheds light on BLV users' expectations when it comes to usability, accessibility, trust and equity issues regarding digital data privacy.
Assessing Suspicious Emails with Banner Warnings Among Blind and Low-Vision Users in Realistic Settings
Filipo Sharevski, DePaul University; Aziz Zeidieh, University of Illinois at Urbana-Champaign
Warning users about suspicious emails usually happens through visual interventions such as banners. Evidence from laboratory experiments shows that email banner warnings are unsuitable for blind and low-vision (BLV) users as they tend to miss or make no use of them. However, the laboratory settings preclude a full understanding of how BLV users would realistically behave around these banner warnings because the experiments don't use the individuals' own email addresses, devices, or emails of their choice. To address this limitation, we devised a study with n=21 BLV email users in realistic settings. Our findings indicate that this user population misses or makes no use of Gmail and Outlook banner warnings because these are implemented in a "narrow" sense, that is, (i) they allow access to the warning text without providing context relevant to the risk of associated email, and (ii) the formatting, together with the possible actions, is confusing as to how a user should deal with the email in question. To address these barriers, our participants proposed designs to accommodate the accessibility preferences and usability habits of individuals with visual disabilities according to their capabilities to engage with email banner warnings.
Side Channel III
Invalidate+Compare: A Timer-Free GPU Cache Attack Primitive
Zhenkai Zhang, Clemson University; Kunbei Cai, University of Central Florida; Yanan Guo, University of Rochester; Fan Yao, University of Central Florida; Xing Gao, University of Delaware
While extensive research has been conducted on CPU cache side-channel attacks, the landscape of similar studies on modern GPUs remains largely uncharted. In this paper, we investigate potential information leakage threats posed by the caches in GPUs of NVIDIA's latest Ampere and Ada Lovelace generations. We first exploit a GPU cache maintenance instruction to reverse engineer certain key properties of the cache hierarchy in these GPUs, and then we introduce a novel GPU cache side-channel attack primitive named Invalidate+Compare that is designed to spy on the GPU cache activities of a victim in a timer-free manner. We further showcase the use of this primitive with two case studies. The first one is a website fingerprinting attack that can accurately identify the web pages visited by a user, while the second one uncovers keystroke data entered via a virtual keyboard. To our knowledge, these stand as the first demonstrations of timer-free cache side-channel attacks on GPUs.
Peep With A Mirror: Breaking The Integrity of Android App Sandboxing via Unprivileged Cache Side Channel
Yan Lin, Jinan University; Joshua Wong, Singapore Management University; Xiang Li and Haoyu Ma, Zhejiang Lab; Debin Gao, Singapore Management University
Application sandboxing is a well-established security principle employed in the Android platform to safeguard sensitive information. However, hardware resources, specifically the CPU caches, are beyond the protection of this software-based mechanism, leaving room for potential side-channel attacks. Existing attacks against this particular weakness of app sandboxing mainly target shared components among apps, hence can only observe system-level program dynamics (such as UI tracing). In this work, we advance cache side-channel attacks by demonstrating the viability of non-intrusive and fine-grained probing across different app sandboxes, which have the potential to uncover app-specific and private program behaviors, thereby highlighting the importance of further research in this area.
In contrast to conventional attack schemes, our proposal leverages a user-level attack surface within the Android platform, namely the dynamic inter-app component sharing with package context (also known as DICI), to fully map the code of targeted victim apps into the memory space of the attacker's sandbox. Building upon this concept, we have developed a proof-of-concept attack demo called ANDROSCOPE and demonstrated its effectiveness with empirical evaluations where the attack app was shown to be able to successfully infer private information pertaining to individual apps, such as driving routes and keystroke dynamics with considerable accuracy.
Indirector: High-Precision Branch Target Injection Attacks Exploiting the Indirect Branch Predictor
Luyi Li, Hosein Yavarzadeh, and Dean Tullsen, UC San Diego
Distinguished Paper Award Winner
This paper introduces novel high-precision Branch Target Injection (BTI) attacks, leveraging the intricate structures of the Indirect Branch Predictor (IBP) and the Branch Target Buffer (BTB) in high-end Intel CPUs. It presents, for the first time, a comprehensive picture of the IBP and the BTB within the most recent Intel processors, revealing their size, structure, and the precise functions governing index and tag hashing. Additionally, this study reveals new details into the inner workings of Intel's hardware defenses, such as IBPB, IBRS, and STIBP, including previously unknown holes in their coverage. Leveraging insights from reverse engineering efforts, this research develops highly precise Branch Target Injection (BTI) attacks to breach security boundaries across diverse scenarios, including cross-process and cross-privilege scenarios and uses the IBP and the BTB to break Address Space Layout Randomization (ASLR).
Intellectual Property Exposure: Subverting and Securing Intellectual Property Encapsulation in Texas Instruments Microcontrollers
Marton Bognar, Cas Magnus, Frank Piessens, and Jo Van Bulck, DistriNet, KU Leuven
In contrast to high-end computing platforms, specialized memory protection features in low-end embedded devices remain relatively unexplored despite the ubiquity of these devices. Hence, we perform an in-depth security evaluation of the state-of-the-art Intellectual Property Encapsulation (IPE) technology found in widely used off-the-shelf, Texas Instruments MSP430 microcontrollers. While we find IPE to be promising, bearing remarkable similarities with trusted execution environments (TEEs) from research and industry, we reveal several fundamental protection shortcomings in current IPE hardware. We show that many software-level attack techniques from the academic TEE literature apply to this platform, and we discover a novel attack primitive, dubbed controlled call
corruption, exploiting a vulnerability in the IPE access control mechanism. Our practical, end-to-end attack scenarios demonstrate a complete bypass of confidentiality and integrity guarantees of IPE-protected programs.
Informed by our systematic attack study on IPE and root-cause analysis, also considering related research prototypes, we propose lightweight hardware changes to secure IPE. Furthermore, we develop a prototype framework that transparently implements software responsibilities to reduce information leakage and repurposes the onboard memory protection unit to reinstate IPE security guarantees on currently vulnerable devices with low performance overheads.
ML III: Secure ML
AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE
Wei Ao and Vishnu Naresh Boddeti, Michigan State University
Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by 1.32x to 1.8x compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by 103x and 3.46%, respectively, compared to CNNs under TFHE.
Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions
Abdulrahman Diaa, Lucas Fenaux, Thomas Humphries, Marian Dietz, Faezeh Ebrahimianghazani, Bailey Kacsmar, Xinda Li, Nils Lukas, Rasoul Akhavan Mahdavi, and Simon Oya, University of Waterloo; Ehsan Amjadian, University of Waterloo and Royal Bank of Canada; Florian Kerschbaum, University of Waterloo
Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which give state-of-theart inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between 3 and 110× speedups in inference time on large models with up to 23 million parameters while maintaining competitive inference accuracy.
OblivGNN: Oblivious Inference on Transductive and Inductive Graph Neural Network
Zhibo Xu, Monash University and CSIRO's Data61; Shangqi Lai, CSIRO's Data61; Xiaoning Liu, RMIT University; Alsharif Abuadbba, CSIRO's Data61; Xingliang Yuan, The University of Melbourne; Xun Yi, RMIT University
Graph Neural Networks (GNNs) have emerged as a powerful tool for analysing graph-structured data across various domains, including social networks, banking, and bioinformatics. In the meantime, graph data contains sensitive information, such as social relations, financial transactions, and chemical structures, and GNN models are IPs of the model owner. Thus, deploying GNNs in cloud-based Machine Learning as a Service (MLaaS) raises significant privacy concerns.
In this paper, we present a comprehensive solution to enable secure GNN inference in MLaaS, named OblivGNN. OblivGNN is designed to support both transductive (static graph) and inductive (dynamic graph) inference services without revealing either graph data or GNN models. In particular, we adopt a lightweight cryptographic primitive, i.e., function secret sharing, to achieve low communication and computation overhead during inference. Furthermore, we are the first to propose a secure update protocol for the inductive setting, which can obliviously update the graph without revealing which parts of the graph are updated. Particularly, our results with three widely-used graph datasets (Cora, Citeseer, and Pubmed) show that OblivGNN can achieve comparable accuracy to an Additive Secret Sharing-based baseline. Nonetheless, our design reduces the runtime cost by up to 38% and the communication cost by 10x to 151x, highlighting its practicality when processing large graphs with GNN models.
MD-ML: Super Fast Privacy-Preserving Machine Learning for Malicious Security with a Dishonest Majority
Boshi Yuan, Shixuan Yang, and Yongxiang Zhang, Shanghai Jiao Tong University, China; Ning Ding, Dawu Gu, and Shi-Feng Sun, Shanghai Jiao Tong University, China; Shanghai Jiao Tong University (Wuxi) Blockchain Advanced Research Center
Privacy-preserving machine learning (PPML) enables the training and inference of models on private data, addressing security concerns in machine learning. PPML based on secure multi-party computation (MPC) has garnered significant attention from both the academic and industrial communities. Nevertheless, only a few PPML works provide malicious security with a dishonest majority. The state of the art by Damgård et al. (SP'19) fails to meet the demand for large models in practice, due to insufficient efficiency. In this work, we propose MD-ML, a framework for Maliciously secure Dishonest majority PPML, with a focus on boosting online efficiency.
MD-ML works for n parties, tolerating corruption of up to n-1 parties. We construct our novel protocols for PPML, including truncation, dot product, matrix multiplication, and comparison. The online communication of our dot product protocol is one single element per party, independent of input length. In addition, the online cost of our multiply-then-truncate protocol is identical to multiplication, which means truncation incurs no additional online cost. These features are achieved for the first time in the literature concerning maliciously secure dishonest majority PPML.
Benchmarking of MD-ML is conducted for SVM and NN including LeNet, AlexNet, and ResNet-18. For NN inference, compared to the state of the art (Damgård et al., SP'19), we are about 3.4—11.0x (LAN) and 9.7—157.7x (WAN) faster in online execution time.
Accelerating Secure Collaborative Machine Learning with Protocol-Aware RDMA
Zhenghang Ren, Mingxuan Fan, Zilong Wang, Junxue Zhang, and Chaoliang Zeng, iSING Lab@The Hong Kong University of Science and Technology; Zhicong Huang and Cheng Hong, Ant Group; Kai Chen, iSING Lab@The Hong Kong University of Science and Technology and University of Science and Technology of China
Secure Collaborative Machine Learning (SCML) suffers from high communication cost caused by secure computation protocols. While modern datacenters offer high-bandwidth and low-latency networks with Remote Direct Memory Access (RDMA) capability, existing SCML implementation remains to use TCP sockets, leading to inefficiency. We present CORA1 to implement SCML over RDMA. By using a protocol-aware design, CORA identifies the protocol used by the SCML program and sends messages directly to the remote party's protocol buffer, improving the efficiency of message exchange. CORA exploits the chance that the SCML task is determined before execution and the pattern is largely input-irrelevant, so that CORA can plan message destinations on remote hosts at compile time. CORA can be readily deployed with existing SCML frameworks such as Piranha with its socket-like interface. We evaluate CORA in SCML training tasks, and our results show that CORA can reduce communication cost by up to 11x and achieve 1.2x - 4.2x end-to-end speedup over TCP in SCML training.
Measurement II: Network
CalcuLatency: Leveraging Cross-Layer Network Latency Measurements to Detect Proxy-Enabled Abuse
Reethika Ramesh, University of Michigan; Philipp Winter, Independent; Sam Korman and Roya Ensafi, University of Michigan
Efforts from emerging technology companies aim to democratize the ad delivery ecosystem and build systems that are privacy-centric and even share ad revenue benefits with their users. Other providers offer remuneration for users on their platform for interacting with and making use of services. But these efforts may suffer from coordinated abuse efforts aiming to defraud them. Attackers can use VPNs and proxies to fabricate their geolocation and earn disproportionate rewards. Balancing proxy-enabled abuse-prevention techniques with a privacy-focused business model is a hard challenge. Can service providers use minimal connection features to infer proxy use without jeopardizing user privacy?
In this paper, we build and evaluate a solution, CalcuLatency, that incorporates various network latency measurement techniques and leverage the application-layer and network-layer differences in roundtrip-times when a user connects to the service using a proxy. We evaluate our four measurement techniques individually, and as an integrated system using a two-pronged evaluation. CalcuLatency is an easy-to-deploy, open-source solution that can serve as an inexpensive first-step to label proxies.
6Sense: Internet-Wide IPv6 Scanning and its Security Applications
Grant Williams, Mert Erdemir, Amanda Hsu, Shraddha Bhat, Abhishek Bhaskar, Frank Li, and Paul Pearce, Georgia Institute of Technology
Internet-wide scanning is a critical tool for security researchers and practitioners alike. By exhaustively exploring the entire IPv4 address space, Internet scanning has driven the development of new security protocols, found and tracked vulnerabilities, improved DDoS defenses, and illuminated global censorship. Unfortunately, the vast scale of the IPv6 address space—340 trillion trillion trillion addresses—precludes exhaustive scanning, necessitating entirely new IPv6-specific scanning methods. As IPv6 adoption continues to grow, developing IPv6 scanning methods is vital for maintaining our capability to comprehensively investigate Internet security.
We present 6SENSE, an end-to-end Internet-wide IPv6 scanning system. 6SENSE utilizes reinforcement learning coupled with an online scanner to iteratively reduce the space of possible IPv6 addresses into a tractable scannable subspace, thus discovering new IPv6 Internet hosts. 6SENSE is driven by a set of metrics we identify and define as key for evaluating the generality, diversity, and correctness of IPv6 scanning. We evaluate 6SENSE and prior generative IPv6 discovery methods across these metrics, showing that 6SENSE is able to identify tens of millions of IPv6 hosts, which compared to prior approaches, is up to 3.6x more hosts and 4x more end-site assignments, across a more diverse set of networks. From our analysis, we identify limitations in prior generative approaches that preclude their use for Internet-scale security scans. We also conduct the first Internet-wide scanning-driven security analysis of IPv6 hosts, focusing on TLS certificates unique to IPv6, surveying open ports and security-sensitive services, and identifying potential CVEs.
A Flushing Attack on the DNS Cache
Yehuda Afek and Anat Bremler-Barr, Tel-Aviv University; Shoham Danino, Reichman University; Yuval Shavitt, Tel-Aviv University
A severe vulnerability in the DNS resolver's cache is exposed here, introducing a new type of attack, termed DNS CacheFlush. This attack poses a significant threat as it can easily disrupt a resolver's ability to provide service to its clients.
DNS resolver software incorporates various mechanisms to safeguard its cache. However, we have identified a tricky path to bypass these safeguards, allowing a high-rate flood of malicious but seemingly existent domain name resolutions to thrash the benign DNS cache. The resulting attack has a high amplification factor, where with a low rate attack it produces a continuous high rate resource records insertions into the resolver cache. This prevents benign request resolutions from surviving in the DNS LRU cache long enough for subsequent requests to be resolved directly from the cache. Thus leading to repeated cache misses for most benign domains, resulting in a substantial delay in the DNS service. The attack rate amplification factor is high enough to even flush out popular benign domains that are requested at a high frequency (∼ 100/1sec). Moreover, the attack packets introduce additional processing overhead and all together the attack easily denies service from the resolver's legitimate clients.
In our experiments we observed 95.7% cache miss rate for a domain queried once per second under 8,000 qps attack on a resolver with 100MB cache. Even on a resolver with 2GB cache size we observed a drop of 88.3% in the resolver benign traffic throughput.
A result of this study is a recommendation to deny and drop any authoritative replies that contain many server names, e.g., a long referral response, or a long CNAME chain, before the resolver starts any processing of such a response.
SnailLoad: Exploiting Remote Network Latency Measurements without JavaScript
Stefan Gast, Roland Czerny, Jonas Juffinger, Fabian Rauscher, Simone Franza, and Daniel Gruss, Graz University of Technology
Inferring user activities on a computer from network traffic is a well-studied attack vector. Previous work has shown that they can infer websites visited, videos watched, and even user actions within specific applications. However, all of these attacks require a scenario where the attacker can observe the (possibly encrypted) network traffic, e.g., through a person-in-the-middle (PITM) attack or sitting in physical proximity to monitor WiFi packets.
In this paper, we present SnailLoad, a new side-channel attack where the victim loads an asset, e.g., a file or an image, from an attacker-controlled server, exploiting the victim's network latency as a side channel tied to activities on the victim system, e.g., watching videos or websites. SnailLoad requires no JavaScript, no form of code execution on the victim system, and no user interaction but only a constant exchange of network packets, e.g., a network connection in the background. SnailLoad measures the latency to the victim system and infers the network activity on the victim system from the latency variations. We demonstrate SnailLoad in a non-PITM video-fingerprinting attack, where we use a single SnailLoad trace to infer what video a victim user is watching momentarily. For our evaluation, we focused on a set of 10 YouTube videos the victim watches, and show that SnailLoad reaches classification F1 scores of up to 98%. We also evaluated SnailLoad in an open-world top 100 website fingerprinting attack, resulting in an F1 score of 62.8%. This shows that numerous prior works, based on network traffic observations in PITM attack scenarios, could potentially be lifted to non-PITM remote attack scenarios.
An Interview Study on Third-Party Cyber Threat Hunting Processes in the U.S. Department of Homeland Security
William P. Maxam III, US Coast Guard Academy; James C. Davis, Purdue University
Cybersecurity is a major challenge for large organizations. Traditional cybersecurity defense is reactive. Cybersecurity operations centers keep out adversaries and incident response teams clean up after break-ins. Recently a proactive stage has been introduced: Cyber Threat Hunting (TH) looks for potential compromises missed by other cyber defenses. TH is mandated for federal executive agencies and government contractors. As threat hunting is a new cybersecurity discipline, most TH teams operate without a defined process. The practices and challenges of TH have not yet been documented.
To address this gap, this paper describes the first interview study of threat hunt practitioners. We obtained access and interviewed 11 threat hunters associated with the U.S. government's Department of Homeland Security. Hour-long interviews were conducted. We analyzed the transcripts with process and thematic coding. We describe the diversity among their processes, show that their processes differ from the TH processes reported in the literature, and unify our subjects' descriptions into a single TH process. We enumerate common TH challenges and solutions according to the subjects. The two most common challenges were difficulty in assessing a Threat Hunter's expertise, and developing and maintaining automation. We conclude with recommendations for TH teams (improve planning, focus on automation, and apprentice new members) and highlight directions for future work (finding a TH process that balances flexibility and formalism, and identifying assessments for TH team performance).
ML IV: Privacy Inference I
A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data
Meenatchi Sundaram Muthu Selva Annamalai, University College London; Andrea Gadotti and Luc Rocher, University of Oxford
Recent advances in synthetic data generation (SDG) have been hailed as a solution to the difficult problem of sharing sensitive data while protecting privacy. SDG aims to learn statistical properties of real data in order to generate "artificial" data that are structurally and statistically similar to sensitive data. However, prior research suggests that inference attacks on synthetic data can undermine privacy, but only for specific outlier records.
In this work, we introduce a new attribute inference attack against synthetic data. The attack is based on linear reconstruction methods for aggregate statistics, which target all records in the dataset, not only outliers. We evaluate our attack on state-of-the-art SDG algorithms, including Probabilistic Graphical Models, Generative Adversarial Networks, and recent differentially private SDG mechanisms. By defining a formal privacy game, we show that our attack can be highly accurate even on arbitrary records, and that this is the result of individual information leakage (as opposed to population-level inference).
We then systematically evaluate the tradeoff between protecting privacy and preserving statistical utility. Our findings suggest that current SDG methods cannot consistently provide sufficient privacy protection against inference attacks while retaining reasonable utility. The best method evaluated, a differentially private SDG mechanism, can provide both protection against inference attacks and reasonable utility, but only in very specific settings. Lastly, we show that releasing a larger number of synthetic records can improve utility but at the cost of making attacks far more effective.
Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models
Matthieu Meeus, Imperial College London; Shubham Jain, Sense Street; Marek Rei and Yves-Alexandre de Montjoye, Imperial College London
With large language models (LLMs) poised to become embedded in our daily lives, questions are starting to be raised about the data they learned from. These questions range from potential bias or misinformation LLMs could retain from their training data to questions of copyright and fair use of human-generated text. However, while these questions emerge, developers of the recent state-of-the-art LLMs become increasingly reluctant to disclose details on their training corpus. We here introduce the task of document-level membership inference for real-world LLMs, i.e. inferring whether the LLM has seen a given document during training or not. First, we propose a procedure for the development and evaluation of document-level membership inference for LLMs by leveraging commonly used data sources for training and the model release date. We then propose a practical, black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B with both books and academic papers. We show our methodology to perform very well, reaching an AUC of 0.856 for books and 0.678 for papers. We then show our approach to outperform the sentence-level membership inference attacks used in the privacy literature for the document-level membership task. We further evaluate whether smaller models might be less sensitive to document-level inference and show OpenLLaMA-3B to be approximately as sensitive as OpenLLaMA-7B to our approach. Finally, we consider two mitigation strategies and find the AUC to slowly decrease when only partial documents are considered but to remain fairly high when the model precision is reduced. Taken together, our results show that accurate document-level membership can be inferred for LLMs, increasing the transparency of technology poised to change our lives.
MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training
Jiacheng Li, Ninghui Li, and Bruno Ribeiro, Purdue University
In Member Inference (MI) attacks, the adversary try to determine whether an instance is used to train a machine learning (ML) model. MI attacks are a major privacy concern when using private data to train ML models. Most MI attacks in the literature take advantage of the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy.
We observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoids overfitting them. A major challenge is how to achieve such an effect in an efficient training process.
Leveraging two distinct recent advancements in representation learning: counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy.
Inf2Guard: An Information-Theoretic Framework for Learning Privacy-Preserving Representations against Inference Attacks
Sayedeh Leila Noorbakhsh and Binghui Zhang, Illinois Institute of Technology; Yuan Hong, University of Connecticut; Binghui Wang, Illinois Institute of Technology
Machine learning (ML) is vulnerable to inference (e.g., membership inference, property inference, and data reconstruction) attacks that aim to infer the private information of training data or dataset. Existing defenses are only designed for one specific type of attack and sacrifice significant utility or are soon broken by adaptive attacks. We address these limitations by proposing an information-theoretic defense framework, called Inf2Guard, against the three major types of inference attacks. Our framework, inspired by the success of representation learning, posits that learning shared representations not only saves time/costs but also benefits numerous downstream tasks. Generally, Inf2Guard involves two mutual information objectives, for privacy protection and utility preservation, respectively. Inf2Guard exhibits many merits: it facilitates the design of customized objectives against the specific inference attack; it provides a general defense framework which can treat certain existing defenses as special cases; and importantly, it aids in deriving theoretical results, e.g., inherent utility-privacy tradeoff and guaranteed privacy leakage. Extensive evaluations validate the effectiveness of Inf2Guard for learning privacy-preserving representations against inference attacks and demonstrate the superiority over the baselines.
Property Existence Inference against Generative Models
Lijin Wang, Jingjing Wang, Jie Wan, and Lin Long, Zhejiang University; Ziqi Yang and Zhan Qin, Zhejiang University, ZJU-Hangzhou Global Scientific and Technological Innovation Center
Generative models have served as the backbone of versatile tools with a wide range of applications across various fields in recent years. However, it has been demonstrated that privacy concerns, such as membership information leakage of the training dataset, exist for generative models. In this paper, we perform property existence inference against generative models as a new type of information leakage, which aims to infer whether any samples with a given property are contained in the training set. For example, to infer if any images (i.e., samples) of a specific brand of cars (i.e., property) are used to train the target model. We focus on the leakage of existence information of properties with very low proportions in the training set, which has been overlooked in previous works. We leverage the feature-level consistency of the generated data with the training data to launch inferences and validate the property existence information leakage across diverse architectures of generative models. We have examined various factors influencing the property existence inference and investigated how generated samples leak property existence information. In our conclusion, most generative models are vulnerable to property existence inferences. Additionally, we have validated our attack in Stable Diffusion which is a large-scale open-source generative model in real-world scenarios, and demonstrated its risk of property existence information leakage. The source code is available at https://github.com/wljLlla/PEI_Code.
Fuzzing II: Method
SDFuzz: Target States Driven Directed Fuzzing
Penghui Li, The Chinese University of Hong Kong and Zhongguancun Laboratory; Wei Meng, The Chinese University of Hong Kong; Chao Zhang, Tsinghua University and Zhongguancun Laboratory
Directed fuzzers often unnecessarily explore program code and paths that cannot trigger the target vulnerabilities. We observe that the major application scenarios of directed fuzzing provide detailed vulnerability descriptions, from which highly-valuable program states (i.e., target states) can be derived, e.g., call traces when a vulnerability gets triggered. By driving to expose such target states, directed fuzzers can exclude massive unnecessary exploration.
Inspired by the observation, we present SDFuzz, an efficient directed fuzzing tool driven by target states. SDFuzz first automatically extracts target states in vulnerability reports and static analysis results. SDFuzz employs a selective instrumentation technique to reduce the fuzzing scope to the required code for reaching target states. SDFuzz then early terminates the execution of a test case once SDFuzz probes that the remaining execution cannot reach the target states. It further uses a new target state feedback and refines prior imprecise distance metric into a two-dimensional feedback mechanism to proactively drive the exploration towards the target states.
We thoroughly evaluated SDFuzz on known vulnerabilities and compared it to related works. The results show that SDFuzz could improve vulnerability exposure capability with more vulnerability triggered and less time used, outperforming the state-of-the-art solutions. SDFuzz could significantly improve the fuzzing throughput. Our application of SDFuzz to automatically validate the static analysis results successfully discovered four new vulnerabilities in well-tested applications. Three of them have been acknowledged by developers.
Critical Code Guided Directed Greybox Fuzzing for Commits
Yi Xiang, Zhejiang University NGICS Platform; Xuhong Zhang, Zhejiang University and Jianghuai Advance Technology Center; Peiyu Liu, Zhejiang University NGICS Platform; Shouling Ji, Xiao Xiao, Hong Liang, and Jiacheng Xu, Zhejiang University; Wenhai Wang, Zhejiang University NGICS Platform
Newly submitted commits are prone to introducing vulnerabilities into programs. As a promising countermeasure, directed greybox fuzzers can be employed to test commit changes by designating the commit change sites as targets. However, existing directed fuzzers primarily focus on reaching a single target and neglect the diverse exploration of the additional affected code. As a result, they may overlook bugs that crash at a distant site from the change site and lack directness in multi-target scenarios, which are both very common in the context of commit testing.
In this paper, we propose WAFLGO, a direct greybox fuzzer, to effectively discover vulnerabilities introduced by commits. WAFLGO employs a novel critical code guided input generation strategy to thoroughly explore the affected code. Specifically, we identify two types of critical code: pathprefix code and data-suffix code. The critical code first guides the input generation to gradually and incrementally reach the change sites. Then while maintaining the reachability of the critical code, the input generation strategy further encourages the diversity of the generated inputs in exploring the affected code. Additionally, WAFLGO introduces a lightweight multitarget distance metric for directness and thorough examination of all change sites. We implement WAFLGO and evaluate it with 30 real-world bugs introduced by commits. Compared to eight state-of-the-art tools, WAFLGO achieves an average speedup of 10.3×. Furthermore, WAFLGO discovers seven new vulnerabilities including four CVEs while testing the most recent 50 commits of real-world software, including libtiff, fig2dev, and libming, etc.
Toward Unbiased Multiple-Target Fuzzing with Path Diversity
Huanyao Rong, Indiana University Bloomington; Wei You, Renmin University of China; XiaoFeng Wang and Tianhao Mao, Indiana University Bloomington
Directed fuzzing is an advanced software testing approach that systematically guides the fuzzing campaign toward user-defined target sites, enabling efficient discovery of vulnerabilities related to these sites. However, we have observed that some complex vulnerabilities remain undetected by directed fuzzers even when the flawed target sites are frequently tested by the generated test cases, because triggering these bugs often requires the execution of additional code in related program locations. Furthermore, when fuzzing multiple targets, the existing energy assignment in directed fuzzing lacks precision and does not ensure the fairness across targets, which leads to insufficient fuzzing effort spent on some deeper targets.
In this paper, we propose a novel directed fuzzing solution named AFLRUN, which features target path-diversity metric and unbiased energy assignment. Firstly, we develop a new coverage metric by maintaining extra virgin map for each covered target to track the coverage status of seeds that hit the target. This approach enables the storage of waypoints that hit a target through interesting path into the corpus, thus enriching the path diversity for each target. Additionally, we propose a corpus-level energy assignment strategy that ensures fairness for each target. AFLRUN starts with uniform target weight and propagates this weight to seeds to get a desired seed weight distribution. By assigning energy to each seed in the corpus according to such desired distribution, a precise and unbiased energy assignment can be achieved.
We built a prototype system and assessed its performance using a standard benchmark and several extensively fuzzed real-world applications. The evaluation results demonstrate that AFLRUN outperforms state-of-the-art fuzzers in terms of vulnerability detection, both in quantity and speed. Moreover, AFLRUN uncovers 29 previously unidentified vulnerabilities, including 8 CVEs, across four distinct programs.
SymBisect: Accurate Bisection for Fuzzer-Exposed Vulnerabilities
Zheng Zhang and Yu Hao, UC Riverside; Weiteng Chen, Microsoft Research; Xiaochen Zou, Xingyu Li, Haonan Li, Yizhuo Zhai, and Zhiyun Qian, UC Riverside; Billy Lau, Google
The popularity of fuzzing has led to its tight integration into the software development process as a routine part of the build and test, i.e., continuous fuzzing. This has resulted in a substantial increase in the reporting of bugs in open-source software, including the Linux kernel. To keep up with the volume of bugs, it is crucial to automatically analyze the bugs to assist developers and maintainers. Bug bisection, i.e., locating the commit that introduced a vulnerability, is one such analysis that can reveal the range of affected software versions and help bug prioritization and patching. However, existing automated solutions fall short in a number of ways: most of them either (1) directly run the same PoC on older software versions without adapting to changes in bug-triggering conditions and are prone to broken dynamic environments or (2) require patches that may not be available when the bug is discovered. In this work, we take a different approach to looking for evidence of fuzzer-exposed vulnerabilities by looking for the underlying bug logic. In this way, we can perform bug bisection much more precisely and accurately. Specifically, we apply underconstrained symbolic execution with several principled guiding techniques to search for the presence of the bug logic efficiently. We show that our approach achieves significantly better accuracy than the state-of-the-art solution by 16% (from 74.7% to 90.7%).
Data Coverage for Guided Fuzzing
Mingzhe Wang, Jie Liang, Chijin Zhou, Zhiyong Wu, Jingzhou Fu, and Zhuo Su, Tsinghua University; Qing Liao, Harbin Institute of Technology; Bin Gu, Beijing Institute of Control Engineering; Bodong Wu, Huawei Technologies Co., Ltd; Yu Jiang, Tsinghua University
Distinguished Paper Award Winner
Code coverage is crucial for fuzzing. It helps fuzzers identify areas of a program that have not been explored, which are often the most likely to contain bugs. However, code coverage only reflects a small part of a program's structure. Many crucial program constructs, such as constraints, automata, and Turing-complete domain-specific languages, are embedded in a program as constant data. Since this data cannot be effectively reflected by code coverage, it remains a major challenge for modern fuzzing practices.
To address this challenge, we propose data coverage for guided fuzzing. The idea is to detect novel constant data references and maximize their coverage. However, the widespread use of constant data can significantly impact fuzzing throughput if not handled carefully. To overcome this issue, we optimize for real-world fuzzing practices by classifying data access according to semantics and designing customized collection strategies. We also develop novel storage and utilization techniques for improved fuzzing efficiency. Finally, we enhance libFuzzer with data coverage and submit it to Google's FuzzBench for evaluation. Our approach outperforms many state-of-the-art fuzzers and achieves the best coverage score in the experiment. Furthermore, we have discovered 28 previously-unknown bugs on OSS-Fuzz projects that were well-fuzzed using code coverage.
Crypto II: Searchable Encryption
I/O-Efficient Dynamic Searchable Encryption meets Forward & Backward Privacy
Priyanka Mondal, University of California, Santa Cruz; Javad Ghareh Chamani, HKUST; Ioannis Demertzis, University of California, Santa Cruz; Dimitrios Papadopoulos, HKUST
We focus on the problem of I/O-efficient Dynamic Searchable Encryption (DSE), i.e., schemes that perform well when executed with the dataset on-disk. Towards this direction, for HDDs, schemes have been proposed with good locality (i.e., low number of performed non-continuous memory reads) and read efficiency (the number of additional memory locations read per result item). Similarly, for SSDs, schemes with good page efficiency (reading as few pages as possible) have been proposed. However, the vast majority of these works are limited to the static case (i.e. no dataset modifications) and the only dynamic scheme fails to achieve forward and backward privacy, the de-facto leakage standard in the literature. In fact, prior related works (Bost [CCS'16] and Minaud and Reichle[CRYPTO'22]) claim that I/O-efficiency and forward-privacy are two irreconcilable notions. Contrary to that, in this work, we "reconcile" for the first time forward and backward privacy with I/O-efficiency for DSE both for HDDs and SSDs. We propose two families of DSE constructions which also improve the state-of-the-art (non I/O-efficient) both asymptotically and experimentally. Indeed, some of our schemes improve the in-memory performance of prior works. At a technical level, we revisit and enhance the lazy de-amortization DSE construction by Demertzis et al. [NDSS'20], transforming it into an I/O-preserving one. Importantly, we introduce an oblivious-merge protocol that merges two equal-sized databases without revealing any information, effectively replacing the costly oblivious data structures with more lightweight computations.
FEASE: Fast and Expressive Asymmetric Searchable Encryption
Long Meng, Liqun Chen, and Yangguang Tian, University of Surrey; Mark Manulis, Universität der Bundeswehr München; Suhui Liu, Southeast University
Asymmetric Searchable Encryption (ASE) is a promising cryptographic mechanism that enables a semi-trusted cloud server to perform keyword searches over encrypted data for users. To be useful, an ASE scheme must support expressive search queries, which are expressed as conjunction, disjunction, or any Boolean formulas. In this paper, we propose a fast and expressive ASE scheme that is adaptively secure, called FEASE. It requires only 3 pairing operations for searching any conjunctive set of keywords independent of the set size and has linear complexity for encryption and trapdoor algorithms in the number of keywords. FEASE is based on a new fast Anonymous Key-Policy Attribute-Based Encryption (A-KP-ABE) scheme as our first proposal, which is of independent interest. To address optional protection against keyword guessing attacks, we extend FEASE into the first expressive Public-Key Authenticated Encryption with Keyword Search (PAEKS) scheme. We provide implementations and evaluate the performance of all three schemes, while also comparing them with the state of the art. We observe that FEASE outperforms all existing expressive ASE constructions and that our A-KP-ABE scheme offers anonymity with efficiency comparable to the currently fastest yet non-anonymous KP-ABE schemes FAME (ACM CCS 2017) and FABEO (ACM CCS 2022).
d-DSE: Distinct Dynamic Searchable Encryption Resisting Volume Leakage in Encrypted Databases
Dongli Liu and Wei Wang, Huazhong University of Science and Technology; Peng Xu, Huazhong University of Science and Technology, Hubei Key Laboratory of Distributed System Security, School of Cyber Science and Engineering, JinYinHu Laboratory, and State Key Laboratory of Cryptology; Laurence T. Yang, Huazhong University of Science and Technology and St. Francis Xavier University; Bo Luo, The University of Kansas; Kaitai Liang, Delft University of Technology
Dynamic Searchable Encryption (DSE) has emerged as a solution to efficiently handle and protect large-scale data storage in encrypted databases (EDBs). Volume leakage poses a significant threat, as it enables adversaries to reconstruct search queries and potentially compromise the security and privacy of data. Padding strategies are common countermeasures for the leakage, but they significantly increase storage and communication costs. In this work, we develop a new perspective on handling volume leakage. We start with distinct search and further explore a new concept called distinct DSE (d-DSE).
We also define new security notions, in particular Distinct with Volume-Hiding security, as well as forward and backward privacy, for the new concept. Based on d-DSE, we construct the d-DSE designed EDB with related constructions for distinct keyword (d-KW-dDSE), keyword (KW-dDSE), and join queries (JOIN-dDSE) and update queries in encrypted databases. We instantiate a concrete scheme BF-SRE, employing Symmetric Revocable Encryption. We conduct extensive experiments on real-world datasets, such as Crime, Wikipedia, and Enron, for performance evaluation. The results demonstrate that our scheme is practical in data search and with comparable computational performance to the SOTA DSE scheme (MITRA*, AURA) and padding strategies (SEAL, ShieldDB). Furthermore, our proposal sharply reduces the communication cost as compared to padding strategies, with roughly 6.36 to 53.14x advantage for search queries.
MUSES: Efficient Multi-User Searchable Encrypted Database
Tung Le, Virginia Tech; Rouzbeh Behnia, University of South Florida; Jorge Guajardo, Robert Bosch Research and Technology Center; Thang Hoang, Virginia Tech
Searchable encrypted systems enable privacy-preserving keyword search on encrypted data. Symmetric systems achieve high efficiency (e.g., sublinear search), but they mostly support single-user search. Although systems based on public-key or hybrid models support multi-user search, they incur inherent security weaknesses (e.g., keyword-guessing vulnerabilities) and scalability limitations due to costly public-key operations (e.g., pairing). More importantly, most encrypted search designs leak statistical information (e.g., search, result, and volume patterns) and thus are vulnerable to devastating leakage-abuse attacks. Some pattern-hiding schemes were proposed. However, they incur significant user bandwidth/computation costs, and thus are not desirable for large-scale outsourced databases with resource-constrained users.
In this paper, we propose MUSES, a new multi-user encrypted search platform that addresses the functionality, security, and performance limitations in the existing encrypted search designs. Specifically, MUSES permits multi-user functionalities (reader/writer separation, permission revocation) and hides all statistical information (including search, result, and volume patterns) while featuring minimal user overhead. In MUSES, we demonstrate a unique incorporation of various emerging distributed cryptographic protocols including Distributed Point Function, Distributed PRF, and Oblivious Linear Group Action. We also introduce novel distributed protocols for oblivious counting and shuffling on arithmetic shares for the general multi-party setting with a dishonest majority, which can be found useful in other applications. Our experimental results showed that the keyword search by MUSES is two orders of magnitude faster with up to 12× lower user bandwidth cost than the state-of-the-art.
Query Recovery from Easy to Hard: Jigsaw Attack against SSE
Hao Nie and Wei Wang, Huazhong University of Science and Technology; Peng Xu, Huazhong University of Science and Technology, Hubei Key Laboratory of Distributed System Security, School of Cyber Science and Engineering, JinYinHu Laboratory, and State Key Laboratory of Cryptology; Xianglong Zhang, Huazhong University of Science and Technology; Laurence T. Yang, Huazhong University of Science and Technology and St. Francis Xavier University; Kaitai Liang, Delft University of Technology
Searchable symmetric encryption schemes often unintentionally disclose certain sensitive information, such as access, volume, and search patterns. Attackers can exploit such leakages and other available knowledge related to the user's database to recover queries. We find that the effectiveness of query recovery attacks depends on the volume/frequency distribution of keywords. Queries containing keywords with high volumes/frequencies are more susceptible to recovery, even when countermeasures are implemented. Attackers can also effectively leverage these "special" queries to recover all others.
By exploiting the above finding, we propose a Jigsaw attack that begins by accurately identifying and recovering those distinctive queries. Leveraging the volume, frequency, and cooccurrence information, our attack achieves 90% accuracy in three tested datasets, which is comparable to previous attacks (Oya et al., USENIX' 22 and Damie et al., USENIX' 21). With the same runtime, our attack demonstrates an advantage over the attack proposed by Oya et al (approximately 15% more accuracy when the keyword universe size is 15k). Furthermore, our proposed attack outperforms existing attacks against widely studied countermeasures, achieving roughly 60% and 85% accuracy against the padding and the obfuscation, respectively. In this context, with a large keyword universe (≥3k), it surpasses current state-of-the-art attacks by more than 20%.
10:15 am–10:45 am
Coffee and Tea Break
Grand Ballroom Foyer
10:45 am–12:00 pm
Social Issues II: Surveillance and Censorship
GFWeb: Measuring the Great Firewall's Web Censorship at Scale
Nguyen Phong Hoang, University of British Columbia and University of Chicago; Jakub Dalek and Masashi Crete-Nishihata, Citizen Lab - University of Toronto; Nicolas Christin, Carnegie Mellon University; Vinod Yegneswaran, SRI International; Michalis Polychronakis, Stony Brook University; Nick Feamster, University of Chicago
Censorship systems such as the Great Firewall (GFW) have been continuously refined to enhance their filtering capabilities. However, most prior studies, and in particular the GFW, have been limited in scope and conducted over short time periods, leading to gaps in our understanding of the GFW's evolving Web censorship mechanisms over time. We introduce GFWeb, a novel system designed to discover domain blocklists used by the GFW for censoring Web access. GFWeb exploits GFW's bidirectional and loss-tolerant blocking behavior to enable testing hundreds of millions of domains on a monthly basis, thereby facilitating large-scale longitudinal measurement of HTTP and HTTPS blocking mechanisms.
Over the course of 20 months, GFWeb has tested a total of 1.02 billion domains, and detected 943K and 55K pay-level domains censored by the GFW's HTTP and HTTPS filters, respectively. To the best of our knowledge, our study represents the most extensive set of domains censored by the GFW ever discovered to date, many of which have never been detected by prior systems. Analyzing the longitudinal dataset collected by GFWeb, we observe that the GFW has been upgraded to mitigate several issues previously identified by the research community, including overblocking and failure in reassembling fragmented packets. More importantly, we discover that the GFW's bidirectional blocking is not symmetric as previously thought, i.e., it can only be triggered by certain domains when probed from inside the country. We discuss the implications of our work on existing censorship measurement and circumvention efforts. We hope insights gained from our study can help inform future research, especially in monitoring censorship and developing new evasion tools.
Snowflake, a censorship circumvention system using temporary WebRTC proxies
Cecylia Bocovich, Tor Project; Arlo Breault, Wikimedia Foundation; David Fifield and Serene, unaffiliated; Xiaokang Wang, Tor Project
Snowflake is a system for circumventing Internet censorship. Its blocking resistance comes from the use of numerous, ultra-light, temporary proxies ("snowflakes"), which accept traffic from censored clients using peer-to-peer WebRTC protocols and forward it to a centralized bridge. The temporary proxies are simple enough to be implemented in JavaScript, in a web page or browser extension, making them much cheaper to run than a traditional proxy or VPN server. The large and changing pool of proxy addresses resists enumeration and blocking by a censor. The system is designed with the assumption that proxies may appear or disappear at any time. Clients discover proxies dynamically using a secure rendezvous protocol. When an in-use proxy goes offline, its client switches to another on the fly, invisibly to upper network layers.
Snowflake has been deployed with success in Tor Browser and Orbot for several years. It has been a significant circumvention tool during high-profile network disruptions, including in Russia in 2021 and Iran in 2022. In this paper, we explain the composition of Snowflake's many parts, give a history of deployment and blocking attempts, and reflect on implications for circumvention generally.
SpotProxy: Rediscovering the Cloud for Censorship Circumvention
Patrick Tser Jern Kon, University of Michigan; Sina Kamali, University of Waterloo; Jinyu Pei, Rice University; Diogo Barradas, University of Waterloo; Ang Chen, University of Michigan; Micah Sherr, Georgetown University; Moti Yung, Google and Columbia University
Censorship circumvention is often fueled by supporters out of goodwill. However, hosting circumvention proxies can be costly, especially when they are placed in the cloud. We argue for re-examining cloud features and leveraging them to achieve novel circumvention benefits, even though these features are not explicitly engineered for censorship circumvention. SpotProxy is inspired by Spot VMs—cloud instances backed with excess resources, sold at a fraction of the cost of regular instances, that can be taken away at a moment's notice if higher-paying requests arrive. We observe that for circumvention proxies, Spot VMs not only translate to cost savings, but also create a high churn rate since proxies are constantly re-spawned at different IP addresses—making them more difficult for a censor to enumerate and block. SpotProxy pushes this observation to the extreme and designs a circumvention infrastructure that constantly searches for cheaper VMs and refreshes the fleet for anti-blocking, for spot and regular VMs alike. We adapt Wireguard and Snowflake for use with SpotProxy, and demonstrate that our active migration mechanism allows clients to seamlessly move between proxies without degrading their performance or disrupting existing connections. We show that SpotProxy leads to significant cost savings, and that SpotProxy's rejuvenation mechanism enables proxies to be replenished frequently with new addresses.
Bridging Barriers: A Survey of Challenges and Priorities in the Censorship Circumvention Landscape
Diwen Xue, Anna Ablove, and Reethika Ramesh, University of Michigan; Grace Kwak Danciu, Independent; Roya Ensafi, University of Michigan
The ecosystem of censorship circumvention tools (CTs) remains one of the most opaque and least understood, overshadowed by the precarious legal status around their usage and operation, and the risks facing those directly involved. Used by hundreds of millions of users across the most restricted networks, these tools circulate not through advertisements but word-of-mouth, distributed not through appstores but underground networks, and adopted not out of trust but from the sheer necessity for information access.
This paper aims to elucidate the dynamics and challenges of the CT ecosystem, and the needs and priorities of its stakeholders. We perform the first multi-perspective study, surveying 12 leading CT providers that service upwards of 100 million users, combined with experiences from CT users in Russia and China. Beyond the commonly cited technical challenges and disruptions from censors, our study also highlights funding constraints, usability issues, misconceptions, and misbehaving players, all of which similarly plague the CT ecosystem. Having the unique opportunity to survey these at-risk CT stakeholders, we outline key future priorities for those involved. We hope our work encourages further research to advance our understanding of this complex and uniquely challenged ecosystem.
Fingerprinting Obfuscated Proxy Traffic with Encapsulated TLS Handshakes
Diwen Xue, University of Michigan; Michalis Kallitsis, Merit Network, Inc.; Amir Houmansadr, UMass Amherst; Roya Ensafi, University of Michigan
The global escalation of Internet censorship by nation-state actors has led to an ongoing arms race between censors and obfuscated circumvention proxies. Research over the past decade has extensively examined various fingerprinting attacks against individual proxy protocols and their respective countermeasures. In this paper, however, we demonstrate the feasibility of a protocol-agnostic approach to proxy detection, enabled by the shared characteristic of nested protocol stacks inherent to all forms of proxying and tunneling activities. We showcase the practicality of such approach by identifying one specific fingerprint--encapsulated TLS handshakes--that results from nested protocol stacks, and building similarity-based classifiers to isolate this unique fingerprint within encrypted traffic streams.
Assuming the role of a censor, we build a detection framework and deploy it within a mid-size ISP serving upwards of one million users. Our evaluation demonstrates that the traffic of obfuscated proxies, even with random padding and multiple layers of encapsulations, can be reliably detected with minimal collateral damage by fingerprinting encapsulated TLS handshakes. While stream multiplexing shows promise as a viable countermeasure, we caution that existing obfuscations based on multiplexing and random padding alone are inherently limited, due to their inability to reduce the size of traffic bursts or the number of round trips within a connection. Proxy developers should be aware of these limitations, anticipate the potential exploitation of encapsulated TLS handshakes by the censors, and equip their tools with proactive countermeasures.
AR and VR
When the User Is Inside the User Interface: An Empirical Study of UI Security Properties in Augmented Reality
Kaiming Cheng, Arkaprabha Bhattacharya, Michelle Lin, Jaewook Lee, Aroosh Kumar, Jeffery F. Tian, Tadayoshi Kohno, and Franziska Roesner, University of Washington
Augmented reality (AR) experiences place users inside the user interface (UI), where they can see and interact with three-dimensional virtual content. This paper explores UI security for AR platforms, for which we identify three UI security-related properties: Same Space (how does the platform handle virtual content placed at the same coordinates?), Invisibility (how does the platform handle invisible virtual content?), and Synthetic Input (how does the platform handle simulated user input?). We demonstrate the security implications of different instantiations of these properties through five proof-of-concept attacks between distrusting AR application components (i.e., a main app and an included library) — including a clickjacking attack and an object erasure attack. We then empirically investigate these UI security properties on five current AR platforms: ARCore (Google), ARKit (Apple), Hololens (Microsoft), Oculus (Meta), and WebXR (browser). We find that all platforms enable at least three of our proof-of-concept attacks to succeed. We discuss potential future defenses, including applying lessons from 2D UI security and identifying new directions for AR UI security.
Can Virtual Reality Protect Users from Keystroke Inference Attacks?
Zhuolin Yang, Zain Sarwar, Iris Hwang, Ronik Bhaskar, Ben Y. Zhao, and Haitao Zheng, University of Chicago
Virtual Reality (VR) has gained popularity by providing immersive and interactive experiences without geographical limitations. It also provides a sense of personal privacy through physical separation. In this paper, we show that despite assumptions of enhanced privacy, VR is unable to shield its users from side-channel attacks that steal private information. Ironically, this vulnerability arises from VR's greatest strength, its immersive and interactive nature. We demonstrate this by designing and implementing a new set of keystroke inference attacks in shared virtual environments, where an attacker (VR user) can recover the content typed by another VR user by observing their avatar. While the avatar displays noisy telemetry of the user's hand motion, an intelligent attacker can use that data to recognize typed keys and reconstruct typed content, without knowing the keyboard layout or gathering labeled data. We evaluate the proposed attacks using IRB-approved user studies across multiple VR scenarios. For 13 out of 15 tested users, our attacks accurately recognize 86%-98% of typed keys, and the recovered content retains up to 98% of the meaning of the original typed content. We also discuss potential defenses.
Remote Keylogging Attacks in Multi-user VR Applications
Zihao Su, University of California, Santa Barbara; Kunlin Cai, University of California, Los Angeles; Reuben Beeler, Lukas Dresel, Allan Garcia, and Ilya Grishchenko, University of California, Santa Barbara; Yuan Tian, University of California, Los Angeles; Christopher Kruegel and Giovanni Vigna, University of California, Santa Barbara
As Virtual Reality (VR) applications grow in popularity, they have bridged distances and brought users closer together. However, with this growth, there have been increasing concerns about security and privacy, especially related to the motion data used to create immersive experiences. In this study, we highlight a significant security threat in multi-user VR applications, which are applications that allow multiple users to interact with each other in the same virtual space. Specifically, we propose a remote attack that utilizes the avatar rendering information collected from an adversary's game clients to extract user-typed secrets like credit card information, passwords, or private conversations. We do this by (1) extracting motion data from network packets, and (2) mapping motion data to keystroke entries. We conducted a user study to verify the attack's effectiveness, in which our attack successfully inferred 97.62% of the keystrokes. Besides, we performed an additional experiment to underline that our attack is practical, confirming its effectiveness even when (1) there are multiple users in a room, and (2) the attacker cannot see the victims. Moreover, we replicated our proposed attack on four applications to demonstrate the generalizability of the attack. These results underscore the severity of the vulnerability and its potential impact on millions of VR social platform users.
That Doesn't Go There: Attacks on Shared State in Multi-User Augmented Reality Applications
Carter Slocum, Yicheng Zhang, Erfan Shayegani, Pedram Zaree, and Nael Abu-Ghazaleh, University of California, Riverside; Jiasi Chen, University of Michigan
Augmented Reality (AR) can enable shared virtual experiences between multiple users. In order to do so, it is crucial for multi-user AR applications to establish a consensus on the "shared state" of the virtual world and its augmentations through which users interact. Current methods to create and access shared state collect sensor data from devices (e.g., camera images), process them, and integrate them into the shared state. However, this process introduces new vulnerabilities and opportunities for attacks. Maliciously writing false data to "poison" the shared state is a major concern for the security of the downstream victims that depend on it. Another type of vulnerability arises when reading the shared state: by providing false inputs, an attacker can view hologram augmentations at locations they are not allowed to access. In this work, we demonstrate a series of novel attacks on multiple AR frameworks with shared states, focusing on three publicly accessible frameworks. We show that these frameworks, while using different underlying implementations, scopes, and mechanisms to read from and write to the shared state, have shared vulnerability to a unified threat model. Our evaluations of these state-of-the-art AR frameworks demonstrate reliable attacks both on updating and accessing the shared state across different systems. To defend against such threats, we discuss a number of potential mitigation strategies that can help enhance the security of multi-user AR applications and implement an initial prototype.
Penetration Vision through Virtual Reality Headsets: Identifying 360-degree Videos from Head Movements
Anh Nguyen, Xiaokuan Zhang, and Zhisheng Yan, George Mason University
In this paper, we present the first contactless side-channel attack for identifying 360° videos being viewed in a Virtual Reality (VR) Head Mounted Display (HMD). Although the video content is displayed inside the HMD without any external exposure, we observe that user head movements are driven by the video content, which creates a unique side channel that does not exist in traditional 2D videos. By recording the user whose vision is blocked by the HMD via a malicious camera, an attacker can analyze the correlation between the user's head movements and the victim video to infer the video title.
To exploit this new vulnerability, we present INTRUDE, a system for identifying 360° videos from recordings of user head movements. INTRUDE is empowered by an HMD-based head movement estimation scheme to extract a head movement trace from the recording and a video saliency-based trace-fingerprint matching framework to infer the video title. Evaluation results show that INTRUDE achieves over 96% of accuracy for video identification and is robust under different recording environments. Moreover, INTRUDE maintains its effectiveness in the open-world identification scenario.
User Studies III: Privacy I
"I'm not convinced that they don't collect more than is necessary": User-Controlled Data Minimization Design in Search Engines
Tanusree Sharma, University of Illinois at Urbana-Champaign; Lin Kyi, Max Planck Institute for Security and Privacy; Yang Wang, University of Illinois at Urbana-Champaign; Asia J. Biega, Max Planck Institute for Security and Privacy
Data minimization is a legal and privacy-by-design principle mandating that online services collect only data that is necessary for pre-specified purposes. While the principle has thus far mostly been interpreted from a system-centered perspective, there is a lack of understanding about how data minimization could be designed from a user-centered perspective, and in particular, what factors might influence user decision-making with regard to the necessity of data for different processing purposes. To address this gap, in this paper, we gain a deeper understanding of users' design expectations and decision-making processes related to data minimization, focusing on a case study of search engines. We also elicit expert evaluations of the feasibility of user-generated design ideas. We conducted interviews with 25 end users and 10 experts from the EU and UK to provide concrete design recommendations for data minimization that incorporate user needs, concerns, and preferences. Our study (i) surfaces how users reason about the necessity of data in the context of search result quality, and (ii) examines the impact of several factors on user decision-making about data processing, including specific types of search data, or the volume and recency of data. Most participants emphasized the particular importance of data minimization in the context of sensitive searches, such as political, financial, or health-related search queries. In a think-aloud conceptual design session, participants recommended search profile customization as a solution for retaining data they considered necessary, as well as alert systems that would inform users to minimize data in instances of excessive collection. We propose actionable design features that could provide users with greater agency over their data through user-controlled data minimization, combined with relevant implementation insights from experts.
The Effect of Design Patterns on (Present and Future) Cookie Consent Decisions
Nataliia Bielova, Inria research centre at Université Côte d'Azur; Laura Litvine and Anysia Nguyen, Behavioural Insights Team (BIT); Mariam Chammat, Interministerial Directorate for Public Transformation (DITP); Vincent Toubiana, Commission Nationale de l'Informatique et des Libertés (CNIL); Estelle Hary, RMIT University
Today most websites in the EU present users with a consent banner asking about the use of cookies or other tracking technologies. Data Protection Authorities (DPAs) need to ensure that users can express their true preferences when faced with these banners, while simultaneously satisfying the EU GDPR requirements. To address the needs of the French DPA, we conducted an online experiment among 3,947 participants in France exploring the impact of six different consent banner designs on the outcome of users' consent decision. We also assessed participants' knowledge and privacy preferences, as well as satisfaction with the banners. In contrast with previous results, we found that a "bright pattern" that highlights the decline option has a substantial effect on users' decisions. We also find that two new designs based on behavioral levers have the strongest effect on the outcome of the consent decision, and participants' satisfaction with the banners. Finally, our study provides novel evidence that the effect of design persists in a short time frame: designs can significantly affect users' future choices, even when faced with neutral banners.
Unpacking Privacy Labels: A Measurement and Developer Perspective on Google's Data Safety Section
Rishabh Khandelwal, Asmit Nayak, Paul Chung, and Kassem Fawaz, University of Wisconsin-Madison
Google has mandated developers to use Data Safety Sections (DSS) to increase transparency in data collection and sharing practices. In this paper, we present a comprehensive analysis of Google's Data Safety Section (DSS) using both quantitative and qualitative methods. We conduct the first large-scale measurement study of DSS using apps from Android Play store (n=1.1M). We find that there are internal inconsistencies within the reported practices. We also find trends of both over and under-reporting practices in the DSSs. Finally, we conduct a longitudinal study of DSS to explore how the reported practices evolve over time, and find that the developers are still adjusting their practices. To contextualize these findings, we conduct a developer study, uncovering the process that app developers undergo when working with DSS. We highlight the challenges faced and strategies employed by developers for DSS submission, and the factors contributing to changes in the DSS. Our research contributes valuable insights into the complexities of implementing and maintaining privacy labels, underlining the need for better resources, tools, and guidelines to aid developers. This understanding is crucial as the accuracy and reliability of privacy labels directly impact their effectiveness.
Dissecting Privacy Perspectives of Websites Around the World: "Aceptar Todo, Alle Akzeptieren, Accept All..."
Aysun Ogut, Berke Turanlioglu, Doruk Can Metiner, Albert Levi, Cemal Yilmaz, and Orcun Cetin, Sabanci University, Tuzla, Istanbul, Turkiye; Selcuk Uluagac, Cyber-Physical Systems Security Lab, Florida International University, Miami, Florida, USA
Privacy has become a significant concern as the processing, storage, and sharing of collected data expands. In order to take precautions against this increasing issue, countries and different government entities have enacted laws for the protection of privacy, and articles regarding acquiring consent from the user to collect data (i.e., via cookies) have been regulated such as the right of one to be informed and to manage their preferences. Even though there are many regulations, still many websites do not transparently provide their users with their privacy practices and cookie consent notices, and restrict one's rights or make it difficult to set/choose their privacy preferences. The main objective of this study is to analyze whether websites from around the world inform their users about the collection of their data and to identify how easy or difficult for users to set their privacy preferences in practice. While observing the differences between countries, we also aim to examine whether there is an effect of geographical location on privacy approaches and whether the applications and interpretations of countries that follow and comply with the same laws are similar. For this purpose, we have developed an automated tool to scan the privacy notices on the 500 most popular websites in different countries around the world. Our extensive analysis indicates that in some countries users are rarely informed and even in countries with high cookie consent notifications, offering the option to refuse is still very low despite the fact that it is part of their regulations. The highest rate of reject buttons on cookie banners in the countries studied is 35%. Overall, although the law gives the user the right to refuse consent and be informed, we have concluded that this does not apply in practice in most countries. Moreover, in many cases, the implementations are convoluted and not user-friendly at all.
Data Subjects' Reactions to Exercising Their Right of Access
Arthur Borem, Elleen Pan, Olufunmilola Obielodan, Aurelie Roubinowitz, and Luca Dovichi, University of Chicago; Michelle L. Mazurek, University of Maryland; Blase Ur, University of Chicago
Recent privacy laws have strengthened data subjects' right to access personal data collected by companies. Prior work has found that data exports companies provide consumers in response to Data Subject Access Requests (DSARs) can be overwhelming and hard to understand. To identify directions for improving the user experience of data exports, we conducted an online study in which 33 participants explored their own data from Amazon, Facebook, Google, Spotify, or Uber. Participants articulated questions they hoped to answer using the exports. They also annotated parts of the export they found confusing, creepy, interesting, or surprising. While participants hoped to learn either about their own usage of the platform or how the company collects and uses their personal data, these questions were often left unanswered. Participants' annotations documented their excitement at finding data records that triggered nostalgia, but also shock and anger about the privacy implications of other data they saw. Having examining their data, many participants hoped to request the company erase some, but not all, of the data. We discuss opportunities for future transparency-enhancing tools and enhanced laws.
ML V: Backdoor Defense
Neural Network Semantic Backdoor Detection and Mitigation: A Causality-Based Approach
Bing Sun, Jun Sun, and Wayne Koh, Singapore Management University; Jie Shi, Huawei Singapore
Different from ordinary backdoors in neural networks which are introduced with artificial triggers (e.g., certain specific patch) and/or by tampering the samples, semantic backdoors are introduced by simply manipulating the semantic, e.g., by labeling green cars as frogs in the training set. By focusing on samples with rare semantic features (such as green cars), the accuracy of the model is often minimally affected. Since the attacker is not required to modify the input sample during training nor inference time, semantic backdoors are challenging to detect and remove. Existing backdoor detection and mitigation techniques are shown to be ineffective with respect to semantic backdoors. In this work, we propose a method to systematically detect and remove semantic backdoors. Specifically we propose SODA (Semantic BackdOor Detection and MitigAtion) with the key idea of conducting lightweight causality analysis to identify potential semantic backdoor based on how hidden neurons contribute to the predictions and to remove the backdoor by adjusting the responsible neurons' contribution towards the correct predictions through optimization. SODA is evaluated with 21 neural networks trained on 6 benchmark datasets and 2 kinds of semantic backdoor attacks for each dataset. The results show that it effectively detects and removes semantic backdoors and preserves the accuracy of the neural networks.
On the Difficulty of Defending Contrastive Learning against Backdoor Attacks
Changjiang Li, Stony Brook University; Ren Pang, Bochuan Cao, Zhaohan Xi, and Jinghui Chen, Pennsylvania State University; Shouling Ji, Zhejiang University; Ting Wang, Stony Brook University
Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks wherein malicious functions are injected into target models, only to be activated by specific triggers. However, thus far it remains under-explored how contrastive backdoor attacks fundamentally differ from their supervised counterparts, which impedes the development of effective defenses against the emerging threat.
This work represents a solid step toward answering this critical question. Specifically, we define TRL, a unified framework that encompasses both supervised and contrastive backdoor attacks. Through the lens of TRL, we uncover that the two types of attacks operate through distinctive mechanisms: in supervised attacks, the learning of benign and backdoor tasks tends to occur independently, while in contrastive attacks, the two tasks are deeply intertwined both in their representations and throughout their learning processes. This distinction leads to the disparate learning dynamics and feature distributions of supervised and contrastive attacks. More importantly, we reveal that the specificities of contrastive backdoor attacks entail important implications from a defense perspective: existing defenses for supervised attacks are often inadequate and not easily retrofitted to contrastive attacks. We also explore several promising alternative defenses and discuss their potential challenges. Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks, pointing to promising directions for future research.
Mudjacking: Patching Backdoor Vulnerabilities in Foundation Models
Hongbin Liu, Michael K. Reiter, and Neil Zhenqiang Gong, Duke University
Foundation model has become the backbone of the AI ecosystem. In particular, a foundation model can be used as a general-purpose feature extractor to build various downstream classifiers. However, foundation models are vulnerable to backdoor attacks and a backdoored foundation model is a single-point-of-failure of the AI ecosystem, e.g., multiple downstream classifiers inherit the backdoor vulnerabilities simultaneously. In this work, we propose Mudjacking, the first method to patch foundation models to remove backdoors. Specifically, given a misclassified trigger-embedded input detected after a backdoored foundation model is deployed, Mudjacking adjusts the parameters of the foundation model to remove the backdoor. We formulate patching a foundation model as an optimization problem and propose a gradient descent based method to solve it. We evaluate Mudjacking on both vision and language foundation models, eleven benchmark datasets, five existing backdoor attacks, and thirteen adaptive backdoor attacks. Our results show that Mudjacking can remove backdoor from a foundation model while maintaining its utility.
Xplain: Analyzing Invisible Correlations in Model Explanation
Kavita Kumari and Alessandro Pegoraro, Technical University of Darmstadt; Hossein Fereidooni, Kobil; Ahmad-Reza Sadeghi, Technical University of Darmstadt
Explanation methods analyze the features in backdoored input data that contribute to model misclassification. However, current methods like path techniques struggle to detect backdoor patterns in adversarial situations. They fail to grasp the hidden associations of backdoor features with other input features, leading to misclassification. Additionally, they suffer from irrelevant data attribution, imprecise feature connections, baseline dependence, and vulnerability to the "saturation effect".
To address these limitations, we propose Xplain. Our method aims to uncover hidden backdoor trigger patterns and the subtle relationships between backdoor features and other input objects, which are the main causes of model misclassification. Our algorithm improves existing path techniques by integrating an additional baseline into the Integrated Gradients (IG) formulation. This ensures that features selected in the baseline persist along the integration path, guaranteeing baseline independence. Additionally, we introduce quantitative noise to interpolate samples along the integration path, which reduces feature dependency and captures non-linear interactions. This approach effectively identifies the relevant features that significantly influence model predictions.
Furthermore, Xplain proposes sensitivity analysis to enhance AI system resilience against backdoor attacks. This uncovers clear connections between the backdoor and other input data features, thus shedding light on relevant interactions. We thoroughly test the effectiveness of Xplain on the Imagenet and the multimodal domain of the Visual Question Answering dataset, showing its superiority over current path methods such as Integrated Gradient (IG), left-IG, Guided IG, and Adversarial Gradient Integration (AGI) techniques.
Verify your Labels! Trustworthy Predictions and Datasets via Confidence Scores
Torsten Krauß, Jasper Stang, and Alexandra Dmitrienko, University of Würzburg
Machine learning is a rapidly evolving technology with manifold benefits. At its core lies the mapping between samples and corresponding target labels (SL-Mappings). Such mappings can originate from labeled dataset samples or from prediction generated during model inference. The correctness of SL-Mappings is crucial, both during training and for model predictions, especially when considering poisoning attacks.
Existing standalone works from the dataset cleaning and prediction confidence scoring domains lack a dual-use tool offering an SL-Mappings score, which is impractical. Moreover, these works have drawbacks, e.g., dependence on specific model architectures and reliance on large datasets, which may not be accessible, or lack a meaningful confidence score.
In this paper, we introduce LabelTrust, a versatile tool designed to generate confidence scores for SL-Mappings. We propose pipelines facilitating dataset cleaning and confidence scoring, mitigating the limitations of existing standalone approaches from each domain. Thereby, LabelTrust leverages a Siamese network trained via few-shot learning, requiring minimal clean samples and is agnostic to datasets and model architectures. We demonstrate LabelTrust's efficacy in detecting poisoning attacks within samples and predictions alike, with a modest one-time training overhead of 34.56 seconds and an evaluation time of less than 1 second per SL-Mapping.
ML VI: Digital Adversarial Attacks
More Simplicity for Trainers, More Opportunity for Attackers: Black-Box Attacks on Speaker Recognition Systems by Inferring Feature Extractor
Yunjie Ge, Pinji Chen, Qian Wang, Lingchen Zhao, and Ningping Mou, Wuhan University; Peipei Jiang, Wuhan University; City University of Hong Kong; Cong Wang, City University of Hong Kong; Qi Li, Tsinghua University; Chao Shen, Xi'an Jiaotong University
Recent studies have revealed that deep learning-based speaker recognition systems (SRSs) are vulnerable to adversarial examples (AEs). However, the practicality of existing black-box AE attacks is restricted by the requirement for extensive querying of the target system or the limited attack success rates (ASR). In this paper, we introduce VoxCloak, a new targeted AE attack with superior performance in both these aspects. Distinct from existing methods that optimize AEs by querying the target model, VoxCloak initially employs a small number of queries (e.g., a few hundred) to infer the feature extractor used by the target system. It then utilizes this feature extractor to generate any number of AEs locally without the need for further queries. We evaluate VoxCloak on four commercial speaker recognition (SR) APIs and seven voice assistants. On the SR APIs, VoxCloak surpasses the existing transfer-based attacks, improving ASR by 76.25% and signal-to-noise ratio (SNR) by 13.46 dB, as well as the decision-based attacks, requiring 33 times fewer queries and improving SNR by 7.87 dB while achieving comparable ASRs. On the voice assistants, VoxCloak outperforms the existing methods with a 49.40% improvement in ASR and a 15.79 dB improvement in SNR.
Transferability of White-box Perturbations: Query-Efficient Adversarial Attacks against Commercial DNN Services
Meng Shen and Changyue Li, School of Cyberspace Science and Technology, Beijing Institute of Technology, China; Qi Li, Institute for Network Sciences and Cyberspace, Tsinghua University, China; Hao Lu, School of Computer Science and Technology, Beijing Institute of Technology, China; Liehuang Zhu, School of Cyberspace Science and Technology, Beijing Institute of Technology, China; Ke Xu, Department of Computer Science, Tsinghua University, China
Deep Neural Networks (DNNs) have been proven to be vulnerable to adversarial attacks. Existing decision-based adversarial attacks require large numbers of queries to find an effective adversarial example, resulting in a heavy query cost and also performance degradation under defenses. In this paper, we propose the Dispersed Sampling Attack (DSA), which is a query-efficient decision-based adversarial attack by exploiting the transferability of white-box perturbations. DSA can generate diverse examples with different locations in the embedding space, which provides more information about the adversarial region of substitute models and allows us to search for transferable perturbations. Specifically, DSA samples in a hypersphere centered on an original image, and progressively constrains the perturbation. Extensive experiments are conducted on public datasets to evaluate the performance of DSA in closed-set and open-set scenarios. DSA outperforms the state-of-the-art attacks in terms of both attack success rate (ASR) and average number of queries (AvgQ). Specifically, DSA achieves an ASR of about 90% with an AvgQ of 200 on 4 well-known commercial DNN services.
Adversarial Illusions in Multi-Modal Embeddings
Tingwei Zhang and Rishi Jha, Cornell University; Eugene Bagdasaryan, University of Massachusetts Amherst; Vitaly Shmatikov, Cornell Tech
Distinguished Paper Award Winner
Multi-modal embeddings encode texts, images, thermal images, sounds, and videos into a single embedding space, aligning representations across different modalities (e.g., associate an image of a dog with a barking sound). In this paper, we show that multi-modal embeddings can be vulnerable to an attack we call "adversarial illusions." Given an image or a sound, an adversary can perturb it to make its embedding close to an arbitrary, adversary-chosen input in another modality.
These attacks are cross-modal and targeted: the adversary can align any image or sound with any target of his choice. Adversarial illusions exploit proximity in the embedding space and are thus agnostic to downstream tasks and modalities, enabling a wholesale compromise of current and future tasks, as well as modalities not available to the adversary. Using ImageBind and AudioCLIP embeddings, we demonstrate how adversarially aligned inputs, generated without knowledge of specific downstream tasks, mislead image generation, text generation, zero-shot classification, and audio retrieval.
We investigate transferability of illusions across different embeddings and develop a black-box version of our method that we use to demonstrate the first adversarial alignment attack on Amazon's commercial, proprietary Titan embedding. Finally, we analyze countermeasures and evasion attacks.
It Doesn't Look Like Anything to Me: Using Diffusion Model to Subvert Visual Phishing Detectors
Qingying Hao and Nirav Diwan, University of Illinois at Urbana-Champaign; Ying Yuan, University of Padua; Giovanni Apruzzese, University of Liechtenstein; Mauro Conti, University of Padua; Gang Wang, University of Illinois at Urbana-Champaign
Visual phishing detectors rely on website logos as the invariant identity indicator to detect phishing websites that mimic a target brand's website. Despite their promising performance, the robustness of these detectors is not yet well understood. In this paper, we challenge the invariant assumption of these detectors and propose new attack tactics, LogoMorph, with the ultimate purpose of enhancing these systems. LogoMorph is rooted in a key insight: users can neglect large visual perturbations on the logo as long as the perturbation preserves the original logo's semantics. We devise a range of attack methods to create semantic-preserving adversarial logos, yielding phishing webpages that bypass state-of-the-art detectors. For text-based logos, we find that using alternative fonts can help to achieve the attack goal. For image-based logos, we find that an adversarial diffusion model can effectively capture the style of the logo while generating new variants with large visual differences. Practically, we evaluate LogoMorph with white-box and black-box experiments and test the resulting adversarial webpages against various visual phishing detectors end-to-end. User studies (n = 150) confirm the effectiveness of our adversarial phishing webpages on end users (with a detection rate of 0.59, barely better than a coin toss). We also propose and evaluate countermeasures, and share our code.
Invisibility Cloak: Proactive Defense Against Visual Game Cheating
Chenxin Sun, Kai Ye, Liangcai Su, Jiayi Zhang, and Chenxiong Qian, The University of Hong Kong
The gaming industry has experienced remarkable innovation and rapid growth in recent years. However, this progress has been accompanied by a concerning increase in First-person Shooter game cheating, with aimbots being the most prevalent and harmful tool. Visual aimbots, in particular, utilize game visuals and integrated visual models to extract game information, providing cheaters with automatic shooting abilities. Unfortunately, existing anti-cheating methods have proven ineffective against visual aimbots. To combat visual aimbots, we introduce the first proactive defense framework against visual game cheating, called Invisibility Cloak. Our approach adds imperceptible perturbations to game visuals, making them unrecognizable to AI models. We conducted extensive experiments on popular games CrossFire (CF) and Counter-Strike 2 (CS2), and our results demonstrate that Invisibility Cloak achieves real-time re-rendering of high-quality game visuals while effectively impeding various mainstream visual cheating models. By deploying Invisibility Cloak online in both CF and CS2, we successfully eliminated almost all aiming and shooting behaviors associated with aimbots, significantly enhancing the gaming experience for legitimate players.
Security Analysis III: Protocol
Logic Gone Astray: A Security Analysis Framework for the Control Plane Protocols of 5G Basebands
Kai Tu, Abdullah Al Ishtiaq, Syed Md Mukit Rashid, Yilu Dong, Weixuan Wang, Tianwei Wu, and Syed Rafiul Hussain, Pennsylvania State University
Distinguished Paper Award Winner
We develop 5GBaseChecker— an efficient, scalable, and dynamic security analysis framework based on differential testing for analyzing 5G basebands' control plane protocol interactions. 5GBaseChecker first captures basebands' protocol behaviors as a finite state machine (FSM) through black-box automata learning. To facilitate efficient learning and improve scalability, 5GBaseChecker introduces novel hybrid and collaborative learning techniques. 5GBaseChecker then identifies input sequences for which the extracted FSMs provide deviating outputs. Finally, 5GBaseChecker leverages these deviations to efficiently identify the security properties from specifications and use those to triage if the deviations found in 5G basebands violate any properties. We evaluated 5GBaseChecker with 17 commercial 5G basebands and 2 open-source UE implementations and uncovered 22 implementation-level issues, including 13 exploitable vulnerabilities and 2 interoperability issues.
SPF Beyond the Standard: Management and Operational Challenges in Practice and Practical Recommendations
Md. Ishtiaq Ashiq and Weitong Li, Virginia Tech; Tobias Fiebig, Max-Planck-Institut für Informatik; Taejoong Chung, Virginia Tech
Since its inception in the 1970s, email has emerged as an irreplaceable medium for global communication. Despite its ubiquity, the system is plagued by security vulnerabilities, such as email spoofing. Among the various countermeasures, the Sender Policy Framework (SPF) remains a seminal and commonly deployed solution, working by specifying a list of authorized IP addresses for sending email.
While SPF might seem simple on the surface, the practical management of its records proves to be challenging; for example, although syntactical errors are uncommon (0.4%), evaluation-phase challenges are prevalent (7.7%), leading to potential disruptions in email delivery.
In our paper, we conduct a comprehensive study on the SPF extension, drawing from 17 months of weekly data snapshots that span 176 million domains across four top-level domains; we delve into the reasons behind such prevalent evaluation errors. Simultaneously, we undertake an ethical methodology to explore how SMTP servers validate SPF records and evaluate the effectiveness of widely-used software implementations. Our study unveils potential attack vectors that could be exploited for DNS amplification attacks or disrupt mail distribution; for instance, we demonstrate how an attacker could temporarily impede email reception by exploiting flaws in SPF validation mechanisms. We also conduct a qualitative study among email administrators to gain insights into the practical implementation and usage of SPF and SPF validators. Based on our findings, we provide recommendations designed to reconcile these discrepancies and bolster the SPF ecosystem's overall security.
A Formal Analysis of SCTP: Attack Synthesis and Patch Verification
Jacob Ginesin, Max von Hippel, Evan Defloor, and Cristina Nita-Rotaru, Northeastern University; Michael Tüxen, FH Münster
SCTP is a transport protocol offering features such as multi-homing, multi-streaming, and message-oriented delivery. Its two main implementations were subjected to conformance tests using the PacketDrill tool. Conformance testing is not exhaustive and a recent vulnerability (CVE-2021-3772) showed SCTP is not immune to attacks. Changes addressing the vulnerability were implemented, but the question remains whether other flaws might persist in the protocol design.
We study the security of the SCTP design, taking a rigorous approach rooted in formal methods. We create a formal Promela model of SCTP, and define 10 properties capturing the essential protocol functionality based on its RFC specification and consultation with the lead RFC author. Then we show using the SPIN model checker that our model satisfies these properties. We next define 4 representative attacker models – Off-Path, where the attacker is an outsider that can spoof the port and IP of a peer; Evil-Server, where the attacker is a malicious peer; Replay, where an attacker can capture and replay, but not modify, packets; and On-Path, where the attacker controls the channel between peers. SCTP was designed to be secure against Off-Path attackers, and we study the additional models in order to understand how its security degrades for successively more powerful attacker types. We modify an attack synthesis tool designed for transport protocols, KORG, to support our SCTP model and 4 attacker models.
We synthesize the vulnerability reported in CVE-2021- 3772 in the Off-Path attacker model, when the patch is disabled, and we show that when enabled, the patch eliminates the vulnerability. We also manually identify two ambiguities in the RFC, and using KORG, we show that each, if misinterpreted, opens the protocol to a new Off-Path attack. We show that SCTP is vulnerable to a variety of attacks when it is misused in the Evil-Server, Replay, or On-Path attacker models (for which it was not designed). We discuss these and, when possible, mitigations thereof. Finally, we propose two RFC errata – one to eliminate each ambiguity – of which so far, the SCTP RFC committee has accepted one.
Athena: Analyzing and Quantifying Side Channels of Transport Layer Protocols
Feiyang Yu, Duke University; Quan Zhou and Syed Rafiul Hussain, Pennsylvania State University; Danfeng Zhang, Duke University
Recent research has shown a growing number of side-channel vulnerabilities in transport layer protocols, such as TCP and UDP. Those side channels can be exploited by adversaries to launch nefarious attacks. In this paper, we present Athena, an automated tool for detecting, quantifying and explaining side-channel vulnerabilities in vanilla implementations of transport layer protocols. Unlike prior tools, Athena adopts a novel graph-based analysis, making it scalable enough to be the first side-channel analysis tool that can comprehensively analyze the TCP and UDP implementations in several operating systems with significantly higher coverage than the state-of-the-art. Moreover, Athena uses an entropy-based algorithm to identify the most important vulnerabilities. Evaluation on several benchmarks including Linux, FreeBSD, OpenBSD and two open-source IPv4 implementations suggests that Athena can narrow down critical side channels to a single digit (among over 1000 candidates) with a low false positive rate. Besides covering known side channels, Athena also discovers 30 new potential attack surfaces.
Shaken, not Stirred - Automated Discovery of Subtle Attacks on Protocols using Mix-Nets
Jannik Dreier, Université de Lorraine, CNRS, Inria, LORIA; Pascal Lafourcade and Dhekra Mahmoud, Université de Clermont Auvergne, LIMOS
Mix-Nets are used to provide anonymity by passing a list of inputs through a collection of mix servers. Each server mixes the entries to create a new anonymized list, so that the correspondence between the output and the input is hidden. These Mix-Nets are used in numerous protocols in which the anonymity of participants is required, for example voting or electronic exam protocols. Some of these protocols have been proven secure using automated tools such as the cryptographic protocol verifier ProVerif, although they use the Mix-Net incorrectly. We propose a more detailed formal model of exponentiation and re-encryption Mix-Nets in the applied Π-Calculus, the language used by ProVerif, and show that using this model we can automatically discover attacks based on the incorrect use of the Mix-Net. In particular, we (re-)discover attacks on four cryptographic protocols using ProVerif: we show that an electronic exam protocol, two electronic voting protocols, and the "Crypto Santa" protocol do not satisfy the desired privacy properties. We then fix the vulnerable protocols by adding missing zero-knowledge proofs and analyze the resulting protocols using ProVerif. Again, in addition to the common abstract modeling of Zero Knowledge Proofs (ZKP), we also use a special model corresponding to weak (malleable) ZKPs. We show that in this case all attacks persist, and that we can again (re)discover these attacks automatically.
Cryptographic Protocols II
Rabbit-Mix: Robust Algebraic Anonymous Broadcast from Additive Bases
Chongwon Cho and Samuel Dittmer, Stealth Software Technologies Inc.; Yuval Ishai, Technion; Steve Lu, Stealth Software Technologies Inc.; Rafail Ostrovsky, UCLA
We present Rabbit-Mix, a robust algebraic mixing-based anonymous broadcast protocol in the client-server model. Rabbit-Mix is the first practical sender-anonymous broadcast protocol satisfying both robustness and 100% message delivery assuming a (strong) honest majority of servers. It presents roughly 3x improvement in comparison to Blinder (CCS 2020), a previous anonymous broadcast protocol in the same model, in terms of the number of algebraic operations and communication, while at the same time eliminating the non-negligible failure probability of Blinder. To obtain these improvements, we combine the use of Newton's identities for mixing with a novel way of exploiting an algebraic structure in the powers of field elements, based on an {\em additive 2-basis}, to compactly encode and decode client messages. We also introduce a simple and efficient distributed protocol to verify the well-formedness of client input encodings, which should consist of shares of multiple arithmetic progressions tied together.
PerfOMR: Oblivious Message Retrieval with Reduced Communication and Computation
Zeyu Liu, Yale University; Eran Tromer, Boston University; Yunhao Wang, Yale University
Anonymous message delivery, as in privacy-preserving blockchain and private messaging applications, needs to protect recipient metadata: eavesdroppers should not be able to link messages to their recipients. This raises the question: how can untrusted servers assist in delivering the pertinent messages to each recipient, without learning which messages are addressed to whom?
Recent work constructed Oblivious Message Retrieval (OMR) protocols that outsource the message detection and retrieval in a privacy-preserving way, using homomorphic encryption. Their construction exhibits significant costs in computation per message scanned (∼0.1 second), as well as in the size of the associated messages (∼1kB overhead) and public keys (∼132kB).
This work constructs more efficient OMR schemes, by replacing the LWE-based clue encryption of prior works with a Ring-LWE variant, and utilizing the resulting flexibility to improve several components of the scheme. We thus devise, analyze, and benchmark two protocols:
The first protocol focuses on improving the detector runtime, using a new retrieval circuit that can be homomorphically evaluated 15x faster than the prior work.
The second protocol focuses on reducing the communication costs, by designing a different homomorphic decryption circuit that allows the parameter of the Ring-LWE encryption to be set such that the public key size is about 235x smaller than the prior work, and the message size is roughly 1.6x smaller. The runtime of this second construction is ∼40.0ms per message, still more than 2.5x faster than prior works.
Fast RS-IOP Multivariate Polynomial Commitments and Verifiable Secret Sharing
Zongyang Zhang, Weihan Li, Yanpei Guo, and Kexin Shi, Beihang University; Sherman S. M. Chow, The Chinese University of Hong Kong; Ximeng Liu, Fuzhou University; Jin Dong, Beijing Academy of Blockchain and Edge Computing
Supporting proofs of evaluations, polynomial commitment schemes (PCS) are crucial in secure distributed systems. Schemes based on fast Reed–Solomon interactive oracle proofs (RS-IOP) of proximity have recently emerged, offering transparent setup, plausible post-quantum security, efficient operations, and, notably, sublinear proof size and verification. Manifesting a new paradigm, PCS with one-to-many proof can enhance the performance of (asynchronous) verifiable secret sharing ((A)VSS), a cornerstone in distributed computing, for proving multiple evaluations to multiple verifiers. Current RS-IOP-based multivariate PCS, including HyperPlonk (Eurocrypt '23) and Virgo (S&P '20), however, only offer quasi-linear prover complexity in the polynomial size.
We propose PolyFRIM, a fast RS-IOP-based multivariate PCS with optimal linear prover complexity, 5-25× faster than prior arts while ensuring competent proof size and verification. Heeding the challenging absence of FFT circuits for multivariate evaluation, PolyFRIM surpasses Zhang et al.'s (Usenix Sec. '22) one-to-many univariate PCS, accelerating proving by 4-7× and verification by 2-4× with 25% shorter proof. Leveraging PolyFRIM, we propose an AVSS scheme FRISS with a better efficiency tradeoff than prior arts from multivariate PCS, including Bingo (Crypto '23) and Haven (FC '21).
Abuse Reporting for Metadata-Hiding Communication Based on Secret Sharing
Saba Eskandarian, University of North Carolina at Chapel Hill
As interest in metadata-hiding communication grows in both research and practice, a need exists for stronger abuse reporting features on metadata-hiding platforms. While message franking has been deployed on major end-to-end encrypted platforms as a lightweight and effective abuse reporting feature, there is no comparable technique for metadata-hiding platforms. Existing efforts to support abuse reporting in this setting, such as asymmetric message franking or the Hecate scheme, require order of magnitude increases in client and server computation or fundamental changes to the architecture of messaging systems. As a result, while metadata-hiding communication inches closer to practice, critical content moderation concerns remain unaddressed.
This paper demonstrates that, for broad classes of metadata-hiding schemes, lightweight abuse reporting can be deployed with minimal changes to the overall architecture of the system. Our insight is that much of the structure needed to support abuse reporting already exists in these schemes. By taking a non-generic approach, we can reuse this structure to achieve abuse reporting with minimal overhead. In particular, we show how to modify schemes based on secret sharing user inputs to support a message franking-style protocol. Compared to prior work, our shared franking technique more than halves the time to prepare a franked message and gives order of magnitude reductions in server-side message processing times, as well as in the time to decrypt a message and verify a report.
SOAP: A Social Authentication Protocol
Felix Linker and David Basin, Department of Computer Science, ETH Zurich
Social authentication has been suggested as a usable authentication ceremony to replace manual key authentication in messaging applications. Using social authentication, chat partners authenticate their peers using digital identities managed by identity providers. In this paper, we formally define social authentication, present a protocol called SOAP that largely automates social authentication, formally prove SOAP's security, and demonstrate SOAP's practicality in two prototypes. One prototype is web-based, and the other is implemented in the open-source Signal messaging application.
Using SOAP, users can significantly raise the bar for compromising their messaging accounts. In contrast to the default security provided by messaging applications such as Signal and WhatsApp, attackers must compromise both the messaging account and all identity provider-managed identities to attack a victim. In addition to its security and automation, SOAP is straightforward to adopt as it is built on top of the well-established OpenID Connect protocol.
12:00 pm–1:30 pm
Symposium Luncheon and Test of Time Award Presentation
Franklin Hall
1:30 pm–2:45 pm
User Studies IV: Policies and Best Practices I
How WEIRD is Usable Privacy and Security Research?
Ayako A. Hasegawa and Daisuke Inoue, NICT; Mitsuaki Akiyama, NTT
In human factor fields such as human-computer interaction (HCI) and psychology, researchers have been concerned that participants mostly come from WEIRD (Western, Educated, Industrialized, Rich, and Democratic) countries. This WEIRD skew may hinder understanding of diverse populations and their cultural differences. The usable privacy and security (UPS) field has inherited many research methodologies from research on human factor fields. We conducted a literature review to understand the extent to which participant samples in UPS papers were from WEIRD countries and the characteristics of the methodologies and research topics in each user study recruiting Western or non-Western participants. We found that the skew toward WEIRD countries in UPS is greater than that in HCI. Geographic and linguistic barriers in the study methods and recruitment methods may cause researchers to conduct user studies locally. In addition, many papers did not report participant demographics, which could hinder the replication of the reported studies, leading to low reproducibility. To improve geographic diversity, we provide the suggestions including facilitate replication studies, address geographic and linguistic issues of study/recruitment methods, and facilitate research on the topics for non-WEIRD populations.
Security and Privacy Software Creators' Perspectives on Unintended Consequences
Harshini Sri Ramulu, Paderborn University & The George Washington University; Helen Schmitt, Paderborn University; Dominik Wermke, North Carolina State University; Yasemin Acar, Paderborn University & The George Washington University
Security & Privacy (S&P) software is created to have positive impacts on people: to protect them from surveillance and attacks, enhance their privacy, and keep them safe. Despite these positive intentions, S&P software can have unintended consequences, such as enabling and protecting criminals, misleading people into using the software with a false sense of security, and being inaccessible to users without strong technical backgrounds or with specific accessibility needs. In this study, through 14 semi-structured expert interviews with S&P software creators, we explore whether and how S&P software creators foresee and mitigate unintended consequences. We find that unintended consequences are often overlooked and ignored. When addressed, they are done in unstructured ways—often ad hoc and just based on user feedback—thereby shifting the burden to users. To reduce this burden on users and more effectively create positive change, we recommend S&P software creators to proactively consider and mitigate unintended consequences through increasing awareness and education, promoting accountability at the organizational level to mitigate issues, and using systematic toolkits for anticipating impacts.
Engaging Company Developers in Security Research Studies: A Comprehensive Literature Review and Quantitative Survey
Raphael Serafini, Stefan Albert Horstmann, and Alena Naiakshina, Ruhr University Bochum
Previous research demonstrated that company developers excel compared to freelancers and computer science students, with the corporate environment significantly influencing security and privacy behavior. Still, the challenge of recruiting a substantial number of company developers persists, primarily due to a lack of knowledge on how to motivate their participation in empirical research studies. To bridge this gap, we performed a literature review and identified a conspicuous absence of information regarding compensation and study length in the domain of security developer studies. To support researchers struggling with the recruitment of company developers, we conducted an extensive quantitative survey with 340 professionals. Our study revealed that 62.5% of developers prioritize security tasks over software engineering tasks, and 96.5% are willing to participate in security studies. Developers consistently ranked security higher than other barriers and motivators. However, repeat participants perceived security tasks as more challenging than first-time participants despite having 40% more general experience and 50% more security-related experience. Further, we discuss Qualtrics as a potential recruitment channel for engaging company developers, acknowledging various challenges. Based on our findings, we provide recommendations for recruiting a high number of company developers.
"What Keeps People Secure is That They Met The Security Team": Deconstructing Drivers And Goals of Organizational Security Awareness
Jonas Hielscher, Ruhr University Bochum; Simon Parkin, Delft University of Technology
Security awareness campaigns in organizations now collectively cost billions of dollars annually. There is increasing focus on ensuring certain security behaviors among employees. On the surface, this would imply a user-centered view of security in organizations. Despite this, the basis of what security awareness managers do and what decides this are unclear. We conducted n=15 semi-structured interviews with full-time security awareness managers, with experience across various national and international companies in European countries, with thousands of employees. Through thematic analysis, we identify that success in awareness management is fragile while having the potential to improve; there are a range of restrictions, and mismatched drivers and goals for security awareness, affecting how it is structured, delivered, measured, and improved. We find that security awareness as a practice is underspecified, and split between messaging around secure behaviors and connecting to employees, with a lack of recognition for the measures that awareness managers regard as important. We discuss ways forward, including alternative indicators of success, and security usability advocacy for employees.
Unveiling the Hunter-Gatherers: Exploring Threat Hunting Practices and Challenges in Cyber Defense
Priyanka Badva, Kopo M. Ramokapane, Eleonora Pantano, and Awais Rashid, University of Bristol
The dynamic landscape of cyber threats constantly adapts its attack patterns, successfully evading traditional defense mechanisms and operating undetected until its objectives are fulfilled. In response to these elusive threats, threat hunting has become a crucial advanced defense technique against sophisticated and concealed cyber adversaries. However, despite its significance, there remains a lack of deep understanding of the best practices and challenges associated with effective threat hunting. To address this gap, we conducted semi-structured interviews with 22 experienced threat hunters to gain deeper insights into their daily practices, challenges, and strategies to overcome them. Our findings show that threat hunters deploy various approaches, often mixing them. They argue that flexibility in their approach helps them identify subtle threat indicators that might otherwise go undetected if using only one method. Their everyday challenges range from technical challenges to people and organizational culture challenges. Based on these findings, we provide empirical insights for improving threat-hunting best practices.
Side Channel IV
Pixel Thief: Exploiting SVG Filter Leakage in Firefox and Chrome
Sioli O'Connell, The University of Adelaide; Lishay Aben Sour and Ron Magen, Ben Gurion University of the Negev; Daniel Genkin, Georgia Institute of Technology; Yossi Oren, Ben-Gurion University of the Negev and Intel Corporation; Hovav Shacham, UT Austin; Yuval Yarom, Ruhr University Bochum
Web privacy is challenged by pixel-stealing attacks, which allow attackers to extract content from embedded iframes and to detect visited links. To protect against multiple pixelstealing attacks that exploited timing variations in SVG filters, browser vendors repeatedly adapted their implementations to eliminate timing variations. In this work we demonstrate that past efforts are still not sufficient.
We show how web-based attackers can mount cache-based side-channel attacks to monitor data-dependent memory accesses in filter rendering functions. We identify conditions under which browsers elect the non-default CPU implementation of SVG filters, and develop techniques for achieving access to the high-resolution timers required for cache attacks. We then develop efficient techniques to use the pixel-stealing attack for text recovery from embedded pages and to achieve high-speed history sniffing. To the best of our knowledge, our attack is the first to leak multiple bits per screen refresh, achieving an overall rate of 267 bits per second.
Sync+Sync: A Covert Channel Built on fsync with Storage
Qisheng Jiang and Chundong Wang, ShanghaiTech University
Scientists have built a variety of covert channels for secretive information transmission with CPU cache and main memory. In this paper, we turn to a lower level in the memory hierarchy, i.e., persistent storage. Most programs store intermediate or eventual results in the form of files and some of them call fsync to synchronously persist a file with storage device for orderly persistence. Our quantitative study shows that one program would undergo significantly longer response time for fsync call if the other program is concurrently calling fsync, although they do not share any data. We further find that, concurrent fsync calls contend at multiple levels of storage stack due to sharing software structures (e.g., Ext4's journal) and hardware resources (e.g., disk's I/O dispatch queue).
We accordingly build a covert channel named Sync+Sync. Sync+Sync delivers a transmission bandwidth of 20,000 bits per second at an error rate of about 0.40% with an ordinary solid-state drive. Sync+Sync can be conducted in cross-disk partition, cross-file system, cross-container, cross-virtual machine, and even cross-disk drive fashions, without sharing data between programs. Next, we launch side-channel attacks with Sync+Sync and manage to precisely detect operations of a victim database (e.g., insert/update and B-Tree node split). We also leverage Sync+Sync to distinguish applications and websites with high accuracy by detecting and analyzing their fsync frequencies and flushed data volumes. These attacks are useful to support further fine-grained information leakage.
What Was Your Prompt? A Remote Keylogging Attack on AI Assistants
Roy Weiss, Daniel Ayzenshteyn, Guy Amit, and Yisroel Mirsky, Ben Gurion University of the Negev
AI assistants are becoming an integral part of society, used for asking advice or help in personal and confidential issues. In this paper, we unveil a novel side-channel that can be used to read encrypted responses from AI Assistants over the web: the token-length side-channel. The side-channel reveals the character-lengths of a response's tokens (akin to word lengths). We found that many vendors, including OpenAI and Microsoft, had this side-channel prior to our disclosure.
However, inferring a response's content with this side-channel is challenging. This is because, even with knowledge of token-lengths, a response can have hundreds of words resulting in millions of grammatically correct sentences. In this paper, we show how this can be overcome by (1) utilizing the power of a large language model (LLM) to translate these token-length sequences, (2) providing the LLM with inter-sentence context to narrow the search space and (3) performing a known-plaintext attack by fine-tuning the model on the target model's writing style.
Using these methods, we were able to accurately reconstruct 27% of an AI assistant's responses and successfully infer the topic from 53% of them. To demonstrate the threat, we performed the attack on OpenAI's ChatGPT-4 and Microsoft's Copilot on both browser and API traffic.
NetShaper: A Differentially Private Network Side-Channel Mitigation System
Amir Sabzi, Rut Vora, Swati Goswami, Margo Seltzer, Mathias Lécuyer, and Aastha Mehta, University of British Columbia
The widespread adoption of encryption in network protocols has significantly improved the overall security of many Internet applications. However, these protocols cannot prevent network side-channel leaks—leaks of sensitive information through the sizes and timing of network packets. We present NetShaper, a system that mitigates such leaks based on the principle of traffic shaping. NetShaper's traffic shaping provides differential privacy guarantees while adapting to the prevailing workload and congestion condition, and allows configuring a tradeoff between privacy guarantees, bandwidth and latency overheads. Furthermore, NetShaper provides a modular and portable tunnel endpoint design that can support diverse applications. We present a middlebox-based implementation of NetShaper and demonstrate its applicability in a video streaming and a web service application.
SoK: Neural Network Extraction Through Physical Side Channels
Péter Horváth, Dirk Lauret, Zhuoran Liu, and Lejla Batina, Radboud University
Deep Neural Networks (DNNs) are widely used in various applications and are typically deployed on hardware accelerators. Physical Side-Channel Analysis (SCA) on DNN implementations is getting more attention from both industry and academia because of the potential to severely jeopardize the confidentiality of DNN Intellectual Property (IP) and the data privacy of end users. Current physical SCA attacks on DNNs are highly platform dependent and employ distinct threat models for different attack objectives and analysis tools, necessitating a general revision of attack methodology and assumptions. To this end, we provide a taxonomy of previous physical SCA attacks on DNNs and systematize findings toward model extraction and input recovery. Specifically, we discuss the dependencies of threat models on attack objectives and analysis methods, for which we present a novel systematic attack framework composed of fundamental stages derived from various attacks. Following the framework, we provide an in-depth analysis of common SCA attacks for each attack objective and reveal practical limitations, validated by experiments on a state-of-the-art commercial DNN accelerator. Based on our findings, we identify challenges and suggest future directions.
Cloud Security
ACAI: Protecting Accelerator Execution with Arm Confidential Computing Architecture
Supraja Sridhara, Andrin Bertschi, Benedict Schlüter, Mark Kuhne, Fabio Aliberti, and Shweta Shinde, ETH Zurich
Trusted execution environments in several existing and upcoming CPUs demonstrate the success of confidential computing, with the caveat that tenants cannot securely use accelerators such as GPUs and FPGAs. In this paper, we reconsider the Arm Confidential Computing Architecture (CCA) design, an upcoming TEE feature in Armv9-A, to address this gap. We observe that CCA offers the right abstraction and mechanisms to allow confidential VMs to use accelerators as a first-class abstraction. We build ACAI, a CCA-based solution, with a principled approach of extending CCA security invariants to device-side access to address several critical security gaps. Our experimental results on GPU and FPGA demonstrate the feasibility of ACAI while maintaining security guarantees.
ChainPatrol: Balancing Attack Detection and Classification with Performance Overhead for Service Function Chains Using Virtual Trailers
Momen Oqaily and Hinddeep Purohit, CIISE, Concordia University; Yosr Jarraya, Ericsson Security Research; Lingyu Wang, CIISE, Concordia University; Boubakr Nour and Makan Pourzandi, Ericsson Security Research; Mourad Debbabi, CIISE, Concordia University
Network functions virtualization enables tenants to outsource their service function chains (SFCs) to third-party clouds for better agility and cost-effectiveness. However, outsourcing may limit tenants' ability to directly inspect cloud-level deployments to detect attacks on SFC forwarding paths, such as network function bypass or traffic injection. Existing solutions requiring direct cloud access are unsuitable for outsourcing, and adding a cryptographic trailer to every packet may incur significant performance overhead over large flows. In this paper, we propose ChainPatrol, a lightweight solution for tenants to continuously detect and classify cloud-level attacks on SFCs. Our main idea is to "virtualize'' cryptographic trailers by encoding them as side-channel watermarks, such that they can be transmitted without adding extra bits to packets. We tackle several key challenges like encoding virtual trailers within the limited side channel capacity, minimizing packet delay, and tolerating unexpected network jitters. We implement our solution on Amazon EC2, and our experiments with real-life data and applications demonstrate that ChainPatrol can achieve a better balance between security (e.g., 100% detection accuracy and 70% classification accuracy) and overhead (e.g., almost zero increased traffic and negligible end-to-end delay) than existing works (e.g., up to 45% overhead reduction compared to a state-of-the-art solution).
HECKLER: Breaking Confidential VMs with Malicious Interrupts
Benedict Schlüter, Supraja Sridhara, Mark Kuhne, Andrin Bertschi, and Shweta Shinde, ETH Zurich
Hardware-based Trusted execution environments (TEEs) offer an isolation granularity of virtual machine abstraction. They provide confidential VMs (CVMs) that host security-sensitive code and data. AMD SEV-SNP and Intel TDX enable CVMs and are now available on popular cloud platforms. The untrusted hypervisor in these settings is in control of several resource management and configuration tasks, including interrupts. We present HECKLER, a new attack wherein the hypervisor injects malicious non-timer interrupts to break the confidentiality and integrity of CVMs. Our insight is to use the interrupt handlers that have global effects, such that we can manipulate a CVM's register states to change the data and control flow. With AMD SEV-SNP and Intel TDX, we demonstrate HECKLER on OpenSSH and sudo to bypass authentication. On AMD SEV-SNP we break execution integrity of C, Java, and Julia applications that perform statistical and text analysis. We explain the gaps in current defenses and outline guidelines for future defenses.
Stateful Least Privilege Authorization for the Cloud
Leo Cao, Luoxi Meng, Deian Stefan, and Earlence Fernandes, UC San Diego
Architecting an authorization protocol that enforces least privilege in the cloud is challenging. For example, when Zoom integrates with Google Calendar, Zoom obtains a bearer token—a credential that grants broad access to user data on the server. Widely-used authorization protocols like OAuth create overprivileged credentials because they do not provide developers of client apps and servers the tools to request and enforce minimal access. In the status quo, these overprivileged credentials are vulnerable to abuse when stolen or leaked. We introduce an authorization framework that enables creating and using bearer tokens that are least privileged. Our core insight is that the client app developer always knows their minimum privilege requirements when requesting access to user resources on a server. Our framework allows client app developers to write small programs in WebAssembly that customize and attenuate the privilege of OAuth-like bearer tokens. The server executes these programs to enforce that requests are least privileged. Building on this primary mechanism, we introduce a new class of stateful least privilege policies—authorization rules that can depend on a log of actions a client has taken on a server. We instantiate our authorization model for the popular OAuth protocol. Using open source client apps, we show how they can reduce their privilege using a variety of stateful policies enabled by our work.
GraphGuard: Private Time-Constrained Pattern Detection Over Streaming Graphs in the Cloud
Songlei Wang and Yifeng Zheng, Harbin Institute of Technology; Xiaohua Jia, Harbin Institute of Technology and City University of Hong Kong
Streaming graphs have seen wide adoption in diverse scenarios due to their superior ability to capture temporal interactions among entities. With the proliferation of cloud computing, it has become increasingly common to utilize the cloud for storing and querying streaming graphs. Among others, streaming graphs-based time-constrained pattern detection, which aims to continuously detect subgraphs matching a given query pattern within a sliding time window, benefits various applications such as credit card fraud detection and cyber-attack detection. Deploying such services on the cloud, however, entails severe security and privacy risks. This paper presents GraphGuard, the first system for privacy-preserving outsourcing of time-constrained pattern detection over streaming graphs. GraphGuard is constructed from a customized synergy of insights on graph modeling, lightweight secret sharing, edge differential privacy, and data encoding and padding, safeguarding the confidentiality of edge/vertex labels and the connections between vertices in the streaming graph and query patterns. We implement and evaluate GraphGuard on several real-world graph datasets. The evaluation results show that GraphGuard takes only a few seconds to securely process an encrypted query pattern over an encrypted snapshot of streaming graphs within a time window of size 50,000. Compared to a baseline built on generic secure multiparty computation, GraphGuard achieves up to 60× improvement in query latency and up to 98% savings in communication.
Blockchain I
Mempool Privacy via Batched Threshold Encryption: Attacks and Defenses
Arka Rai Choudhuri, NTT Research; Sanjam Garg, Julien Piet, and Guru-Vamsi Policharla, University of California, Berkeley
With the rising popularity of DeFi applications it is important to implement protections for regular users of these DeFi platforms against large parties with massive amounts of resources allowing them to engage in market manipulation strategies such as frontrunning/backrunning. Moreover, there are many situations (such as recovery of funds from vulnerable smart contracts) where a user may not want to reveal their transaction until it has been executed. As such, it is clear that preserving the privacy of transactions in the mempool is an important goal.
In this work we focus on achieving mempool transaction privacy through a new primitive that we term batched-threshold encryption, which is a variant of threshold encryption with strict efficiency requirements to better model the needs of resource constrained environments such as blockchains. Unlike the naive use of threshold encryption, which requires communication proportional to O(nB) to decrypt B transactions with a committee of n parties, our batched-threshold encryption scheme only needs O(n) communication. We additionally discuss pitfalls in prior approaches that use (vanilla) threshold encryption for mempool privacy.
To show that our scheme is concretely efficient, we implement our scheme and find that transactions can be encrypted in under 6 ms, independent of committee size, and the communication required to decrypt an entire batch of B transactions is 80 bytes per party, independent of the number of transactions B, making it an attractive choice when communication is very expensive. If deployed on Ethereum, which processes close to 500 transaction per block, it takes close to 2.8 s for each committee member to compute a partial decryption and under 3.5 s to decrypt all transactions for a block in single-threaded mode.
Speculative Denial-of-Service Attacks In Ethereum
Aviv Yaish, The Hebrew University; Kaihua Qin and Liyi Zhou, Imperial College London, UC Berkeley RDI; Aviv Zohar, The Hebrew University; Arthur Gervais, University College London, UC Berkeley RDI
Transaction fees compensate actors for resources expended on transactions and can only be charged from transactions included in blocks. But, the expressiveness of Turing-complete contracts implies that verifying if transactions can be included requires executing them on the current blockchain state.
In this work, we show that adversaries can craft malicious transactions that decouple the work imposed on blockchain actors from the compensation offered in return. We introduce three attacks: (i) ConditionalExhaust, a conditional resource-exhaustion attack against blockchain actors. (ii) MemPurge, an attack for evicting transactions from actors' mempools. (iii) GhostTX, an attack on the reputation system used in Ethereum's proposer-builder separation ecosystem.
We evaluate our attacks on an Ethereum testnet and find that by combining ConditionalExhaust and MemPurge, adversaries can simultaneously burden victims' computational resources and clog their mempools to the point where victims are unable to include transactions in blocks. Thus, victims create empty blocks, thereby hurting the system's liveness. The attack's expected cost is $376, but becomes cheaper if adversaries are validators. For other attackers, costs decrease if censorship is prevalent in the network.
ConditionalExhaust and MemPurge are made possible by inherent features of Turing-complete blockchains, and potential mitigations may result in reducing a ledger's scalability.
GuideEnricher: Protecting the Anonymity of Ethereum Mixing Service Users with Deep Reinforcement Learning
Ravindu De Silva, Wenbo Guo, Nicola Ruaro, Ilya Grishchenko, Christopher Kruegel, and Giovanni Vigna, University of California, Santa Barbara
Mixing services are widely employed to enhance anonymity on public blockchains. However, recent research has shown that user identities and transaction associations can be derived even with mixing services. This is mainly due to the lack of guidelines for properly using these services. In fact, mixing service developers often provide guidebooks with lists of actions that might break anonymity, and hence, should be avoided. However, such guidebooks remain incomplete, leaving users unaware of potential actions that might compromise their anonymity. This highlights the necessity for providing users with a more comprehensive guidebook. Unfortunately, existing methods for compiling anonymity compromising patterns rely on postmortem analyses, and they cannot proactively discover patterns before the mixing service is deployed.
We introduce GuideEnricher, a proactive approach for extending user guidebooks with limited human intervention. Our key novelty is a deep reinforcement learning (DRL) agent, which automatically explores patterns for transferring tokens via a mixing service. We introduce two customized designs to better guide the agent in discovering yet-unknown anonymity-compromising patterns: design proper tasks for the agent that possibly lead to compromised anonymity, and include a rule-based detector to detect the known patterns. We train the agent to finish the task while evading the detector. Using a trained agent, we conduct a second analysis step, employing clustering methods and manual inspection, to extract yet unknown patterns from the agent's actions. Through extensive evaluation, we demonstrate that GuideEnricher can train effective agents under multiple mixing services. We show that our agents facilitate the discovery of yet-unknown anonymity-compromising patterns. Furthermore, we demonstrate that GuideEnricher can continuously enrich the guidebook via an iterative update of the detector and our DRL agents.
All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts
Tianle Sun, Huazhong University of Science and Technology; Ningyu He, Peking University; Jiang Xiao, Huazhong University of Science and Technology; Yinliang Yue, Zhongguancun Laboratory; Xiapu Luo, The Hong Kong Polytechnic University; Haoyu Wang, Huazhong University of Science and Technology
In Ethereum, the practice of verifying the validity of the passed addresses is a common practice, which is a crucial step to ensure the secure execution of smart contracts. Vulnerabilities in the process of address verification can lead to great security issues, and anecdotal evidence has been reported by our community. However, this type of vulnerability has not been well studied. To fill the void, in this paper, we aim to characterize and detect this kind of emerging vulnerability. We design and implement AVVERIFIER, a lightweight taint analyzer based on static EVM opcode simulation. Its three-phase detector can progressively rule out false positives and false negatives based on the intrinsic characteristics. Upon a well-established and unbiased benchmark, AVVERIFIER can improve efficiency 2 to 5 times than the SOTA while maintaining a 94.3% precision and 100% recall. After a large-scale evaluation of over 5 million Ethereum smart contracts, we have identified 812 vulnerable smart contracts that were undisclosed by our community before this work, and 348 open source smart contracts were further verified, whose largest total value locked is over $11.2 billion. We further deploy AVVERIFIER as a real-time detector on Ethereum and Binance Smart Chain, and the results suggest that AVVERIFIER can raise timely warnings once contracts are deployed.
Using My Functions Should Follow My Checks: Understanding and Detecting Insecure OpenZeppelin Code in Smart Contracts
Han Liu, East China Normal University, Shanghai Key Laboratory of Trustworthy Computing; Daoyuan Wu, The Hong Kong University of Science and Technology; Yuqiang Sun, Nanyang Technological University; Haijun Wang, Xi'an Jiaotong University; Kaixuan Li, East China Normal University, Shanghai Key Laboratory of Trustworthy Computing; Yang Liu, Nanyang Technological University; Yixiang Chen, East China Normal University, Shanghai Key Laboratory of Trustworthy Computing
OpenZeppelin is a popular framework for building smart contracts. It provides common libraries (e.g., SafeMath), implementations of Ethereum standards (e.g., ERC20), and reusable components for access control and upgradability. However, unlike traditional software libraries, which are typically imported as static linking libraries or dynamic loading libraries, OpenZeppelin is utilized by Solidity contracts in the form of source code. As a result, developers often make custom modifications to their copies of OpenZeppelin code, which may lead to unintended security consequences.
In this paper, we conduct the first systematic study on the security of OpenZeppelin code used in real-world contracts. Specifically, we focus on the security checks in the official OpenZeppelin library and examine whether they are faithfully enforced in the relevant OpenZeppelin functions of real contracts. To this end, we propose a novel tool named ZepScope that comprises two components: MINER and CHECKER. First, MINER analyzes the official OpenZeppelin functions to extract the facts of explicit checks (i.e., the checks defined within the functions) and implicit checks (i.e., the conditions of calling the functions). Second, based on the facts extracted by MINER, CHECKER examines real contracts to identify their OpenZeppelin functions, match their checks with those in the facts, and validate the consequences for those inconsistent checks. By overcoming multiple challenges in developing ZepScope, we obtain not only the first taxonomy of OpenZeppelin checks but also the comprehensive results of checking the top 35,882 contracts from three mainstream blockchains.
ML VII: Adversarial Attack Defense
Correction-based Defense Against Adversarial Video Attacks via Discretization-Enhanced Video Compressive Sensing
Wei Song, Cong Cong, Haonan Zhong, and Jingling Xue, UNSW Sydney
We introduce SECVID, a correction-based framework that defends video recognition systems against adversarial attacks without prior adversarial knowledge. It uses discretization-enhanced video compressive sensing in a black-box preprocessing module, transforming videos into a sparse domain to disperse and neutralize perturbations. While SECVID's discretized compression disrupts perturbation continuity, its reconstruction process minimizes adversarial elements, causing only minor distortions to the original videos. Though not completely restoring adversarial videos, SECVID significantly enhances their quality, enabling accurate classification by SECVID-enhanced video classifiers and preventing adversarial attacks. Tested on C3D and I3D with the UCF-101 and HMDB-51 datasets against five types of advanced video attacks, SECVID outperforms existing defenses, improving detection accuracy by 38.5% to 866.2%. Specifically designed for high-risk environments, SECVID addresses trade-offs like minor accuracy reduction, additional pre-processing training, and longer inference times, with potential optimization through selective security impacting strategies.
Rethinking the Invisible Protection against Unauthorized Image Usage in Stable Diffusion
Shengwei An, Lu Yan, Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Qiuling Xu, Guanhong Tao, and Xiangyu Zhang, Purdue University
Advancements in generative AI models like Stable Diffusion, DALL·E 2, and Midjourney have revolutionized digital creativity, enabling the generation of authentic-looking images from text and altering existing images with ease. Yet, their capacity poses significant ethical challenges, including replicating an artist's style without consent, the creation of counterfeit images, and potential reputational damage through manipulated content. Protection techniques have emerged to combat misuse by injecting imperceptible noises into images. This paper introduces Insight, a novel approach that challenges the robustness of these protections by aligning protected image features with human visual perception. By using a photo as a reference, approximating the human eye's perspective, Insight effectively neutralizes protective perturbations, enabling the generative model to recapture authentic features. Our extensive evaluation across 3 datasets and 10 protection techniques demonstrates its superiority over existing methods in overcoming protective measures, emphasizing the need for stronger safeguards in digital content generation.
Splitting the Difference on Adversarial Training
Matan Levi and Aryeh Kontorovich, Ben-Gurion University of the Negev
The existence of adversarial examples points to a basic weakness of deep neural networks. One of the most effective defenses against such examples, adversarial training, entails training models with some degree of robustness, usually at the expense of a degraded natural accuracy. Most adversarial training methods aim to learn a model that finds, for each class, a common decision boundary encompassing both the clean and perturbed examples. In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned, effectively splitting each class into two classes: "clean" and "adversarial." This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries. We provide a theoretical plausibility argument that sheds some light on the conditions under which our approach can be expected to be beneficial. Likewise, we empirically demonstrate that our method learns robust models while attaining optimal or near-optimal natural accuracy, e.g., on CIFAR-10 we obtain near-optimal natural accuracy of 95.01% alongside significant robustness across multiple tasks. The ability to achieve such near-optimal natural accuracy, while maintaining a significant level of robustness, makes our method applicable to real-world applications where natural accuracy is at a premium. As a whole, our main contribution is a general method that confers a significant level of robustness upon classifiers with only minor or negligible degradation of their natural accuracy.
Machine Learning needs Better Randomness Standards: Randomised Smoothing and PRNG-based attacks
Pranav Dahiya, University of Cambridge; Ilia Shumailov, University of Oxford; Ross Anderson, University of Cambridge & University of Edinburgh
Randomness supports many critical functions in the field of machine learning (ML) including optimisation, data selection, privacy, and security. ML systems outsource the task of generating or harvesting randomness to the compiler, the cloud service provider or elsewhere in the toolchain. Yet there is a long history of attackers exploiting poor randomness, or even creating it—as when the NSA put backdoors in random number generators to break cryptography. In this paper we consider whether attackers can compromise an ML system using only the randomness on which they commonly rely. We focus our effort on Randomised Smoothing, a popular approach to train certifiably robust models, and to certify specific input datapoints of an arbitrary model. We choose Randomised Smoothing since it is used for both security and safety—to counteract adversarial examples and quantify uncertainty respectively. Under the hood, it relies on sampling Gaussian noise to explore the volume around a data point to certify that a model is not vulnerable to adversarial examples. We demonstrate an entirely novel attack, where an attacker backdoors the supplied randomness to falsely certify either an overestimate or an underestimate of robustness for up to 81 times. We demonstrate that such attacks are possible, that they require very small changes to randomness to succeed, and that they are hard to detect. As an example, we hide an attack in the random number generator and show that the randomness tests suggested by NIST fail to detect it. We advocate updating the NIST guidelines on random number testing to make them more appropriate for safety-critical and security-critical machine-learning applications.
PatchCURE: Improving Certifiable Robustness, Model Utility, and Computation Efficiency of Adversarial Patch Defenses
Chong Xiang, Tong Wu, and Sihui Dai, Princeton University; Jonathan Petit, Qualcomm Technologies, Inc.; Suman Jana, Columbia University; Prateek Mittal, Princeton University
State-of-the-art defenses against adversarial patch attacks can now achieve strong certifiable robustness with a marginal drop in model utility. However, this impressive performance typically comes at the cost of 10-100x more inference-time computation compared to undefended models — the research community has witnessed an intense three-way trade-off between certifiable robustness, model utility, and computation efficiency. In this paper, we propose a defense framework named PatchCURE to approach this trade-off problem. PatchCURE provides sufficient "knobs" for tuning defense performance and allows us to build a family of defenses: the most robust PatchCURE instance can match the performance of any existing state-of-the-art defense (without efficiency considerations); the most efficient PatchCURE instance has similar inference efficiency as undefended models. Notably, PatchCURE achieves state-of-the-art robustness and utility performance across all different efficiency levels, e.g., 16-23% absolute clean accuracy and certified robust accuracy advantages over prior defenses when requiring computation efficiency to be close to undefended models. The family of PatchCURE defenses enables us to flexibly choose appropriate defenses to satisfy given computation and/or utility constraints in practice.
Language-Based Security
GHunter: Universal Prototype Pollution Gadgets in JavaScript Runtimes
Eric Cornelissen, Mikhail Shcherbakov, and Musard Balliu, KTH Royal Institute of Technology
Prototype pollution is a recent vulnerability that affects JavaScript code, leading to high impact attacks such as arbitrary code execution and privilege escalation. The vulnerability is rooted in JavaScript's prototype-based inheritance, enabling attackers to inject arbitrary properties into an object's prototype at runtime. The impact of prototype pollution depends on the existence of otherwise benign pieces of code (gadgets), which inadvertently read from these attacker-controlled properties to execute security-sensitive operations. While prior works primarily study gadgets in third-party libraries and client-side applications, gadgets in JavaScript runtime environments are arguably more impactful as they affect any application that executes on these runtimes.
In this paper we design, implement, and evaluate a pipeline, GHunter, to systematically detect gadgets in V8-based JavaScript runtimes with prime focus on Node.js and Deno. GHunter supports a lightweight dynamic taint analysis to automatically identify gadget candidates which we validate manually to derive proof-of-concept exploits. We implement GHunter by modifying the V8 engine and the targeted runtimes along with features for facilitating manual validation. Driven by the comprehensive test suites of Node.js and Deno, we use GHunter in a systematic study of gadgets in these runtimes. We identified a total of 56 new gadgets in Node.js and 67 gadgets in Deno, pertaining to vulnerabilities such as arbitrary code execution (19), privilege escalation (31), path traversal (13), and more. Moreover, we systematize, for the first time, existing mitigations for prototype pollution and gadgets in terms of development guidelines. We collect a list of vulnerable applications and revisit the fixes through the lens of our guidelines. Through this exercise, we also identified one high-severity CVE leading to remote code execution, which was due to incorrectly fixing a gadget.
MetaSafe: Compiling for Protecting Smart Pointer Metadata to Ensure Safe Rust Integrity
Martin Kayondo and Inyoung Bang, Seoul National University; Yeongjun Kwak and Hyungon Moon, UNIST; Yunheung Paek, Seoul National University
Rust is a programming language designed with a focus on memory safety. It introduces new concepts such as ownership and performs static bounds checks at compile time to ensure spatial and temporal memory safety. For memory operations or data types whose safety the compiler cannot prove at compile time, Rust either explicitly excludes such portions of the program, termed unsafe Rust, from static analysis, or it relies on runtime enforcement using smart pointers. Existing studies have shown that potential memory safety bugs in such unsafe Rust can bring down the entire program, proposing in-process isolation or compartmentalization as a remedy. However, in this study, we show that the safe Rust remains susceptible to memory safety bugs even with the proposed isolation applied. The smart pointers upon which safe Rust's memory safety is built rely on metadata often stored alongside program data, possibly within reach of attackers. Manipulating this metadata, an attacker can nullify safe Rust's memory safety checks dependent on it, causing memory access bugs and exploitation. In response to this issue, we propose MetaSafe, a mechanism that safeguards smart pointer metadata from such attacks. MetaSafe stores smart pointer metadata in a gated memory region where only a predefined set of metadata management functions can write, ensuring that each smart pointer update does not cause safe Rust's memory safety violation. We have implemented MetaSafe by extending the official Rust compiler and evaluated it with a variety of micro- and application benchmarks. The overhead of MetaSafe is found to be low; it incurs a 3.5% average overhead on the execution time of a web browser benchmarks.
The overhead of MetaSafe is found to be low; it incurs a 3.5% average overhead on the execution time of a web browser benchmarks.RustSan: Retrofitting AddressSanitizer for Efficient Sanitization of Rust
Kyuwon Cho, Jongyoon Kim, Kha Dinh Duy, Hajeong Lim, and Hojoon Lee, Sungkyunkwan University
Rust is gaining traction as a safe systems programming language with its strong type and memory safety guarantees. However, Rust's guarantees are not infallible. The use of unsafe Rust, a subvariant of Rust, allows the programmer to temporarily escape the strict Rust language semantics to trade security for flexibility. Memory errors within unsafe blocks in Rust have far-reaching ramifications for the program's safety. As a result, the conventional dynamic memory error detection (e.g., fuzzing) has been adapted as a common practice for Rust and proved its effectiveness through a trophy case of discovered CVEs.
RUSTSAN is a retrofitted design of AddressSanitizer (ASan) for efficient dynamic memory error detection of Rust programs. Our observation is that a significant portion of instrumented memory access sites in a Rust program compiled with ASan is redundant, as the Rust security guarantees can still be valid at the site. RUSTSAN identifies and instruments the sites that definitely or may undermine Rust security guarantees while lifting instrumentation on safe sites. To this end, RUSTSAN employs a cross-IR program analysis for accurate tracking of unsafe sites and also extends ASan's shadow memory scheme for checking non-uniform memory access validation necessary for Rust. We conduct a comprehensive evaluation of RUSTSAN in terms of detection capability and performance using 57 Rust crates. RUSTSAN successfully detected all 31 tested cases of CVE-issued memory errors. Also, RUSTSAN shows an average of 62.3% performance increase against ASan in general benchmarks that involved 20 Rust crates. In the fuzzing experiment with 6 crates, RUSTSAN marked an average of 23.52%, and up to 57.08% of performance improvement.
FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques
Nikolaos Pantelaios and Alexandros Kapravelos, North Carolina State University
Evasion techniques allow malicious code to never be observed. This impacts significantly the detection capabilities of tools that rely on either dynamic or static analysis, as they never get to process the malicious code. The dynamic nature of JavaScript, where code is often injected dynamically, makes evasions particularly effective. Yet, we lack tools that can detect evasive techniques in a challenging environment such as JavaScript.
In this paper, we present FV8, a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code. FV8 selectively enforces code execution on APIs that conditionally inject dynamic code, thus enhancing code coverage and consequently improving visibility into malicious code. We integrate our tool in both the Node.js engine and the Chromium browser, compelling code execution in npm packages and Chrome browser extensions. Our tool increases code coverage by 11% compared to default V8 and detects 28 unique evasion categories, including five previously unreported techniques. In data confirmed as malicious from both ecosystems, our tool identifies 1,443 (14.6%) npm packages and 164 (82%) extensions containing at least one type of evasion. In previously unexamined extensions (39,592), our tool discovered 16,471 injected third-party scripts, and a total of 8,732,120 lines of code executed due to our forced execution instrumentation. Furthermore, it tagged a total of 423 extensions as both evasive and malicious and we manually verify 110 extensions (26%) to actually be malicious, impacting two million users. Our tool is open-source and serves both as an in-browser and standalone dynamic analysis tool, capable of detecting evasive code, bypassing obfuscation in certain cases, offering improved access to malicious code, and supporting recursive analysis of dynamic code injections.
DONAPI: Malicious NPM Packages Detector using Behavior Sequence Knowledge Mapping
Cheng Huang, Nannan Wang, Ziyan Wang, Siqi Sun, Lingzi Li, Junren Chen, Qianchong Zhao, Jiaxuan Han, and Zhen Yang, Sichuan University; Lei Shi, Huawei Technologies
With the growing popularity of modularity in software development comes the rise of package managers and language ecosystems. Among them, npm stands out as the most extensive package manager, hosting more than 2 million third-party open-source packages that greatly simplify the process of building code. However, this openness also brings security risks, as evidenced by numerous package poisoning incidents.
In this paper, we synchronize a local package cache containing more than 3.4 million packages in near real-time to give us access to more package code details. Further, we perform manual inspection and API call sequence analysis on packages collected from public datasets and security reports to build a hierarchical classification framework and behavioral knowledge base covering different sensitive behaviors. In addition, we propose the DONAPI, an automatic malicious npm packages detector that combines static and dynamic analysis. It makes preliminary judgments on the degree of maliciousness of packages by code reconstruction techniques and static analysis, extracts dynamic API call sequences to confirm and identify obfuscated content that static analysis can not handle alone, and finally tags malicious software packages based on the constructed behavior knowledge base. To date, we have identified and manually confirmed 325 malicious samples and discovered 2 unusual API calls and 246 API call sequences that have not appeared in known samples.
Zero-Knowledge Proof II
Election Eligibility with OpenID: Turning Authentication into Transferable Proof of Eligibility
Véronique Cortier, Alexandre Debant, Anselme Goetschmann, and Lucca Hirschi, Université de Lorraine, CNRS, Inria, LORIA, France
Eligibility checks are often abstracted away or omitted in voting protocols, leading to situations where the voting server can easily stuff the ballot box. One reason for this is the difficulty of bootstraping the authentication material for voters without relying on trusting the voting server.
In this paper, we propose a new protocol that solves this problem by building on OpenID, a widely deployed authentication protocol. Instead of using it as a standard authentication means, we turn it into a mechanism that delivers transferable proofs of eligibility. Using zk-SNARK proofs, we show that this can be done without revealing any compromising information, in particular, protecting everlasting privacy. Our approach remains efficient and can easily be integrated into existing protocols, as we have done for the Belenios voting protocol. We provide a full-fledged proof of concept along with benchmarks showing our protocol could be realistically used in large-scale elections.
Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Sebastian Angel, Eleftherios Ioannidis, and Elizabeth Margolin, University of Pennsylvania; Srinath Setty, Microsoft Research; Jess Woods, University of Pennsylvania
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Scalable Zero-knowledge Proofs for Non-linear Functions in Machine Learning
Meng Hao, Hanxiao Chen, and Hongwei Li, School of Computer Science and Engineering, University of Electronic Science and Technology of China; Chenkai Weng, Northwestern University; Yuan Zhang and Haomiao Yang, School of Computer Science and Engineering, University of Electronic Science and Technology of China; Tianwei Zhang, Nanyang Technological University
Zero-knowledge (ZK) proofs have been recently explored for the integrity of machine learning (ML) inference. However, these protocols suffer from high computational overhead, with the primary bottleneck stemming from the evaluation of non-linear functions. In this paper, we propose the first systematic ZK proof framework for non-linear mathematical functions in ML using the perspective of table lookup. The key challenge is that table lookup cannot be directly applied to non-linear functions in ML since it would suffer from inefficiencies due to the intolerably large table. Therefore, we carefully design several important building blocks, including digital decomposition, comparison, and truncation, such that they can effectively utilize table lookup with a quite small table size while ensuring the soundness of proofs. Based on these building blocks, we implement complex mathematical operations and further construct ZK proofs for current mainstream non-linear functions in ML such as ReLU, sigmoid, and normalization. The extensive experimental evaluation shows that our framework achieves 50∼179× runtime improvement compared to the state-of-the-art work, while maintaining a similar level of communication efficiency.
ZKSMT: A VM for Proving SMT Theorems in Zero Knowledge
Daniel Luick, John C. Kolesar, and Timos Antonopoulos, Yale University; William R. Harris and James Parker, Galois, Inc.; Ruzica Piskac, Yale University; Eran Tromer, Boston University; Xiao Wang and Ning Luo, Northwestern University
Verification of program safety is often reducible to proving the unsatisfiability (i.e., validity) of a formula in Satisfiability Modulo Theories (SMT): Boolean logic combined with theories that formalize arbitrary first-order fragments. Zero-knowledge (ZK) proofs allow SMT formulas to be validated without revealing the underlying formulas or their proofs to other parties, which is a crucial building block for proving the safety of proprietary programs. Recently, Luo et al. (CCS 2022) studied the simpler problem of proving the unsatisfiability of pure Boolean formulas but does not support proofs generated by SMT solvers. This work presents ZKSMT, a novel framework for proving the validity of SMT formulas in ZK. We design a virtual machine (VM) tailored to efficiently represent the verification process of SMT validity proofs in ZK. Our VM can support the vast majority of popular theories when proving program safety while being complete and sound. To demonstrate this, we instantiate the commonly used theories of equality and linear integer arithmetic in our VM with theory-specific optimizations for proving them in ZK. ZKSMT achieves high practicality even when running on realistic SMT formulas generated by Boogie, a common tool for software verification. It achieves a three-order-of-magnitude improvement compared to a baseline that executes the proof verification code in a general ZK system.
SoK: What Don't We Know? Understanding Security Vulnerabilities in SNARKs
Stefanos Chaliasos, Imperial College London; Jens Ernstberger, Technical University of Munich; David Theodore, Ethereum Foundation; David Wong, zkSecurity; Mohammad Jahanara, Scroll Foundation; Benjamin Livshits, Imperial College London & Matter Labs
Zero-knowledge proofs (ZKPs) have evolved from being a theoretical concept providing privacy and verifiability to having practical, real-world implementations, with SNARKs (Succinct Non-Interactive Argument of Knowledge) emerging as one of the most significant innovations. Prior work has mainly focused on designing more efficient SNARK systems and providing security proofs for them. Many think of SNARKs as "just math," implying that what is proven to be correct and secure is correct in practice. In contrast, this paper focuses on assessing end-to-end security properties of real-life SNARK implementations. We start by building foundations with a system model and by establishing threat models and defining adversarial roles for systems that use SNARKs. Our study encompasses an extensive analysis of 141 actual vulnerabilities in SNARK implementations, providing a detailed taxonomy to aid developers and security researchers in understanding the security threats in systems employing SNARKs. Finally, we evaluate existing defense mechanisms and offer recommendations for enhancing the security of SNARK-based systems, paving the way for more robust and reliable implementations in the future.
2:45 pm–3:15 pm
Coffee and Tea Break
Grand Ballroom Foyer
3:15 pm–4:15 pm
Measurement III: Auditing and Best Practices I
SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models
Boyang Zhang, Zheng Li, Ziqing Yang, Xinlei He, Michael Backes, Mario Fritz, and Yang Zhang, CISPA Helmholtz Center for Information Security
While advanced machine learning (ML) models are deployed in numerous real-world applications, previous works demonstrate these models have security and privacy vulnerabilities. Various empirical research has been done in this field. However, most of the experiments are performed on target ML models trained by the security researchers themselves. Due to the high computational resource requirement for training advanced models with complex architectures, researchers generally choose to train a few target models using relatively simple architectures on typical experiment datasets. We argue that to understand ML models' vulnerabilities comprehensively, experiments should be performed on a large set of models trained with various purposes (not just the purpose of evaluating ML attacks and defenses). To this end, we propose using publicly available models with weights from the Internet (public models) for evaluating attacks and defenses on ML models. We establish a database, namely SecurityNet, containing 910 annotated image classification models. We then analyze the effectiveness of several representative attacks/defenses, including model stealing attacks, membership inference attacks, and backdoor detection on these public models. Our evaluation empirically shows the performance of these attacks/defenses can vary significantly on public models compared to self-trained models. We share SecurityNet with the research community and advocate researchers to perform experiments on public models to better demonstrate their proposed methods' effectiveness in the future.
How does Endpoint Detection use the MITRE ATT&CK Framework?
Apurva Virkud, Muhammad Adil Inam, Andy Riddle, Jason Liu, Gang Wang, and Adam Bates, University of Illinois Urbana-Champaign
MITRE ATT&CK is an open-source taxonomy of adversary tactics, techniques, and procedures based on real-world observations. Increasingly, organizations leverage ATT&CK technique "coverage" as the basis for evaluating their security posture, while Endpoint Detection and Response (EDR) and Security Indicator and Event Management (SIEM) products integrate ATT&CK into their design as well as marketing. However, the extent to which ATT&CK coverage is suitable to serve as a security metric remains unclear— Does ATT&CK coverage vary meaningfully across different products? Is it possible to achieve total coverage of ATT&CK? Do endpoint products that detect the same attack behaviors even claim to cover the same ATT&CK techniques?
In this work, we attempt to answer these questions by conducting a comprehensive (and, to our knowledge, the first) analysis of endpoint detection products' use of MITRE ATT&CK. We begin by evaluating 3 ATT&CK-annotated detection rulesets from major commercial providers (Carbon Black, Splunk, Elastic) and a crowdsourced ruleset (Sigma) to identify commonalities and underutilized regions of the ATT&CK matrix. We continue by performing a qualitative analysis of unimplemented ATT&CK techniques to determine their feasibility as detection rules. Finally, we perform a consistency analysis of ATT&CK labeling by examining 37 specific threat entities for which at least 2 products include specific detection rules. Combined, our findings highlight the limitations of overdepending on ATT&CK coverage when evaluating security posture; most notably, many techniques are unrealizable as detection rules, and coverage of an ATT&CK technique does not consistently imply coverage of the same real-world threats.
Digital Discrimination of Users in Sanctioned States: The Case of the Cuba Embargo
Anna Ablove, Shreyas Chandrashekaran, Hieu Le, Ram Sundara Raman, and Reethika Ramesh, University of Michigan; Harry Oppenheimer, Georgia Institute of Technology; Roya Ensafi, University of Michigan
Distinguished Paper Award Winner
We present one of the first in-depth and systematic end-user centered investigations into the effects of sanctions on geoblocking, specifically in the case of Cuba. We conduct network measurements on the Tranco Top 10K domains and complement our findings with a small-scale user study with a questionnaire. We identify 546 domains subject to geoblocking across all layers of the network stack, ranging from DNS failures to HTTP(S) response pages with a variety of status codes. Through this work, we discover a lack of user-facing transparency; we find 88% of geoblocked domains do not serve informative notice of why they are blocked. Further, we highlight a lack of measurement-level transparency, even among HTTP(S) blockpage responses. Notably, we identify 32 instances of blockpage responses served with 200 OK status codes, despite not returning the requested content. Finally, we note the inefficacy of current improvement strategies and make recommendations to both service providers and policymakers to reduce Internet fragmentation.
A Broad Comparative Evaluation of Software Debloating Tools
Michael D. Brown and Adam Meily, Trail of Bits; Brian Fairservice, Akshay Sood, and Jonathan Dorn, GrammaTech; Eric Kilmer and Ronald Eytchison, Trail of Bits
Software debloating tools seek to improve program security and performance by removing unnecessary code, called bloat. While many techniques have been proposed, several barriers to their adoption have emerged. Namely, debloating tools are highly specialized, making it difficult for adopters to find the right type of tool for their needs. This is further hindered by a lack of established metrics and comparative evaluations between tools. To close this information gap, we surveyed 10 years of debloating literature and several tools currently under commercial development to taxonomize knowledge about the debloating ecosystem. We then conducted a broad comparative evaluation of 10 debloating tools to determine their relative strengths and weaknesses. Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 12 performance, security, and correctness metrics.
Our evaluation surfaces several concerning findings that contradict the prevailing narrative in the debloating literature. First, debloating tools lack the maturity required to be used on real-world software, evidenced by a slim 22% overall success rate for creating passable debloated versions of medium- and high-complexity benchmarks. Second, debloating tools struggle to produce sound and robust programs. Using our novel differential fuzzing tool, DIFFER, we discovered that only 13% of our debloating attempts produced a sound and robust debloated program. Finally, our results indicate that debloating tools typically do not improve the performance or security posture of debloated programs by a significant degree according to our evaluation metrics. We believe that our contributions in this paper will help potential adopters better understand the landscape of tools and will motivate future research and development of more capable debloating tools. To this end, we have made our benchmark set, data, and custom tools publicly available.
Hardware Security III: Signals
LaserAdv: Laser Adversarial Attacks on Speech Recognition Systems
Guoming Zhang, Xiaohui Ma, Huiting Zhang, and Zhijie Xiang, Shandong University; Xiaoyu Ji, Zhejiang University; Yanni Yang, Xiuzhen Cheng, and Pengfei Hu, Shandong University
Audio adversarial perturbations are imperceptible to humans but can mislead machine learning models, posing a security threat to automatic speech recognition (ASR) systems. Existing methods aim to minimize perturbation values, use acoustic masking, or mimic environmental sounds to render them undetectable. However, these perturbations, being audible frequency range sounds, are still audibly detectable. The slow propagation and rapid attenuation of sound limit their temporal sensitivity and attack range. In this study, we propose LaserAdv, a method that employs lasers to launch adversarial attacks, thereby overcoming the aforementioned challenges due to the superior properties of lasers. In the presence of victim speech, laser adversarial perturbations are superimposed on the speech rather than simply drowning it out, so LaserAdv has higher attack efficiency and longer attack range than LightCommands. LaserAdv introduces a selective amplitude enhancement method based on time-frequency interconversion (SAE-TFI) to deal with distortion. Meanwhile, to simultaneously achieve inaudible, targeted, universal, synchronization-free (over 0.5 s), long-range, and black-box attacks in the physical world, we introduced a series of strategies into the objective function. Our experimental results show that a single perturbation can cause DeepSpeech, Whisper and iFlytek, to misinterpret any of the 12,260 voice commands as the target command with accuracy of up to 100%, 92% and 88%, respectively. The attack distance can be up to 120 m.
MicGuard: A Comprehensive Detection System against Out-of-band Injection Attacks for Different Level Microphone-based Devices
Tiantian Liu, Feng Lin, Zhongjie Ba, Li Lu, Zhan Qin, and Kui Ren, Zhejiang University and Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
The integration of microphones into sensors and systems, serving as input interfaces to intelligent applications and industrial manufacture, has raised growing public concerns regarding their input perception. Studies have uncovered the threat of out-of-band injection attacks on microphones, encompassing ultrasound, laser, and electromagnetic attacks, injecting commands or interferences for malicious intentions. However, existing efforts are limited to defense against ultrasound injections, overlooking the risks posed by other out-of-band injections. To address this gap, this paper proposes MicGuard, a comprehensive passive detection system against out-of-band attacks. Without relying on prior information from attacking and victim devices, the key insight of MicGuard is to utilize carrier traces and spectral chaos observed by remaining injection phenomena across different levels of devices. The carrier traces are used in a prejudgment to fast reject partial injected signals, and the following memory-based detection model to distinguish anomaly based on the quantified chaotic entropy extracted from publicly available audio datasets. MicGuard is evaluated on a wide range of microphone-based devices including sensors, recorders, smartphones, and tablets, achieving an average AUC of 98% with high robustness and universality.
VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger
Zihao Zhan and Yirui Yang, University of Florida; Haoqi Shan, University of Florida, CertiK; Hanqiu Wang, Yier Jin, and Shuo Wang, University of Florida
Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vectors, enabling adversaries to manipulate the charger and perform a series of attacks.
In this paper, we propose VoltSchemer, a set of innovative attacks that grant attackers control over commercial-off-the-shelf wireless chargers merely by modulating the voltage from the power supply. These attacks represent the first of its kind, exploiting voltage noises from the power supply to manipulate wireless chargers without necessitating any malicious modifications to the chargers themselves. The significant threats imposed by VoltSchemer are substantiated by three practical attacks, where a charger can be manipulated to: control voice assistants via inaudible voice commands, damage devices being charged through overcharging or overheating, and bypass Qi-standard specified foreign-object-detection mechanism to damage valuable items exposed to intense magnetic fields.
We demonstrate the effectiveness and practicality of the VoltSchemer attacks with successful attacks on 9 top-selling COTS wireless chargers. Furthermore, we discuss the security implications of our findings and suggest possible countermeasures to mitigate potential threats.
VibSpeech: Exploring Practical Wideband Eavesdropping via Bandlimited Signal of Vibration-based Side Channel
Chao Wang, Feng Lin, Hao Yan, and Tong Wu, Zhejiang University; Wenyao Xu, University at Buffalo, the State University of New York; Kui Ren, Zhejiang University
Vibration-based side channel is an ever-present threat to speech privacy. However, due to the target's frequency response with a rapid decay or limited sampling rate of malicious sensors, the acquired vibration signals are often distorted and narrowband, which fails an intelligible speech recovery. This paper tries to answer that when the side-channel data has only a very limited bandwidth (<500Hz), is it feasible to achieve a wideband eavesdropping based on a practical assumption? Our answer is YES based on the assumption that a short utterance (2s-4s) of the victim is exposed to the attacker. What is most surprising is that the attack can recover speech with a bandwidth of up to 8kHz. This covers almost all phonemes (voiced and unvoiced) in human speech and causes practical threat. The core idea of the attack is using vocal-tract features extracted from the victim's utterance to compensate for the side-channel data. To demonstrate the threat, we proposed a vocal-guided attack scheme called VibSpeech and built a prototype based on a mmWave sensor to penetrate soundproof walls for vibration sensing. We solved challenges of vibration artifact suppression and a generalized scheme free of any target's training data. We evaluated VibSpeech with extensive experiments and validated it on the IMU-based method. The results indicated that VibSpeech can recover intelligible speech with an average MCD/SNR of 3.9/5.4dB.
System Security III: Memory I
CAMP: Compiler and Allocator-based Heap Memory Protection
Zhenpeng Lin, Zheng Yu, Ziyi Guo, Simone Campanoni, Peter Dinda, and Xinyu Xing, Northwestern University
The heap is a critical and widely used component of many applications. Due to its dynamic nature, combined with the complexity of heap management algorithms, it is also a frequent target for security exploits. To enhance the heap's security, various heap protection techniques have been introduced, but they either introduce significant runtime overhead or have limited protection. We present CAMP, a new sanitizer for detecting and capturing heap memory corruption. CAMP leverages a compiler and a customized memory allocator. The compiler adds boundary-checking and escape-tracking instructions to the target program, while the memory allocator tracks memory ranges, coordinates with the instrumentation, and neutralizes dangling pointers. With the novel error detection scheme, CAMP enables various compiler optimization strategies and thus eliminates redundant and unnecessary check instrumentation. This design minimizes runtime overhead without sacrificing security guarantees. Our evaluation and comparison of CAMP with existing tools, using both real-world applications and SPEC CPU benchmarks, show that it provides even better heap corruption detection capability with lower runtime overhead.
GPU Memory Exploitation for Fun and Profit
Yanan Guo, University of Rochester; Zhenkai Zhang, Clemson University; Jun Yang, University of Pittsburgh
As modern applications increasingly rely on GPUs to accelerate the computation, it has become very critical to study and understand the security implications of GPUs. In this work, we conduct a thorough examination of buffer overflows on modern GPUs. Specifically, we demonstrate that, due to GPU's unique memory system, GPU programs suffer from different and more complex buffer overflow vulnerabilities compared to CPU programs, contradicting the conclusions of prior studies. In addition, despite the critical role GPUs play in modern computing, GPU systems are missing essential memory protection mechanisms. Consequently, when buffer overflow vulnerabilities are exploited by an attacker, they can lead to both code injection attacks and code reuse attacks, including return-oriented programming (ROP). Our results show that these attacks pose a significant security risk to modern GPU applications.
SLUBStick: Arbitrary Memory Writes through Practical Software Cross-Cache Attacks within the Linux Kernel
Lukas Maar, Stefan Gast, Martin Unterguggenberger, Mathias Oberhuber, and Stefan Mangard, Graz University of Technology
While the number of vulnerabilities in the Linux kernel has increased significantly in recent years, most have limited capabilities, such as corrupting a few bytes in restricted allocator caches. To elevate their capabilities, security researchers have proposed software cross-cache attacks, exploiting the memory reuse of the kernel allocator. However, such cross-cache attacks are impractical due to their low success rate of only 40 %, with failure scenarios often resulting in a system crash.
In this paper, we present SLUBStick, a novel kernel exploitation technique elevating a limited heap vulnerability to an arbitrary memory read-and-write primitive. SLUBStick operates in multiple stages: Initially, it exploits a timing side channel of the allocator to perform a cross-cache attack reliably. Concretely, exploiting the side-channel leakage pushes the success rate to above 99 % for frequently used generic caches. SLUBStick then exploits code patterns prevalent in the Linux kernel to convert a limited heap vulnerability into a page table manipulation, thereby granting the capability to read and write memory arbitrarily. We demonstrate the applicability of SLUBStick by systematically analyzing two Linux kernel versions, v5.19 and v6.2. Lastly, we evaluate SLUBStick with a synthetic vulnerability and 9 real-world CVEs, showcasing privilege escalation and container escape in the Linux kernel with state-of-the-art kernel defenses enabled.
Detecting Kernel Memory Bugs through Inconsistent Memory Management Intention Inferences
Dinghao Liu, Zhipeng Lu, and Shouling Ji, Zhejiang University; Kangjie Lu, University of Minnesota; Jianhai Chen and Zhenguang Liu, Zhejiang University; Dexin Liu, Peking University; Renyi Cai, Alibaba Cloud Computing Co., Ltd; Qinming He, Zhejiang University
Modern operating system kernels, typically written in low-level languages such as C and C++, are tasked with managing extensive memory resources. Memory-related errors, such as memory leak and memory corruption, are common occurrences and constantly introduced. Traditional detection methods often rely on taint analysis, which suffers from scalability issue (i.e., path explosion) when applied to complex OS kernels. Recent research has pivoted towards leveraging techniques like function pairing or similarity analysis to overcome this challenge. These approaches identify memory errors by referencing code that is either frequently used or semantically similar. However, these techniques have limitations when applied to customized code, which may lack a sufficient corpus of code snippets to facilitate effective function pairing or similarity analysis. This deficiency hinders their applicability in kernel analysis where unique or proprietary code is prevalent.
In this paper, we propose a novel methodology for detecting memory bugs based on inconsistent memory management intentions (IMMI). Our insight is that many memory bugs, despite their varied manifestations, stem from a common underlying issue: the ambiguity in ownership and lifecycle management of memory objects, especially when these objects are passed across various functions. Memory bugs emerge when the mem- ory management strategies of the caller and callee functions misalign for a given memory object. IMMI aims to model and clarify these inconsistent intentions, thereby mitigating the prevalence of such bugs. Our methodology offers two primary advantages over existing techniques: (1) It utilizes a fine-grained memory management model that obviates the need for extensive data-flow tracking, and (2) it does not rely on similarity analysis or the identification of function pairs, making it highly effective in the context of customized code. To enhance the capabilities of IMMI, we have integrated a large language model (LLM) to assist in the interpretation of implicit kernel resource management mechanisms. We have implemented IMMI and evaluated it against the Linux kernel. IMMI effectively found 80 new memory bugs (including 23 memory corruptions and 57 memory leaks) with 35% false positive rate. Most of them are missed by the state-of-the-art memory bug detection tools.
Web Security II: Privacy
Near-Optimal Constrained Padding for Object Retrievals with Dependencies
Pranay Jain, Duke University; Andrew C. Reed, United States Military Academy; Michael K. Reiter, Duke University
The sizes of objects retrieved over the network are powerful indicators of the objects retrieved and are ingredients in numerous types of traffic analysis, such as webpage fingerprinting. We present an algorithm by which a benevolent object store computes a memoryless padding scheme to pad objects before sending them, in a way that bounds the information gain that the padded sizes provide to the network observer about the objects being retrieved. Moreover, our algorithm innovates over previous works in two critical ways. First, the computed padding scheme satisfies constraints on the padding overhead: no object is padded to more than c x its original size, for a tunable factor c > 1. Second, the privacy guarantees of the padding scheme allow for object retrievals that are not independent, as could be caused by hyperlinking. We show in empirical tests that our padding schemes improve dramatically over previous schemes for padding dependent object retrievals, providing better privacy at substantially lower padding overhead, and over known techniques for padding independent object retrievals subject to padding overhead constraints.
PURL: Safe and Effective Sanitization of Link Decoration
Shaoor Munir and Patrick Lee, University of California, Davis; Umar Iqbal, Washington University in St. Louis; Zubair Shafiq, University of California, Davis; Sandra Siby, Imperial College London
While privacy-focused browsers have taken steps to block third-party cookies and mitigate browser fingerprinting, novel tracking techniques that can bypass existing countermeasures continue to emerge. Since trackers need to share information from the client-side to the server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect and sanitize tracking information in decorated links. To this end, we present PURL (pronounced purel-l), a machine learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration. Our evaluation shows that PURL significantly outperforms existing countermeasures in terms of accuracy and reducing website breakage while being robust to common evasion techniques. PURL's deployment on a sample of top-million websites shows that link decoration is abused for tracking on nearly three-quarters of the websites, often to share cookies, email addresses, and fingerprinting information.
Fledging Will Continue Until Privacy Improves: Empirical Analysis of Google's Privacy-Preserving Targeted Advertising
Giuseppe Calderonio, Mir Masood Ali, and Jason Polakis, University of Illinois Chicago
Google recently announced plans to phase out third-party cookies and is currently in the process of rolling out the Chrome Privacy Sandbox, a collection of APIs and web standards that offer privacy-preserving alternatives to existing technologies, particularly for the digital advertising ecosystem. This includes FLEDGE, also referred to as the Protected Audience, which provides the necessary mechanisms for effectively conducting real-time bidding and ad auctions directly within users' browsers. FLEDGE is designed to eliminate the invasive data collection and pervasive tracking practices used for remarketing and targeted advertising. In this paper, we provide a study of the FLEDGE ecosystem both before and after its official deployment in Chrome. We find that even though multiple prominent ad platforms have entered the space, Google ran 99.8% of the auctions we observed, highlighting its dominant role. Subsequently, we provide the first in-depth empirical analysis of FLEDGE, and uncover a series of severe design and implementation flaws. We leverage those for conducting 12 novel attacks, including tracking, cross-site leakage, service disruption, and pollution attacks. While FLEDGE aims to enhance user privacy, our research demonstrates that it is currently exposing users to significant risks, and we outline mitigations for addressing the issues that we have uncovered. We have also responsibly disclosed our findings to Google so as to kickstart remediation efforts. We believe that our research highlights the dire need for more in-depth investigations of the entire Privacy Sandbox, due to the massive impact it will have on user privacy.
Stop, Don't Click Here Anymore: Boosting Website Fingerprinting By Considering Sets of Subpages
Asya Mitseva and Andriy Panchenko, Brandenburg University of Technology (BTU Cottbus, Germany)
A type of traffic analysis, website fingerprinting (WFP), aims to reveal the website a user visits over an encrypted and anonymized connection by observing and analyzing data flow patterns. Its efficiency against anonymization networks such as Tor has been widely studied, resulting in methods that have steadily increased in both complexity and power. While modern WFP attacks have proven to be highly accurate in laboratory settings, their real-world feasibility is highly debated. These attacks also exclude valuable information by ignoring typical user browsing behavior: users often visit multiple pages of a single website sequentially, e.g., by following links.
In this paper, we aim to provide a more realistic assessment of the degree to which Tor users are exposed to WFP. We propose both a novel WFP attack and efficient strategies for adapting existing methods to account for sequential visits of pages within a website. While existing WFP attacks fail to detect almost any website in real-world settings, our novel methods achieve F1-scores of 1.0 for more than half of the target websites. Our attacks remain robust against state-of-the-art WFP defenses, achieving 2.5 to 5 times the accuracy of prior work, and in some cases even rendering the defenses useless. Our methods enable to estimate and to communicate to the user the risk of successive page visits within a website (even in the presence of noise pages) to stop before the WFP attack reaches a critical level of confidence.
ML VIII: Backdoors and Federated Learning
Lurking in the shadows: Unveiling Stealthy Backdoor Attacks against Personalized Federated Learning
Xiaoting Lyu, Beijing Jiaotong University; Yufei Han, INRIA; Wei Wang, Jingkai Liu, and Yongsheng Zhu, Beijing Jiaotong University; Guangquan Xu, Tianjin University; Jiqiang Liu, Beijing Jiaotong University; Xiangliang Zhang, University of Notre Dame
Federated Learning (FL) is a collaborative machine learning technique where multiple clients work together with a central server to train a global model without sharing their private data. However, the distribution shift across non-IID datasets of clients poses a challenge to this one-model-fits-all method hindering the ability of the global model to effectively adapt to each client's unique local data. To echo this challenge, personalized FL (PFL) is designed to allow each client to create personalized local models tailored to their private data.
While extensive research has scrutinized backdoor risks in FL, it has remained underexplored in PFL applications. In this study, we delve deep into the vulnerabilities of PFL to backdoor attacks. Our analysis showcases a tale of two cities. On the one hand, the personalization process in PFL can dilute the backdoor poisoning effects injected into the personalized local models. Furthermore, PFL systems can also deploy both server-end and client-end defense mechanisms to strengthen the barrier against backdoor attacks. On the other hand, our study shows that PFL fortified with these defense methods may offer a false sense of security. We propose PFedBA, a stealthy and effective backdoor attack strategy applicable to PFL systems. PFedBA ingeniously aligns the backdoor learning task with the main learning task of PFL by optimizing the trigger generation process. Our comprehensive experiments demonstrate the effectiveness of PFedBA in seamlessly embedding triggers into personalized local models. PFedBA yields outstanding attack performance across 10 state-of-the-art PFL algorithms, defeating the existing 6 defense mechanisms. Our study sheds light on the subtle yet potent backdoor threats to PFL systems, urging the community to bolster defenses against emerging backdoor challenges.
ACE: A Model Poisoning Attack on Contribution Evaluation Methods in Federated Learning
Zhangchen Xu, Fengqing Jiang, and Luyao Niu, University of Washington; Jinyuan Jia, Pennsylvania State University; Bo Li, University of Chicago; Radha Poovendran, University of Washington
In Federated Learning (FL), a set of clients collaboratively train a machine learning model (called global model) without sharing their local training data. The local training data of clients is typically non-i.i.d. and heterogeneous, resulting in varying contributions from individual clients to the final performance of the global model. In response, many contribution evaluation methods were proposed, where the server could evaluate the contribution made by each client and incentivize the high-contributing clients to sustain their long-term participation in FL. Existing studies mainly focus on developing new metrics or algorithms to better measure the contribution of each client. However, the security of contribution evaluation methods of FL operating in adversarial environments is largely unexplored. In this paper, we propose the first model poisoning attack on contribution evaluation methods in FL, termed ACE. Specifically, we show that any malicious client utilizing ACE could manipulate the parameters of its local model such that it is evaluated to have a high contribution by the server, even when its local training data is indeed of low quality. We perform both theoretical analysis and empirical evaluations of ACE. Theoretically, we show our design of ACE can effectively boost the malicious client's perceived contribution when the server employs the widely-used cosine distance metric to measure contribution. Empirically, our results show ACE effectively and efficiently deceive five state-of-the-art contribution evaluation methods. In addition, ACE preserves the accuracy of the final global models on testing inputs. We also explore six countermeasures to defend ACE. Our results show they are inadequate to thwart ACE, highlighting the urgent need for new defenses to safeguard the contribution evaluation methods in FL.
BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning
Songze Li, Southeast University; Yanbo Dai, HKUST(GZ)
In a federated learning (FL) system, decentralized data owners (clients) could upload their locally trained models to a central server, to jointly train a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious updates are made statistically close to the benign ones. In this paper, we first reveal the fact that planting subsequent backdoors with the same target label could significantly help to maintain the accuracy of previously planted backdoors, and then propose a novel proactive backdoor detection mechanism for FL named BackdoorIndicator, which has the server inject indicator tasks into the global model leveraging out-of-distribution (OOD) data, and then utilizing the fact that any backdoor samples are OOD samples with respect to benign samples, the server, who is completely agnostic of the potential backdoor types and target labels, can accurately detect the presence of backdoors in uploaded models, via evaluating the indicator tasks. We perform systematic and extensive empirical studies to demonstrate the consistently superior performance and practicality of BackdoorIndicator over baseline defenses, across a wide range of system and adversarial settings.
UBA-Inf: Unlearning Activated Backdoor Attack with Influence-Driven Camouflage
Zirui Huang, Yunlong Mao, and Sheng Zhong, Nanjing University
Machine-Learning-as-a-Service (MLaaS) is an emerging product to meet the market demand. However, end users are required to upload data to the remote server when using MLaaS, raising privacy concerns. Since the right to be forgotten came into effect, data unlearning has been widely supported in on-cloud products for removing users' private data from remote datasets and machine learning models. Plenty of machine unlearning methods have been proposed recently to erase the influence of forgotten data. Unfortunately, we find that machine unlearning makes the on-cloud model highly vulnerable to backdoor attacks. In this paper, we report a new threat against models with unlearning enabled and implement an Unlearning Activated Backdoor Attack with Influence-driven camouflage (UBA-Inf). Unlike conventional backdoor attacks, UBA-Inf provides a new backdoor approach for effectiveness and stealthiness by activating the camouflaged backdoor through machine unlearning. The proposed approach can be implemented using off-the-shelf backdoor generating algorithms. Moreover, UBA-Inf is an "on-demand" attack, offering fine-grained control of backdoor activation through unlearning requests, overcoming backdoor vanishing and exposure problems. By extensively evaluating UBA-Inf, we conclude that UBA-Inf is a powerful backdoor approach that improves stealthiness, robustness, and persistence.
Software Security + ML 1
Racing on the Negative Force: Efficient Vulnerability Root-Cause Analysis through Reinforcement Learning on Counterexamples
Dandan Xu, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China, and School of Cyber Security, University of Chinese Academy of Sciences, China; Di Tang, Yi Chen, and XiaoFeng Wang, Indiana University Bloomington; Kai Chen, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China, and School of Cyber Security, University of Chinese Academy of Sciences, China; Haixu Tang, Indiana University Bloomington; Longxing Li, SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China, and School of Cyber Security, University of Chinese Academy of Sciences, China
Root-Cause Analysis (RCA) is crucial for discovering security vulnerabilities from fuzzing outcomes. Automating this process through triaging the crashes observed during the fuzzing process, however, is considered to be challenging. Particularly, today's statistical RCA approaches are known to be exceedingly slow, often taking tens of hours or even a week to analyze a crash. This problem comes from the biased sampling such approaches perform. More specifically, given an input inducing a crash in a program, these approaches sample around the input by mutating it to generate new test cases; these cases are used to fuzz the program, in a hope that a set of program elements (blocks, instructions or predicates) on the execution path of the original input can be adequately sampled so their correlations with the crash can be determined. This process, however, tends to generate the input samples more likely causing the crash, with their execution paths involving a similar set of elements, which become less distinguishable until a large number of samples have been made. We found that this problem can be effectively addressed by sampling around "counterexamples'', the inputs causing a significant change to the current estimates of correlations. These inputs though still involving the elements often do not lead to the crash. They are found to be effective in differentiating program elements, thereby accelerating the RCA process. Based upon the understanding, we designed and implemented a reinforcement learning (RL) technique that rewards the operations involving counterexamples. By balancing random sampling with the exploitation on the counterexamples, our new approach, called RACING, is shown to substantially elevate the scalability and the accuracy of today's statistical RCA, outperforming the state-of-the-art by more than an order of magnitude.
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection
Niklas Risse and Marcel Böhme, MPI-SP, Germany
Recent results of machine learning for automatic vulnerability detection (ML4VD) have been very promising. Given only the source code of a function f, ML4VD techniques can decide if f contains a security flaw with up to 70% accuracy. However, as evident in our own experiments, the same top-performing models are unable to distinguish between functions that contain a vulnerability and functions where the vulnerability is patched. So, how can we explain this contradiction and how can we improve the way we evaluate ML4VD techniques to get a better picture of their actual capabilities?
In this paper, we identify overfitting to unrelated features and out-of-distribution generalization as two problems, which are not captured by the traditional approach of evaluating ML4VD techniques. As a remedy, we propose a novel benchmarking methodology to help researchers better evaluate the true capabilities and limits of ML4VD techniques. Specifically, we propose (i) to augment the training and validation dataset according to our cross-validation algorithm, where a semantic preserving transformation is applied during the augmentation of either the training set or the testing set, and (ii) to augment the testing set with code snippets where the vulnerabilities are patched.
Using six ML4VD techniques and two datasets, we find (a) that state-of-the-art models severely overfit to unrelated features for predicting the vulnerabilities in the testing data, (b) that the performance gained by data augmentation does not generalize beyond the specific augmentations applied during training, and (c) that state-of-the-art ML4VD techniques are unable to distinguish vulnerable functions from their patches.
Improving ML-based Binary Function Similarity Detection by Assessing and Deprioritizing Control Flow Graph Features
Jialai Wang, Tsinghua University; Chao Zhang, Tsinghua University and Zhongguancun Laboratory; Longfei Chen and Yi Rong, Tsinghua University; Yuxiao Wu, Huazhong University of Science and Technology; Hao Wang, Wende Tan, and Qi Li, Tsinghua University; Zongpeng Li, Tsinghua University and Quancheng Labs
Machine learning-based binary function similarity detection (ML-BFSD) has witnessed significant progress recently. They often choose control flow graph (CFG) as an important feature to learn out of functions, as CFGs characterize the control dependencies between basic code blocks. However, the exact role of CFGs in model decisions is not explored, and the extent to which CFGs might lead to model errors is unknown. This work takes a first step towards assessing the role of CFGs in ML-BFSD solutions both theoretically and practically, and promotes their performance accordingly. First, we adapt existing explanation methods to interpreting ML-BFSD solutions, and theoretically reveal that existing models heavily rely on CFG features. Then, we design a solution deltaCFG to manipulate CFGs and practically demonstrate the lack of robustness of existing models. We have extensively evaluated deltaCFG on 11 state-of-the-art (SOTA) ML-BFSD solutions, and find that the models' results would flip if we manipulate the query functions' CFGs but keep semantics, showing that most models have bias on CFG features. Our theoretic and practical assessment solutions can also serve as a robustness validator for the development of future ML-BFSD solutions. Lastly, we present a solution to utilize deltaCFG to augment training data, which helps deprioritize CFG features and enhance the performance of existing ML-BFSD solutions. Evaluation results show that, MRR, Recall@1, AUC and F1 score of existing models are improved by up to 10.1%, 12.7%, 5.1%, and 27.2% respectively, proving that reducing the models' bias on CFG features could improve their performance.
TYGR: Type Inference on Stripped Binaries using Graph Neural Networks
Chang Zhu, Arizona State University; Ziyang Li and Anton Xue, University of Pennsylvania; Ati Priya Bajaj, Wil Gibbs, and Yibo Liu, Arizona State University; Rajeev Alur, University of Pennsylvania; Tiffany Bao, Arizona State University; Hanjun Dai, Google; Adam Doupé, Arizona State University; Mayur Naik, University of Pennsylvania; Yan Shoshitaishvili and Ruoyu Wang, Arizona State University; Aravind Machiry, Purdue University
Binary type inference is a core research challenge in binary program analysis and reverse engineering. It concerns identifying the data types of registers and memory values in a stripped executable (or object file), whose type information is discarded during compilation. Current methods rely on either manually crafted inference rules, which are brittle and demand significant effort to update, or machine learning-based approaches that suffer from low accuracy.
In this paper we propose TYGR, a graph neural network based solution that encodes data-flow information for inferring both basic and struct variable types in stripped binary programs. To support different architectures and compiler optimizations, TYGR was implemented on top of the ANGR binary analysis platform and uses an architecture-agnostic data-flow analysis to extract a graph-based intra-procedural representation of data-flow information.
We noticed a severe lack of diversity in existing binary executables datasets and created TyDa, a large dataset of diverse binary executables. The sole publicly available dataset, provided by STATEFORMER, contains only 1% of the total number of functions in TyDa. TYGR is trained and evaluated on a subset of TyDa and generalizes to the rest of the dataset. TYGR demonstrates an overall accuracy of 76.6% and struct type accuracy of 45.2% on the x64 dataset across four optimization levels (O0-O3). TYGR outperforms existing works by a minimum of 26.1% in overall accuracy and 10.2% in struct accuracy.
Crypto III: Password and Secret Key
MFKDF: Multiple Factors Knocked Down Flat
Matteo Scarlata and Matilda Backendal, ETH Zurich; Miro Haller, UC San Diego
Nair and Song (USENIX 2023) introduce the concept of a Multi-Factor Key Derivation Function (MFKDF), along with constructions and a security analysis. MFKDF integrates dynamic authentication factors, such as HOTP and hardware tokens, into password-based key derivation. The aim is to improve the security of password-derived keys, which can then be used for encryption or as an alternative to multi-factor authentication. The authors claim an exponential security improvement compared to traditional password-based key derivation functions (PBKDF).
We show that the MFKDF constructions proposed by Nair and Song fall short of the stated security goals. Underspecified cryptographic primitives and the lack of integrity of the MFKDF state lead to several attacks, ranging from full key recovery when an HOTP factor is compromised, to bypassing factors entirely or severely reducing their entropy. We reflect on the different threat models of key-derivation and authentication, and conclude that MFKDF is always weaker than plain PBKDF and multi-factor authentication in each setting.
LaKey: Efficient Lattice-Based Distributed PRFs Enable Scalable Distributed Key Management
Matthias Geihs, Torus Labs; Hart Montgomery, Linux Foundation
Distributed key management (DKM) services are multi-party services that allow their users to outsource the generation, storage, and usage of cryptographic private keys, while guaranteeing that none of the involved service providers learn the private keys in the clear. This is typically achieved through distributed key generation (DKG) protocols, where the service providers generate the keys on behalf of the users in an interactive protocol, and each of the servers stores a share of each key as the result. However, with traditional DKM systems, the key material stored by each server grows linearly with the number of users.
An alternative approach to DKM is via distributed key derivation (DKD) where the user key shares are derived on-demand from a constant-size (in the number of users) secret-shared master key and the corresponding user's identity, which is achieved by employing a suitable distributed pseudorandom function (dPRF). However, existing suitable dPRFs require on the order of 100 interaction rounds between the servers and are therefore insufficient for settings with high network latency and where users demand real-time interaction.
To resolve the situation, we initiate the study of lattice-based distributed PRFs, with a particular focus on their application to DKD. Concretely, we show that the LWE-based PRF presented by Boneh et al. at CRYPTO'13 can be turned into a distributed PRF suitable for DKD that runs in only 8 online rounds, which is an improvement over the start-of-the-art by an order of magnitude. We further present optimizations of this basic construction. We show a new construction with improved communication efficiency proven secure under the same "standard" assumptions. Then, we present even more efficient constructions, running in as low as 5 online rounds, from non-standard, new lattice-based assumptions. We support our findings by implementing and evaluating our protocol using the MP-SPDZ framework (Keller, CCS '20). Finally, we give a formal definition of our DKD in the UC framework and prove a generic construction (for which our construction qualifies) secure in this model.
Exploiting Leakage in Password Managers via Injection Attacks
Andrés Fábrega, Armin Namavari, and Rachit Agarwal, Cornell University; Ben Nassi, Cornell Tech, Technion - Israel Institute of Technology; Thomas Ristenpart, Cornell University, Cornell Tech
This work explores injection attacks against password managers. In this setting, the adversary (only) controls their own application client, which they use to ''inject" chosen payloads to a victim's client via, for example, sharing credentials with them. The injections are interleaved with adversarial observations of some form of protected state (such as encrypted vault exports or the network traffic received by the application servers), from which the adversary backs out confidential information. We uncover a series of general design patterns in popular password managers that lead to vulnerabilities allowing an adversary to efficiently recover passwords, URLs, usernames, and attachments. We develop general attack templates to exploit these design patterns and experimentally showcase their practical efficacy via analysis of ten distinct password manager applications. We disclosed our findings to these vendors, many of which deployed mitigations.
OPTIKS: An Optimized Key Transparency System
Julia Len, Cornell Tech; Melissa Chase, Esha Ghosh, Kim Laine, and Radames Cruz Moreno, Microsoft Research
Key Transparency (KT) refers to a public key distribution system with transparency mechanisms proving its correct operation, i.e., proving that it reports consistent values for each user's public key. While prior work on KT systems have offered new designs to tackle this problem, relatively little attention has been paid on the issue of scalability. Indeed, it is not straightforward to actually build a scalable and practical KT system from existing constructions, which may be too complex, inefficient, or non-resilient against machine failures.
In this paper, we present OPTIKS, a full featured and optimized KT system that focuses on scalability. Our system is simpler and more performant than prior work, supporting smaller storage overhead while still meeting strong notions of security and privacy. Our design also incorporates a crash-tolerant and scalable server architecture, which we demonstrate by presenting extensive benchmarks. Finally, we address several real-world problems in deploying KT systems that have received limited attention in prior work, including account decommissioning and user-to-device mapping.
4:15 pm–4:30 pm
Short Break
Grand Ballroom Foyer
4:30 pm–5:30 pm
Social Issues III: Social Media Platform
Understanding the Security and Privacy Implications of Online Toxic Content on Refugees
Arjun Arunasalam, Purdue University; Habiba Farrukh, University of California, Irvine; Eliz Tekcan and Z. Berkay Celik, Purdue University
Deteriorating conditions in regions facing social and political turmoil have resulted in the displacement of huge populations known as refugees. Technologies such as social media have helped refugees adapt to challenges in their new homes. While prior works have investigated refugees' computer security and privacy (S&P) concerns, refugees' increasing exposure to toxic content and its implications have remained largely unexplored. In this paper, we answer how toxic content can influence refugees' S&P actions, goals, and barriers, and how their experiences shape these factors. Through semi-structured interviews with refugee liaisons (n=12), focus groups (n=9, 27 participants), and an online survey (n=29) with refugees, we discover unique attack contexts (e.g., participants are targeted after responding to posts directed against refugees) and how intersecting identities (e.g., LGBTQ+, women) exacerbate attacks. In response to attacks, refugees take immediate actions (e.g., selective blocking) or long-term behavioral shifts (e.g., ensuring uploaded photos are void of landmarks) These measures minimize vulnerability and discourage attacks, among other goals, while participants acknowledge barriers to measures (e.g., anonymity impedes family reunification). Our findings highlight lessons in better equipping refugees to manage toxic content attacks.
Understanding Help-Seeking and Help-Giving on Social Media for Image-Based Sexual Abuse
Miranda Wei, University of Washington / Google; Sunny Consolvo and Patrick Gage Kelley, Google; Tadayoshi Kohno, University of Washington; Tara Matthews and Sarah Meiklejohn, Google; Franziska Roesner, University of Washington; Renee Shelby, Kurt Thomas, and Rebecca Umbach, Google
Image-based sexual abuse (IBSA), like other forms of technology-facilitated abuse, is a growing threat to people's digital safety. Attacks include unwanted solicitations for sexually explicit images, extorting people under threat of leaking their images, or purposefully leaking images to enact revenge or exert control. In this paper, we explore how people seek and receive help for IBSA on social media. Specifically, we identify over 100,000 Reddit posts that engage relationship and advice communities for help related to IBSA. We draw on a stratified sample of 261 posts to qualitatively examine how various types of IBSA unfold, including the mapping of gender, relationship dynamics, and technology involvement to different types of IBSA. We also explore the support needs of victim-survivors experiencing IBSA and how communities help victim-survivors navigate their abuse through technical, emotional, and relationship advice. Finally, we highlight sociotechnical gaps in connecting victim-survivors with important care, regardless of whom they turn to for help.
Enabling Contextual Soft Moderation on Social Media through Contrastive Textual Deviation
Pujan Paudel, Mohammad Hammas Saeed, Rebecca Auger, Chris Wells, and Gianluca Stringhini, Boston University
Automated soft moderation systems are unable to ascertain if a post supports or refutes a false claim, resulting in a large number of contextual false positives. This limits their effectiveness, for example undermining trust in health experts by adding warnings to their posts or resorting to vague warnings instead of granular fact-checks, which result in desensitizing users. In this paper, we propose to incorporate stance detection into existing automated soft-moderation pipelines, with the goal of ruling out contextual false positives and providing more precise recommendations for social media content that should receive warnings. We develop a textual deviation task called Contrastive Textual Deviation (CTD), and show that it outperforms existing stance detection approaches when applied to soft moderation. We then integrate CTD into the state-of-the-art system for automated soft moderation Lambretta, showing that our approach can reduce contextual false positives from 20% to 2.1%, providing another important building block towards deploying reliable automated soft moderation tools on social media.
The Imitation Game: Exploring Brand Impersonation Attacks on Social Media Platforms
Bhupendra Acharya, CISPA Helmholtz Center for Information Security; Dario Lazzaro, University of Genoa; Efrén López-Morales, Texas A&M University-Corpus Christi; Adam Oest and Muhammad Saad, PayPal Inc.; Antonio Emanuele Cinà, University of Genoa; Lea Schönherr and Thorsten Holz, CISPA Helmholtz Center for Information Security
The rise of social media users has led to an increase in customer support services offered by brands on various platforms. Unfortunately, attackers also use this as an opportunity to trick victims through fake profiles that imitate official brand accounts. In this work, we provide a comprehensive overview of such brand impersonation attacks on social media.
We analyze the fake profile creation and user engagement processes on X, Instagram, Telegram, and YouTube and quantify their impact. Between May and October 2023, we collected 1.3 million user profiles, 33 million posts, and publicly available profile metadata, wherein we found 349,411 squatted accounts targeting 2,625 of 2,847 major international brands. Analyzing profile engagement and user creation techniques, we show that squatting profiles persistently perform various novel attacks in addition to classic abuse such as social engineering, phishing, and copyright infringement. By sharing our findings with the top 100 brands and collaborating with one of them, we further validate the real-world implications of such abuse. Our research highlights a weakness in the ability of social media platforms to protect brands and users from attacks based on username squatting. Alongside strategies such as customer education and clear indicators of trust, our detection model can be used by platforms as a countermeasure to proactively detect abusive accounts.
Wireless Security I: Cellular and Bluetooth
Hermes: Unlocking Security Analysis of Cellular Network Protocols by Synthesizing Finite State Machines from Natural Language Specifications
Abdullah Al Ishtiaq, Sarkar Snigdha Sarathi Das, Syed Md Mukit Rashid, Ali Ranjbar, Kai Tu, Tianwei Wu, Zhezheng Song, Weixuan Wang, Mujtahid Akon, Rui Zhang, and Syed Rafiul Hussain, Pennsylvania State University
In this paper, we present Hermes, an end-to-end framework to automatically generate formal representations from natural language cellular specifications. We first develop a neural constituency parser, NEUTREX, to process transition-relevant texts and extract transition components (i.e., states, conditions, and actions). We also design a domain-specific language to translate these transition components to logical formulas by leveraging dependency parse trees. Finally, we compile these logical formulas to generate transitions and create the formal model as finite state machines. To demonstrate the effectiveness of Hermes, we evaluate it on 4G NAS, 5G NAS, and 5G RRC specifications and obtain an overall accuracy of 81-87%, which is a substantial improvement over the state-of-the-art. Our security analysis of the extracted models uncovers 3 new vulnerabilities and identifies 19 previous attacks in 4G and 5G specifications, and 7 deviations in commercial 4G basebands.
On the Criticality of Integrity Protection in 5G Fronthaul Networks
Jiarong Xing, Rice University; Sophia Yoo, Princeton University; Xenofon Foukas, Microsoft; Daehyeok Kim, The University of Texas at Austin; Michael K. Reiter, Duke University
The modern 5G fronthaul, which connects the base stations to radio units in cellular networks, is designed to deliver microsecond-level performance guarantees using Ethernet-based protocols. Unfortunately, due to potential performance overheads, as well as misconceptions about the low risk and impact of possible attacks, integrity protection is not considered a mandatory feature in the 5G fronthaul standards. In this work, we show how vulnerabilities from the lack of protection can be exploited, making attacks easier and more powerful than ever. We present a novel class of powerful attacks and a set of traditional attacks, which can both be fully launched from software over open packet-based interfaces, to cause performance degradation or denial of service to users over large geographical regions. Our attacks do not require a physical radio presence or signal-based attack mechanisms, do not affect the network's operation (e.g., not crashing the radios), and are highly severe (e.g., impacting multiple cells). We demonstrate the impact of our attacks in an end-to-end manner on a commercial-grade, multi-cell 5G testbed, showing that adversaries can degrade performance of connected users by more than 80%, completely block a selected subset of users from ever attaching to the cell, or even generate signaling storm attacks of more than 2500 signaling messages per minute, with just two compromised cells and four mobile users. We also present an analysis of countermeasures that meet the strict performance requirements of the fronthaul.
SIMurai: Slicing Through the Complexity of SIM Card Security Research
Tomasz Piotr Lisowski, University of Birmingham; Merlin Chlosta, CISPA Helmholtz Center for Information Security; Jinjin Wang and Marius Muench, University of Birmingham
SIM cards are widely regarded as trusted entities within mobile networks. But what if they were not trustworthy? In this paper, we argue that malicious SIM cards are a realistic threat, and demonstrate that they can launch impactful attacks against mobile devices and their basebands.
We design and implement SIMURAI, a software platform for security-focused SIM exploration and experimentation. At its core, SIMURAI features a flexible software implementation of a SIM. In contrast to existing SIM research tooling that typically involves physical SIM cards, SIMURAI adds flexibility by enabling deliberate violation of application-level and transmission-level behavior—a valuable asset for further exploration of SIM features and attack capabilities.
We integrate the platform into common cellular security test beds, demonstrating that smartphones can successfully connect to mobile networks using our software SIM. Additionally, we integrate SIMURAI with emulated baseband firmwares and carry out a fuzzing campaign that leads to the discovery of two high-severity vulnerabilities on recent flagship smartphones. We also demonstrate how rogue carriers and attackers with physical access can trigger these vulnerabilities with ease, emphasizing the need to recognize hostile SIMs in cellular security threat models.
Finding Traceability Attacks in the Bluetooth Low Energy Specification and Its Implementations
Jianliang Wu, Purdue University & Simon Fraser University; Patrick Traynor, University of Florida; Dongyan Xu, Dave (Jing) Tian, and Antonio Bianchi, Purdue University
Bluetooth Low Energy (BLE) provides an efficient and convenient means for connecting a wide range of devices and peripherals. While its designers attempted to make tracking devices difficult through the use of MAC address randomization, a comprehensive analysis of the untraceability for the entire BLE protocol has not previously been conducted. In this paper, we create a formal model for BLE untraceability to reason about additional ways in which the specification allows for user tracking. Our model, implemented using ProVerif, transforms the untraceability problem into a reachability problem, and uncovers four previously unknown issues, namely IRK (Identity Resolving Key) reuse, BD_ADDR (MAC Address of Bluetooth Classic) reuse, CSRK (Connection Signature Resolving Key) reuse, and ID_ADDR (Identity Address) reuse, enabling eight passive or active tracking attacks against BLE. We then build another formal model using Diff-Equivalence (DE) as a comparison to our reachability model. Our evaluation of the two models demonstrates the soundness of our reachability model, whereas the DE model is neither sound nor complete. We further confirm these vulnerabilities in 13 different devices, ranging from embedded systems to laptop computers, with each device having at least 2 of the 4 issues. We finally provide mitigations for both developers and end users. In so doing, we demonstrate that BLE systems remain trackable under several common scenarios.
Mobile Security II
Defects-in-Depth: Analyzing the Integration of Effective Defenses against One-Day Exploits in Android Kernels
Lukas Maar, Graz University of Technology; Florian Draschbacher, Graz University of Technology and A-SIT Austria, Graz; Lukas Lamster and Stefan Mangard, Graz University of Technology
With the mobile phone market exceeding one billion units sold in 2023, ensuring the security of these devices is critical. However, recent research has revealed worrying delays in the deployment of security-critical kernel patches, leaving devices vulnerable to publicly known one-day exploits. While the mainline Android kernel has seen an increase in defense mechanisms, their integration and effectiveness in vendor-supplied kernels are unknown at a large scale.
In this paper, we systematically analyze publicly available one-day exploits targeting the Android kernel over the past three years. We identify multiple exploitation flows representing vulnerability-agnostic strategies to gain high privileges. We then demonstrate that integrating defense-in-depth mechanisms from the mainline Android kernel could mitigate 84.6 % of these exploitation flows. In a subsequent analysis of 994 devices, we reveal a widespread absence of effective defenses across vendors. Depending on the vendor, only 28.8 % to 54.6 % of exploitation flows are mitigated, indicating a 4.62 to 2.951 times worse scenario than the mainline kernel.
Further delving into defense mechanisms, we reveal weaknesses in vendor-specific defenses and advanced exploitation techniques bypassing defense implementations. As these developments pose additional threats, we discuss potential solutions. Lastly, we discuss factors contributing to the absence of effective defenses and offer improvement recommendations. We envision that our findings will guide the inclusion of effective defenses, ultimately enhancing Android security.
Exploring Covert Third-party Identifiers through External Storage in the Android New Era
Zikan Dong, Beijing University of Posts and Telecommunications; Tianming Liu, Monash University/Huazhong University of Science and Technology; Jiapeng Deng and Haoyu Wang, Huazhong University of Science and Technology; Li Li, Beihang University; Minghui Yang and Meng Wang, OPPO; Guosheng Xu, Beijing University of Posts and Telecommunications; Guoai Xu, Harbin Institute of Technology, Shenzhen
Third-party tracking plays a vital role in the mobile app ecosystem, which relies on identifiers to gather user data across multiple apps. In the early days of Android, tracking SDKs could effortlessly access non-resettable hardware identifiers for third-party tracking. However, as privacy concerns mounted, Google has progressively restricted device identifier usage through Android system updates. In the new era, tracking SDKs are only allowed to employ user-resettable identifiers which users can also opt out of, prompting SDKs to seek alternative methods for reliable user identification across apps. In this paper, we systematically explore the practice of third-party tracking SDKs covertly storing their own generated identifiers on external storage, thereby circumventing Android's identifier usage restriction and posing a considerable threat to user privacy. We devise an analysis pipeline for an extensive large-scale investigation of this phenomenon, leveraging kernel-level instrumentation and UI testing techniques to automate the recording of app file operations at runtime. Applying our pipeline to 8,000 Android apps, we identified 17 third-party tracking SDKs that store identifiers on external storage. Our analysis reveals that these SDKs employ a range of storage techniques, including hidden files and attaching to existing media files, to make their identifiers more discreet and persistent. We also found that most SDKs lack adequate security measures, compromising the confidentiality and integrity of identifiers and enabling deliberate attacks. Furthermore, we examined the impact of Scoped Storage - Android's latest defense mechanism for external storage on these covert third-party identifiers, and proposed a viable exploit that breaches such a defense mechanism. Our work underscores the need for greater scrutiny of third-party tracking practices and better solutions to safeguard user privacy in the Android ecosystem.
PURE: Payments with UWB RElay-protection
Daniele Coppola, Giovanni Camurati, Claudio Anliker, Xenia Hofmeier, Patrick Schaller, David Basin, and Srdjan Capkun, ETH Zurich
Distinguished Paper Award Winner
Contactless payments are now widely used and are expected to reach $10 trillion worth of transactions by 2027. Although convenient, contactless payments are vulnerable to relay attacks that enable attackers to execute fraudulent payments. A number of countermeasures have been proposed to address this issue, including Mastercard's relay protection mechanism. These countermeasures, although effective against some Commercial off-the-shelf (COTS) relays, fail to prevent physical-layer relay attacks.
In this work, we leverage the Ultra-Wide Band (UWB) radios incorporated in major smartphones, smartwatches, tags and accessories, and introduce PURE, the first UWB-based relay protection that integrates smoothly into existing contactless payment standards, and prevents even the most sophisticated physical layer attacks. PURE extends EMV payment protocols that are executed between cards and terminals, and does not require any modification to the backend of the issuer, acquirer, or payment network. PURE further tailors UWB ranging to the payment environment (i.e., wireless channels) to achieve both reliability and resistance to all known physical-layer distance reduction attacks against UWB 802.15.4z. We implement PURE within the EMV standard on modern smartphones, and evaluate its performance in a realistic deployment. Our experiments show that PURE provides a sub-meter relay protection with minimal execution overhead (41 ms). We formally verify the security of PURE's integration within Mastercard's EMV protocol using the Tamarin prover.
Do You See How I Pose? Using Poses as an Implicit Authentication Factor for QR Code Payment
Chuxiong Wu and Qiang Zeng, George Mason University
QR code payment has gained enormous popularity in the realm of mobile transactions, but concerns regarding its security keep growing. To bolster the security of QR code payment, we propose pQRAuth, an innovative implicit second-factor authentication approach that exploits smartphone poses. In the proposed approach, when a consumer presents a payment QR code on her smartphone to a merchant's QR code scanner, the scanner's camera captures not only the QR code itself but also the smartphone's poses. By utilizing poses as an additional factor, in conjunction with QR code decoding, the scanner verifies the authenticity of the smartphone presenting the QR code. Our comprehensive evaluation demonstrates the effectiveness of pQRAuth, affirming its security, accuracy and robustness.
Measurement IV: Web
Simulated Stress: A Case Study of the Effects of a Simulated Phishing Campaign on Employees' Perception, Stress and Self-Efficacy
Markus Schöps, Marco Gutfleisch, Eric Wolter, and M. Angela Sasse, Ruhr University Bochum
Many organizations are concerned about being attacked by phishing emails and buy Simulated Phishing Campaigns (SPC) to measure and reduce their employees' susceptibility to these attacks. Whilst some prior studies reported reduced click rates after SPCs, others have raised concerns that it may have undesirable side effects: causing some employees stress, and/or reducing their self-efficacy. This would be counterproductive, since stress and self-efficacy play a key role in learning and behavior change. We report the first study in which stress and self-efficacy were measured with n = 408 employees immediately after they clicked on or reported a simulated phishing email they received as part of an SPC in a large organization. To obtain richer data how employees experienced the SPC, we conducted semi-structured interviews with n = 21 employees. We find that participants who clicked on and reported simulated phishing emails generally perceived SPCs as positive and effective, even though recent research casts doubt on this effectiveness. We further find that participants who clicked on simulated phishing emails had significantly higher stress levels and significantly lower phishing self-efficacy than participants who reported them. We further discuss the impact of our findings and conclude that the effect of SPCs on the perceived stress of employees is an important relationship that needs to be investigated in future studies.
Arcanum: Detecting and Evaluating the Privacy Risks of Browser Extensions on Web Pages and Web Content
Qinge Xie, Manoj Vignesh Kasi Murali, Paul Pearce, and Frank Li, Georgia Institute of Technology
Modern web browsers support rich extension ecosystems that provide users with customized and flexible browsing experiences. Unfortunately, the flexibility of extensions also introduces the potential for abuse, as an extension with sufficient permissions can access and surreptitiously leak sensitive and private browsing data to the extension's authors or third parties. Prior work has explored such extension behavior, but has been limited largely to meta-data about browsing rather than the contents of web pages, and is also based on older versions of browsers, web standards, and APIs, precluding its use for analysis in a modern setting.
In this work, we develop Arcanum, a dynamic taint tracking system for modern Chrome extensions designed to monitor the flow of user content from web pages. Arcanum defines a variety of taint sources and sinks, allowing researchers to taint specific parts of pages at runtime via JavaScript, and works on modern extension APIs, JavaScript APIs, and versions of Chromium. We deploy Arcanum to test all functional extensions currently in the Chrome Web Store for the automated exfiltration of user data across seven sensitive websites: Amazon, Facebook, Gmail, Instagram, LinkedIn, Outlook, and PayPal. We observe significant privacy risks across thousands of extensions, including hundreds of extensions automatically extracting user content from within web pages, impacting millions of users. Our findings demonstrate the importance of user content within web pages, and the need for stricter privacy controls on extensions.
Smudged Fingerprints: Characterizing and Improving the Performance of Web Application Fingerprinting
Brian Kondracki and Nick Nikiforakis, Stony Brook University
Distinguished Paper Award Winner
Open-source web applications have given everyone the ability to deploy complex web applications on their site(s), ranging from blogs and personal clouds, to server administration tools and webmail clients. Given that there exists millions of deployments of this software in the wild, the ability to fingerprint a particular release of a web application residing at a web endpoint is of interest to both attackers and defenders alike.
In this work, we study modern web application fingerprinting techniques and identify their inherent strengths and weaknesses. We design WASABO, a web application testing framework and use it to measure the performance of six web application fingerprinting tools against 1,360 releases of popular web applications. While 94.8% of all web application releases were correctly labeled by at least one fingerprinting tool in ideal conditions, many tools are unable to produce a single version prediction for a particular release. This leads to instances where a release is labeled as multiple disparate versions, resulting in administrator confusion on the security posture of an unknown web application.
We also measure the accuracy of each tool against real-world deployments of the studied web applications, observing up to an 80% drop-off in performance compared to our offline results. To identify causes for this performance degradation, as well as to improve the robustness of these tools in the wild, we design a web-application-agnostic middleware which applies a series of transformations to the traffic of each fingerprinting tool. Overall, we are able to improve the performance of popular web application fingerprinting tools by up to 22.9%, without any modification to the evaluated tools.
Does Online Anonymous Market Vendor Reputation Matter?
Alejandro Cuevas and Nicolas Christin, Carnegie Mellon University
Reputation is crucial for trust in underground markets such as online anonymous marketplaces (OAMs), where there is little recourse against unscrupulous vendors. These markets rely on eBay-like feedback scores and forum reviews as reputation signals to ensure market safety, driving away dishonest vendors and flagging low-quality or dangerous products. Despite their importance, there has been scant work exploring the correlation (or lack thereof) between reputation signals and vendor success. To fill this gap, we study vendor success from two angles: (i) longevity and (ii) future financial success, by studying eight OAMs from 2011 to 2023. We complement market data with social network features extracted from a OAM forum, and by qualitatively coding reputation signals from over 15,000 posts and comments across two subreddits. Using survival analysis techniques and simple Random Forest models, we show that feedback scores (including those imported from other markets) can explain vendors' longevity, but fail to predict vendor disappearance in the short term. Further, feedback scores are not the main predictors of future financial success. Rather, vendors who quickly generate revenue when they start on a market typically end up acquiring the most wealth overall. We show that our models generalize across different markets and time periods spanning over a decade. Our findings provide empirical insights into early identification of potential high-scale vendors, effectiveness of "reputation poisoning" strategies, and how reputation systems could contribute to harm reduction in OAMs. We find in particular that, despite their coarseness, existing reputation signals are useful to identify potentially dishonest sellers, and highlight some possible improvements.
LLM II: Jailbreaking
LLM-Fuzzer: Scaling Assessment of Large Language Model Jailbreaks
Jiahao Yu, Northwestern University; Xingwei Lin, Ant Group; Zheng Yu and Xinyu Xing, Northwestern University
The jailbreak threat poses a significant concern for Large Language Models (LLMs), primarily due to their potential to generate content at scale. If not properly controlled, LLMs can be exploited to produce undesirable outcomes, including the dissemination of misinformation, offensive content, and other forms of harmful or unethical behavior. To tackle this pressing issue, researchers and developers often rely on red-team efforts to manually create adversarial inputs and prompts designed to push LLMs into generating harmful, biased, or inappropriate content. However, this approach encounters serious scalability challenges.
To address these scalability issues, we introduce an automated solution for large-scale LLM jailbreak susceptibility assessment called LLM-Fuzzer. Inspired by fuzz testing, LLM-Fuzzer uses human-crafted jailbreak prompts as starting points. By employing carefully customized seed selection strategies and mutation mechanisms, LLM-Fuzzer generates additional jailbreak prompts tailored to specific LLMs. Our experiments show that LLM-Fuzzer-generated jailbreak prompts demonstrate significantly increased exploitability and transferability. This highlights that many open-source and commercial LLMs suffer from severe jailbreak issues, even after safety fine-tuning.
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu, Washington University in St. Louis; Xiaogeng Liu, University of Wisconsin, Madison; Shunning Liang, Washington University in St. Louis; Zach Cameron, John Burroughs School; Chaowei Xiao, University of Wisconsin, Madison; Ning Zhang, Washington University in St. Louis
Distinguished Paper Award Winner
Recent advancements in generative AI have enabled ubiquitous access to large language models (LLMs). Empowered by their exceptional capabilities to understand and generate human-like text, these models are being increasingly integrated into our society. At the same time, there are also concerns on the potential misuse of this powerful technology, prompting defensive measures from service providers. To overcome such protection, jailbreaking prompts have recently emerged as one of the most effective mechanisms to circumvent security restrictions and elicit harmful content originally designed to be prohibited.
Due to the rapid development of LLMs and their ease of access via natural languages, the frontline of jailbreak prompts is largely seen in online forums and among hobbyists. To gain a better understanding of the threat landscape of semantically meaningful jailbreak prompts, we systemized existing prompts and measured their jailbreak effectiveness empirically. Further, we conducted a user study involving 92 participants with diverse backgrounds to unveil the process of manually creating jailbreak prompts. We observed that users often succeeded in jailbreak prompts generation regardless of their expertise in LLMs. Building on the insights from the user study, we also developed a system using AI as the assistant to automate the process of jailbreak prompt generation.
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services
Zilong Lin, Jian Cui, Xiaojing Liao, and XiaoFeng Wang, Indiana University Bloomington
The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study on 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today's public LLM services. Through examining 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime.
Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
Tong Liu and Yingjie Zhang, Institute of Information Engineering, Chinese Academy of Sciences and School of Cyber Security, University of Chinese Academy of Sciences; Zhe Zhao, RealAI; Yinpeng Dong, RealAI and Tsinghua University; Guozhu Meng and Kai Chen, Institute of Information Engineering, Chinese Academy of Sciences and School of Cyber Security, University of Chinese Academy of Sciences
In recent years, large language models (LLMs) have demonstrated notable success across various tasks, but the trustworthiness of LLMs is still an open problem. One specific threat is the potential to generate toxic or harmful responses. Attackers can craft adversarial prompts that induce harmful responses from LLMs. In this work, we pioneer a theoretical foundation in LLMs security by identifying bias vulnerabilities within the safety fine-tuning and design a black-box jailbreak method named DRA (Disguise and Reconstruction Attack), which conceals harmful instructions through disguise and prompts the model to reconstruct the original harmful instruction within its completion. We evaluate DRA across various open-source and closed-source models, showcasing state-of-the-art jailbreak success rates and attack efficiency. Notably, DRA boasts a 91.1% attack success rate on OpenAI GPT-4 chatbot.