Surfacing Hidden Privacy Risks in Code: Lessons from LLM and Retrieval Assisted Detection

Bushra Aloraini; Jimmy Haslam

Tuesday, June 02, 2026 - 2:00 pm–2:20 pm

Bushra Aloraini and Jimmy Haslam, Microsoft

Many privacy failures are caused less by obvious privacy-handling code, such as encryption, retention, and deletion logic, than by everyday design and coding choices that unintentionally expose sensitive data. A common example is Default Grant Access (DGA), in which access is permitted unless a developer explicitly blocks it. DGA is difficult for traditional static analysis to detect because the risk often emerges from context: defaults, conditional logic, and framework behavior.
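To make the pattern concrete, here is a minimal, hypothetical Python illustration of DGA next to a deny-by-default alternative. The function names, the `export_enabled` setting, and the role names are invented for this sketch and are not taken from the talk. Note how the risky version never looks wrong line by line: the grant comes from a default value (`True` for a missing setting) and from the fall-through `return True`.

```python
# Hypothetical illustration of Default Grant Access (DGA); all names are invented.

ALLOWED_EXPORT_ROLES = frozenset({"data_owner", "privacy_admin"})


def can_export_user_data(settings: dict[str, bool], caller_role: str) -> bool:
    # DGA: a missing "export_enabled" key silently defaults to True (allow).
    if not settings.get("export_enabled", True):
        return False
    # Only one explicitly blocked role is denied...
    if caller_role == "blocked":
        return False
    # ...and everything else falls through to "allow".
    return True


def can_export_user_data_safe(settings: dict[str, bool], caller_role: str) -> bool:
    # Deny by default: require an explicit opt-in AND an explicitly allowed role.
    return bool(settings.get("export_enabled", False)) and caller_role in ALLOWED_EXPORT_ROLES


if __name__ == "__main__":
    # An empty configuration and an unknown caller:
    print(can_export_user_data({}, "guest"))       # True: nothing blocked it
    print(can_export_user_data_safe({}, "guest"))  # False: nothing granted it
```

The two functions differ only in which outcome needs an explicit rule, which is why this kind of risk tends to surface from context rather than from any single suspicious line.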

We report lessons learned from deploying language-model-based DGA detection in pull requests at scale, and from replacing a static-prompt approach with retrieval-augmented generation (RAG) to address high false-positive rates. In an evaluation spanning nine production repositories, four languages, and more than 183,000 methods, the RAG system identified more than twice as many confirmed privacy-relevant issues as the standalone LLM approach. However, false-positive rates did not improve significantly, and we observed language-specific noise, especially in TypeScript and C++.
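The abstract does not describe the retrieval pipeline itself. As a rough, hypothetical sketch of the general RAG idea (retrieve curated examples that resemble the method under review and place them in the prompt, instead of using one fixed set of examples), the Python below ranks a small example corpus by token-overlap (Jaccard) similarity. That similarity measure is a deliberately simple stand-in for the embedding search a production system would more likely use; `Example`, `retrieve`, and `build_prompt` are invented names, and no model is actually called.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    code: str
    label: str  # e.g. "DGA" or "benign" (hypothetical labels)
    note: str


def tokens(code: str) -> set[str]:
    # Crude lexical tokens; an assumption standing in for learned embeddings.
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(method: str, corpus: list[Example], k: int = 2) -> list[Example]:
    # Rank curated examples by similarity to the method under review; keep top k.
    query = tokens(method)
    ranked = sorted(corpus, key=lambda ex: jaccard(query, tokens(ex.code)), reverse=True)
    return ranked[:k]


def build_prompt(method: str, corpus: list[Example], k: int = 2) -> str:
    # Assemble a prompt whose few-shot examples change per method (the "RAG" part).
    parts = ["Does the method below grant access by default? Answer YES or NO with a reason.", ""]
    for ex in retrieve(method, corpus, k):
        parts.append(f"Example ({ex.label}): {ex.note}\n{ex.code}\n")
    parts.append(f"Method under review:\n{method}")
    return "\n".join(parts)


corpus = [
    Example("def can_view(u):\n    allowed = True\n    if u.blocked:\n        allowed = False\n    return allowed",
            "DGA", "access starts as allowed"),
    Example("def can_view(u):\n    return u.role in ALLOWED_ROLES", "benign", "explicit allow-list"),
    Example("def log_size(buf):\n    return len(buf)", "benign", "no access decision"),
]

method = "def can_export(req):\n    allowed = True\n    if req.user.blocked:\n        allowed = False\n    return allowed"

if __name__ == "__main__":
    print(build_prompt(method, corpus))
```

Here the DGA-shaped example is retrieved first because it shares most identifiers with the method under review. The choice of `k` is one place where retrieval trade-offs appear: more examples give the model more context but lengthen the prompt and can pull in less relevant matches.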

We cover practical engineering insights: curating high-quality examples from known incidents, managing corpus retrieval trade-offs, and integrating detections into pull request review workflows without overwhelming reviewers.

Bushra Aloraini is a Senior Security Researcher at Microsoft working at the intersection of privacy and security. She focuses on large-scale code analysis, developing detection methodologies and partnering with engineering teams to translate anti-pattern insights into practical tooling. She holds a Ph.D. in Computer Science from the University of Waterloo and is an inventor of privacy-focused analysis techniques.


Jimmy Haslam is a Senior Software Engineer at Microsoft with 20+ years of experience. Before Microsoft, he worked in biotech developing analytical methods for cancer detection and treatment. He now applies the same pattern-recognition and anomaly-detection toolkit to software systems, uncovering privacy- and security-relevant coding anti-patterns at scale.


BibTeX

@conference {317581,
author = {Bushra Aloraini and Jimmy Haslam},
title = {Surfacing Hidden Privacy Risks in Code: Lessons from {LLM} and Retrieval Assisted Detection},
year = {2026},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}
