Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach

Authors: 

Ruofan Liu, Yun Lin, Xianglin Yang, and Siang Hwee Ng, National University of Singapore; Dinil Mon Divakaran, Trustwave; Jin Song Dong, National University of Singapore

Abstract: 

Explainable phishing detection approaches are usually based on references, i.e., they compare a suspicious webpage against a reference list of commonly targeted legitimate brands' webpages. If a webpage is detected as similar to any referenced website but their domains are not aligned, a phishing alert is raised with an explanation comprising its targeted brand. In comparison to other techniques, such explainable reference-based solutions are more robust to ever-changing phishing webpages. However, the webpage similarity is still measured by representations conveying only partial intentions (e.g., screenshot and logo), which (i) incurs considerable false positives and (ii) gives an adversary opportunities to compromise user confidence in the approaches.
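The reference-based decision rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the reference list contents, the `Alert` type, and the `check_page` and `domain_matches` helpers are hypothetical, and the brand match is assumed to come from some visual similarity model that is not shown here.

```python
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import urlparse

# Hypothetical reference list: brand name -> domains that brand legitimately uses.
REFERENCE_LIST = {
    "PayPal": {"paypal.com"},
    "Microsoft": {"microsoft.com", "live.com"},
}


@dataclass
class Alert:
    url: str
    targeted_brand: str  # the explanation attached to the alert


def domain_matches(host: str, legit_domains: Set[str]) -> bool:
    """True if host is one of the legitimate domains or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in legit_domains)


def check_page(url: str, matched_brand: Optional[str]) -> Optional[Alert]:
    """Raise an alert when a page resembles a referenced brand on a foreign domain.

    matched_brand is assumed to be the output of a similarity model
    (e.g. screenshot or logo matching); None means no reference matched.
    """
    if matched_brand is None:
        return None  # not similar to any referenced brand
    host = (urlparse(url).hostname or "").lower()
    if domain_matches(host, REFERENCE_LIST[matched_brand]):
        return None  # similar and domain aligned: the brand's own page
    return Alert(url=url, targeted_brand=matched_brand)
```

For example, `check_page("https://paypal.com.evil.example/login", "PayPal")` returns an alert naming PayPal as the targeted brand, while the same brand match on `https://www.paypal.com/signin` returns `None`.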

In this work, we propose PhishIntention, which extracts the precise phishing intention of a webpage by visually (i) extracting its brand intention and credential-taking intention, and (ii) interacting with the webpage to confirm the credential-taking intention. We design PhishIntention as a heterogeneous system of deep learning vision models, overcoming various technical challenges. The models "look at" and "interact with" the webpage to infer its intention, which makes them robust to potential HTML obfuscation. We compare PhishIntention with four state-of-the-art reference-based approaches on the largest phishing identification dataset, consisting of 50K phishing and benign webpages. For a similar level of recall, PhishIntention achieves significantly higher precision than the baselines. Moreover, we conduct a continuous two-month field study on the Internet to discover emerging phishing webpages. PhishIntention detects 1,942 new phishing webpages (1,368 not reported by VirusTotal). Compared to the best baseline, PhishIntention generates 86.5% fewer false alerts (139 vs. 1,033 false positives) while detecting a similar number of real phishing webpages.
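The 86.5% figure follows directly from the two false-positive counts reported for the field study. The short check below reproduces it; the function name `relative_reduction` is only illustrative.

```python
def relative_reduction(baseline: int, ours: int) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline * 100


# False positives reported in the field study: best baseline vs. PhishIntention.
reduction = relative_reduction(baseline=1033, ours=139)
print(f"{reduction:.1f}% fewer false alerts")  # prints "86.5% fewer false alerts"
```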

BibTeX
@inproceedings {279900,
author = {Ruofan Liu and Yun Lin and Xianglin Yang and Siang Hwee Ng and Dinil Mon Divakaran and Jin Song Dong},
title = {Inferring Phishing Intention via Webpage Appearance and Dynamics: A Deep Vision Based Approach},
booktitle = {31st USENIX Security Symposium (USENIX Security 22)},
year = {2022},
isbn = {978-1-939133-31-1},
address = {Boston, MA},
pages = {1633--1650},
url = {https://www.usenix.org/conference/usenixsecurity22/presentation/liu-ruofan},
publisher = {USENIX Association},
month = aug
}
