An Efficient Method to Determine which Combination of Keywords Triggered Automatic Filtering of a Message


Ruohan Xiong and Jeffrey Knockel, Citizen Lab, University of Toronto


WeChat, the most popular social media platform in China, has over one billion monthly active users. China-based users of the platform are subject to automatic filtering of chat messages, which limits their ability to communicate freely. WeChat is one of many Chinese Internet platforms that automatically filter content using keyword combinations: if every keyword component of a blacklisted keyword combination appears in a message, the message is filtered. Such sensitive combinations have previously been discovered by sending messages containing potentially sensitive news articles and, if an article is filtered, isolating the triggering keyword combination from it by sending additional messages over the platform. However, due to increasing restrictions on account registration, this testing has become increasingly costly. To make it more economical, we analyzed the algorithm previously used to extract keyword combinations from news articles and found substantial room for improvement as well as subtle flaws. We evaluate multiple approaches borrowing concepts from the group testing literature and present an algorithm that eliminates these flaws and requires, on average, only 10.3% as many messages as the previously used algorithm.
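The filtering rule and the isolation problem described above can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the algorithm presented in the paper. `is_filtered` models the rule that a message is filtered when every component of at least one blacklisted combination occurs in it. `isolate` is a hypothetical, simple adaptive chunk-removal search in the spirit of group testing (similar to delta debugging's ddmin). It treats each call to `oracle` as one message sent over the platform, and it shrinks a filtered text until removing any single remaining character makes it unfiltered. Removed characters are replaced by a separator (assumed here to be `"|"` and to occur in no keyword) so that characters on either side of a deletion are never glued into a spurious keyword.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical blacklist: each entry is one keyword combination.
Blacklist = Sequence[Tuple[str, ...]]


def is_filtered(message: str, blacklist: Blacklist) -> bool:
    """Filtered iff every component of some combination appears in the message."""
    return any(all(part in message for part in combo) for combo in blacklist)


def isolate(text: str, oracle: Callable[[str], bool], sep: str = "|") -> str:
    """Shrink `text` to a 1-minimal message that `oracle` still reports as filtered.

    Each oracle call stands for one test message. Deleted runs are rendered as a
    single `sep`, so any sep-free substring of a test message was already a
    substring of `text` (assumes no keyword contains `sep`).
    """
    if not oracle(text):
        raise ValueError("the starting text must already be filtered")
    keep = [True] * len(text)

    def render(mask: List[bool]) -> str:
        out: List[str] = []
        for ch, kept in zip(text, mask):
            if kept:
                out.append(ch)
            elif not out or out[-1] != sep:
                out.append(sep)  # collapse each deleted run into one separator
        return "".join(out)

    chunk = max(1, len(text) // 2)
    while True:
        removed_any = False
        for start in range(0, len(text), chunk):
            trial = keep[:]
            for j in range(start, min(start + chunk, len(text))):
                trial[j] = False
            # Skip chunks that are already fully removed (no message needed).
            if trial != keep and oracle(render(trial)):
                keep = trial  # still filtered, so this chunk is irrelevant
                removed_any = True
        if not removed_any:
            if chunk == 1:
                break  # no single character can be removed: 1-minimal
            chunk = max(1, chunk // 2)  # refine the search granularity
    return render(keep).strip(sep)
```

For example, with the hypothetical blacklist `[("ab", "cd")]`, calling `isolate("xxabyyyycdzz", oracle)` reduces the text to `"ab|cd"`, exposing both components of the triggering combination. Counting the oracle calls gives the message cost, which is the quantity the paper seeks to minimize.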


@inproceedings {239070,
author = {Ruohan Xiong and Jeffrey Knockel},
title = {An Efficient Method to Determine which Combination of Keywords Triggered Automatic Filtering of a Message},
booktitle = {9th USENIX Workshop on Free and Open Communications on the Internet (FOCI 19)},
year = {2019},
address = {Santa Clara, CA},
url = {},
publisher = {USENIX Association},
month = aug
}