7 Conclusion

We have introduced the notion of using the Web to detect undesired inferences. Our proof-of-concept experiments demonstrate the power of the Web for finding the keywords that are likely to identify a person or topic.

As is to be expected with an initial work, there remains a lot of room for improvement in the algorithms. In particular, to produce an inference detection tool capable of functioning in real-time, as is needed in some applications, improvements already discussed such as Web caching, additional filtering of results to improve precision, and deeper hit analysis to improve recall, are needed. Another avenue for improvement is through deeper content analysis (i.e. beyond keyword extraction). For example, employing a tool capable of deeper semantic analysis such as [15] may allow for both more meaningful extraction of words and phrases for generating queries, and improved analysis of the returned hits for more accurate inference detection. In addition, simple improvements to the content analysis, such as better filtering of stop words and HTML syntax, would create more useful keyword lists.

Acknowledgement

The authors are very grateful to Richard Chow and Vern Paxson for their help in revising earlier versions of this paper.

References

[1] B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. Sheth, B. Arpinar, A. Joshi and T. Finin. Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. 15th International World Wide Web Conference, 2006.
[2] M. Atallah, C. McDonough, S. Nirenburg, and V. Raskin. Natural Language Processing for Information Assurance. Proc. 9th ACM/SIGSAC New Security Paradigms Workshop (NSPW 00), pp.51-65, 2000.
[3] Apache Lucene. https://lucene.apache.org/java/docs/
[4] AOL Keyword Searches. https://dontdelete.com/default.asp
[5] M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, August 9, 2006.
[6] https://www.bongonews.com/layout1.php?event=2315
[7] W. Broad. U. S. Web Archive is Said to Reveal a Nuclear Primer. The New York Times, November 3, 2006.
[8] https://www.judicialwatch.org/archive/2005/osama.pdf
[9] Executive Order 12958, Classified National Security Information. https://www.dss.mil/seclib/eo12958.htm
[10] B. Davison, D. Deschenes and D. Lewanda. Finding relevant website queries. Twelfth International World Wide Web Conference, 2003.
[11] O. de Vel, A. Anderson, M. Corney and G. Mohay. Mining email content for author identification forensics. SIGMOD Record, Vol. 30, No. 4, December 2001.
[12] Mike Dowman, Valentin Tablan, Hamish Cunningham and Borislav Popov. Web-Assisted Annotation, Semantic Indexing and Search of Television and Radio News. WWW, 2005.
[13] Factiva Insight: Reputation Intelligence. https://www.factiva.com
[14] Fetch Technologies. https://www.fetch.com
[15] GATE: General Architecture for Text Engineering. https://gate.ac.uk/projects.html
[16] N. Glance. Community Search Assistant. IUI, 2001.
[17] P. Golle. Revisiting the Uniqueness of Simple Demographics in the US Population. Workshop on Privacy in the Electronic Society, 2006.
[18] Google SOAP search API. https://code.google.com/apis/soapsearch/
[19] J. Hale and S. Shenoi. Catalytic inference analysis: detecting inference threats due to knowledge discovery. IEEE Symposium on Security and Privacy, 1997.
[20] S. Hill and F. Provost. The myth of the double-blind review? Author identification using only citations. SIGKDD Explorations, 2003.
[21] T. Hinke. Database inference engine design approach. Database Security II: Status and Prospects, 1990.
[22] D. Jones. Google's PowerPoint blunder was preventable. IR Web Report. https://www.irwebreport.com/perspectives/2006/mar/google blunder.htm
[23] E. Kin, Y. Matsuo, M. Ishizuka. Extracting a social network among entities by web mining. ISWC '06 Workshop on Web Content Mining with Human Language Technologies, 2006.
[24] M. Koppel and J. Schler. Authorship verification as a one-class classification problem. Proceedings of the 21st International Conference on Machine Learning, 2004.
[25] M. Koppel, J. Schler, S. Argamon and E. Messeri. Authorship attribution with thousands of candidate authors. SIGIR '06.
[26] M. Lapata and F. Keller. The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks. HLT-NAACL, 2004.
[27] G. Leech, P. Rayson and A. Wilson. Word frequencies in written and spoken English: based on the British National Corpus. Longman, London, 2001.
[28] C. Manning and H. Schutze. Foundations of statistical natural language processing. MIT Press, 1999.
[29] MedicineNet.com. https://www.medterms.com/script/main/hp.asp