'Unknown' by Unknown - Page 15 of 16

Conclusion

We have introduced the notion of using the Web to detect undesired inferences. Our proof-of-concept experiments demonstrate the power of the Web for finding the keywords that are likely to identify a person or topic.

As is to be expected with an initial work, there remains considerable room for improvement in the algorithms. In particular, producing an inference detection tool capable of functioning in real time, as some applications require, calls for the improvements already discussed: Web caching, additional filtering of results to improve precision, and deeper hit analysis to improve recall. Another avenue for improvement is deeper content analysis (i.e., beyond keyword extraction). For example, employing a tool capable of deeper semantic analysis, such as [15], may allow both more meaningful extraction of words and phrases for generating queries and improved analysis of the returned hits for more accurate inference detection. In addition, simple improvements to the content analysis, such as better filtering of stop words and HTML syntax, would yield more useful keyword lists.

Acknowledgement

The authors are very grateful to Richard Chow and Vern Paxson for their help in revising earlier versions of this paper.

References

[1] B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. Sheth, B. Arpinar, A. Joshi and T. Finin. Semantic analytics on social networks: experiences in addressing the problem of conflict of interest detection. 15th International World Wide Web Conference, 2006.

[2] M. Atallah, C. McDonough, S. Nirenburg and V. Raskin. Natural Language Processing for Information Assurance. Proc. 9th ACM/SIGSAC New Security Paradigms Workshop (NSPW '00), pp. 51-65, 2000.

[3] Apache Lucene. https://lucene.apache.org/java/docs/

[4] AOL Keyword Searches. https://dontdelete.com/default.asp

[5] M. Barbaro and T. Zeller. A face is exposed for AOL searcher no. 4417749. The New York Times, August 9, 2006.

[6] https://www.bongonews.com/layout1.php?event=2315

[7] W. Broad. U.S. Web Archive is Said to Reveal a Nuclear Primer. The New York Times, November 3, 2006.

[8] https://www.judicialwatch.org/archive/2005/osama.pdf

[9] Executive Order 12958, Classified National Security Information. https://www.dss.mil/seclib/eo12958.htm

[10] B. Davison, D. Deschenes and D. Lewanda. Finding relevant website queries. Twelfth International World Wide Web Conference, 2003.

[11] O. de Vel, A. Anderson, M. Corney and G. Mohay. Mining email content for author identification forensics. SIGMOD Record, Vol. 30, No. 4, December 2001.

[12] M. Dowman, V. Tablan, H. Cunningham and B. Popov. Web-Assisted Annotation, Semantic Indexing and Search of Television and Radio News. WWW, 2005.

[13] Factiva Insight: Reputation Intelligence. https://www.factiva.com

[14] Fetch Technologies. https://www.fetch.com

[15] GATE: General Architecture for Text Engineering. https://gate.ac.uk/projects.html

[16] N. Glance. Community Search Assistant. IUI, 2001.

[17] P. Golle. Revisiting the Uniqueness of Simple Demographics in the US Population. Workshop on Privacy in the Electronic Society, 2006.

[18] Google SOAP search API. https://code.google.com/apis/soapsearch/

[19] J. Hale and S. Shenoi. Catalytic inference analysis: detecting inference threats due to knowledge discovery. IEEE Symposium on Security and Privacy, 1997.

[20] S. Hill and F. Provost. The myth of the double-blind review? Author identification using only citations. SIGKDD Explorations, 2003.

[21] T. Hinke. Database inference engine design approach. Database Security II: Status and Prospects, 1990.

[22] D. Jones. Google's PowerPoint blunder was preventable. IR Web Report. https://www.irwebreport.com/perspectives/2006/mar/google blunder.htm

[23] E. Kin, Y. Matsuo and M. Ishizuka. Extracting a social network among entities by web mining. ISWC '06 Workshop on Web Content Mining with Human Language Technologies, 2006.

[24] M. Koppel and J. Schler. Authorship verification as a one-class classification problem. Proceedings of the 21st International Conference on Machine Learning, 2004.

[25] M. Koppel, J. Schler, S. Argamon and E. Messeri. Authorship attribution with thousands of candidate authors. SIGIR '06.

[26] M. Lapata and F. Keller. The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks. HLT-NAACL, 2004.

[27] G. Leech, P. Rayson and A. Wilson. Word frequencies in written and spoken English: based on the British National Corpus. Longman, London, 2001.

[28] C. Manning and H. Schütze. Foundations of statistical natural language processing. MIT Press, 1999.

[29] MedicineNet.com. https://www.medterms.com/script/main/hp.asp