WOSN '10 Session Abstracts

WORKSHOP PROGRAM ABSTRACTS

Wednesday, June 23, 2010

9:00 a.m.–10:30 a.m.

Ghostbusting Facebook: Detecting and Characterizing Phantom Profiles in Online Social Gaming Applications
Back to Program
A fundamental question when studying underlying friendship and interaction graphs of Online Social Networks (OSNs) is how closely these graphs mirror real-world associations. The presence of phantom or fake profiles dilutes the integrity of this correspondence. This paper looks at the presence of phantom profiles in the context of social gaming, i.e., profiles created with the purpose of gaining a strategic advantage in social games. Through a measurement-based study of a Facebook gaming application that is known to attract phantom profiles, we show statistical differences among a subset of features associated with genuine and phantom profiles. We then use supervised learning to classify phantom profiles. Our work represents a first-step towards solving the more general problem of detecting fake/phantom identities in OSNs.

Diffusion Dynamics of Games on Online Social Networks
Back to Program
Social games, being embedded in and drawing upon existing social networks, have the potential to spread virally. In this paper, we examine two popular social games, each having millions of Facebook users, to understand the role that users play individually and collectively in propagating social applications. At the individual level, the users' invitation behavior significantly outweighs their demographic and social network properties in predicting invitation success rate. At the collective level, we demonstrate that social games that encourage group formation tend to rapidly assimilate dense network cliques. Finally, engagement in a social game is closely tied with the ability to recruit friends.

Outtweeting the Twitterers—Predicting Information Cascades in Microblogs
Back to Program
Microblogging sites are a unique and dynamic Web 2.0 communication medium. Understanding the information flow in these systems can not only provide better insights into the underlying sociology, but is also crucial for applications such as content ranking, recommendation and filtering, spam detection and viral marketing. In this paper, we characterize the propagation of URLs in the social network of Twitter, a popular microblogging site. We track 15 million URLs exchanged among 2.7 million users over a 300 hour period. Data analysis uncovers several statistical regularities in the user activity, the social graph, the structure of the URL cascades and the communication dynamics. Based on these results we propose a propagation model that predicts which users are likely to mention which URLs. The model correctly accounts for more than half of the URL mentions in our data set, while maintaining a false positive rate lower than 15%.

10:50 a.m.–12:20 p.m.

Privacy Leakage in Mobile Online Social Networks
Back to Program
Mobile Online Social Networks (mOSNs) have recently grown in popularity. With the ubiquitous use of mobile devices and a rapid shift of technology and access to OSNs, it is important to examine the impact of mobile OSNs from a privacy standpoint. We present a taxonomy of ways to study privacy leakage and report on the current status of known leakages. We find that all mOSNs in our study exhibit some leakage of private information to third parties. Novel concerns include combination of new features unique to mobile access with the leakage in OSNs that we had examined earlier.

Don't Tread on Me: Moderating Access to OSN Data with SpikeStrip
Back to Program
Online social networks rely on their valuable data stores to attract users and produce income. Their survival depends on the ability to protect users' profiles and disseminate it to other users through controlled channels. Given the sparse user adoption of privacy policies, however, there is increasing incentive and opportunity for malicious parties to extract these datasets for profit using automated "crawlers" and "screen-scrapers." With the arrival of distributed botnets and low-cost hosted VMs, attackers can perform fast, distributed crawls that evade traditional detectors and rate limiters. We propose SpikeStrip, a server add-on that uses light-weight link encryption to isolate and rate limit crawlers. We experiment with real OSN data, and show that SpikeStrip successfully curtails sophisticated, distributed crawlers while imposing minimal server throughput overhead and inconvenience to end-users.

Prediction Promotes Privacy in Dynamic Social Networks
Back to Program
Recent work on anonymizing online social networks (OSNs) has looked at privacy preserving techniques for publishing a single instance of the network. However, OSNs evolve and a single instance is inadequate for analyzing their evolution or performing longitudinal data analysis. We study the problem of repeatedly publishing OSN data as the network evolves while preserving privacy of users. Publishing multiple instances independently has privacy risks, since stitching the information together may allow an adversary to identify users. We provide methods to anonymize a dynamic network when new nodes and edges are added to the published network. These methods use link prediction algorithms to model the evolution. Using this predicted graph to perform group-based anonymization, the loss in privacy caused by new edges can be eliminated almost entirely. We propose metrics for privacy loss, and evaluate them for publishing multiple OSN instances.

1:30 p.m.–3:30 p.m.

A Geometric Model for On-line Social Networks
Back to Program
We study the link structure of on-line social networks (OSNs), and introduce a new model for such networks which may help infer their hidden underlying reality. In the geo-protean (GEO-P) model for OSNs nodes are identified with points in Euclidean space, and edges are stochastically generated by a mixture of the relative distance of nodes and a ranking function. With high probability, the GEO-P model generates graphs satisfying many observed properties of OSNs, such as power law degree distributions, the small world property, densification power law, and bad spectral expansion. We introduce the dimension of an OSN based on our model, and examine this new parameter using actual OSN data. We discuss how the dimension parameter of an OSN may eventually be used as a tool to group users with similar attributes using only the link structure of the network.

Distance Matters: Geo-social Metrics for Online Social Networks
Back to Program
Online Social Networks (OSNs) are increasingly becoming one of the key media of communication over the Internet. The potential of these services as the basis to gather statistics and exploit information about user behavior is appealing and, as a consequence, the number of applications developed for these purposes has been soaring. At the same time, users are now willing to share information about their location, allowing for the study of the role of geographic distance in social ties. In this paper we present a graph analysis based approach to study social networks with geographic information and new metrics to characterize how geographic distance affects social structure. We apply our analysis to four large-scale OSN datasets: our results show that there is a vast portion of users with short-distance links and that clusters of friends are often geographically close. In addition, we demonstrate that different social networking services exhibit different geo-social properties: OSNs based mainly on location-advertising largely foster local ties and clusters, while services used mainly for news and content sharing present more connections and clusters on longer distances. The results of this work can be exploited to improve many classes of systems and a potential vast number of applications, as we illustrate by means of some practical examples.

Orion: Shortest Path Estimation for Large Social Graphs
Back to Program
Through measurements, researchers continue to produce large social graphs that capture relationships, transactions, and social interactions between users. Efficient analysis of these graphs requires algorithms that scale well with graph size. We examine node distance computation, a critical primitive in graph problems such as computing node separation, centrality computation, mutual friend detection, and community detection. For large million-node social graphs, computing even a single shortest path using traditional breadth-first-search can take several seconds. In this paper, we propose a novel node distance estimation mechanism that effectively maps nodes in high dimensional graphs to positions in low-dimension Euclidean coordinate spaces, thus allowing constant time node distance computation. We describe Orion, a prototype graph coordinate system, and explore critical decisions in its design. Finally, we evaluate the accuracy of Orion's node distance estimates, and show that it can produce accurate results in applications such as node separation, node centrality, and ranked social search.

The Effects of Restrictions on Number of Connections in OSNs: A Case-Study on Twitter
Back to Program
Most popular Online Social Networks (OSNs) in today's world, such as Facebook, Orkut and Twitter impose restrictions on the number of friends / connections that a member can have in the network. This is primarily due to two reasons—to limit spam and to reduce the strain on the system due to member-to-all-friends communication. We study the effects of such restrictions on node-degree, on the topological properties of the OSN networks, taking the restriction imposed by Twitter as a case-study. To the best of our knowledge, this is the first study of its nature, on any OSN. Towards this end, we use a network growth model based on preferential attachment to develop an analytical framework that can be used to assess the effects of various forms of restrictions on OSNs, as well as to design new restrictions of varying rigidity.

3:50 p.m.–5:20 p.m.

Voice of the Customers: Mining Online Customer Reviews for Product Feature-based Ranking
Back to Program
Increasingly large numbers of customers are choosing online shopping because of its convenience, reliability, and cost. As the number of products being sold online increases, it is becoming increasingly difficult for customers to make purchasing decisions based on only pictures and short product descriptions. On the other hand, customer reviews, particularly the text describing the features, comparisons and experiences of using a particular product provide a rich source of information to compare products and make purchasing decisions. Online retailers like Amazon.com1 allow customers to add reviews of products they have purchased. These reviews have become a diverse and reliable source to aid other customers. Traditionally, many customers have used expert rankings which rate limited a number of products. Existing automated ranking mechanisms typically rank products based on their overall quality. However, a product usually has multiple product features, each of which plays a different role. Different customers may be interested in different features of a product, and their preferences may vary accordingly. In this paper, we present a feature-based product ranking technique that mines thousands of customer reviews. We first identify product features within a product category and analyze their frequencies and relative usage. For each feature, we identify subjective and comparative sentences in reviews. We then assign sentiment orientations to these sentences. By using the information obtained from customer reviews, we model the relationships among products by constructing a weighted and directed graph. We then mine this graph to determine the relative quality of products. Experiments on Digital Camera and Television reviews from real-world data on Amazon.com are presented to demonstrate the results of the proposed techniques.

I rate you. You rate me. Should we do so publicly?
Back to Program
We find that ratings are not absolute, but rather depend on whether they are given anonymously or under one's own name and whether they are displayed publicly or held confidentially. The potential to reciprocate produces higher and more correlated ratings than when individuals are unable to see how others rated them. Ratings further depend on the gender and nationalities of the raters and ratees. All of these findings indicate that ratings should not be taken at face value without considering social nuances.

Measuring Online Service Availability Using Twitter
Back to Program
Real-time micro-blogging services such as Twitter are widely recognized for their social dynamics—how they both encapsulate a social graph and propagate information across it. However, the content of this information is equally interesting since it frequently reflects individual experiences with a broad variety of real-time events. Indeed, events of broad interest are commonly revealed in correlated spikes of semantically-related posting activity. In this paper, we explore one such application this of phenomenon: using Twitter data to infer on-line Internet service availability. We show that simple techniques are sufficient to extract key semantic content from "tweets" (i.e., service X is down) and also filter out extraneous noise. We demonstrate the efficacy of this approach at identifying a range of large-scale service outages in 2009 for popular services such as Gmail, Bing and PayPal.

Need help? Use our Contacts page.

Last changed: 25 June 2010 jp