Predicting Privacy Behavior on Online Social Networks
Cailing Dong, Hongxia Jin, and Bart P. Knijnenburg (AAAI-15)
People often share personal details about themselves, including their current activity and real-time location. Such public sharing of personal and sometimes private information can increase security risks or threaten one’s reputation. Although users are aware of these issues, they find it difficult to weigh the benefits of social interaction against the potential risks of sharing information. In this paper, Dong et al. investigate the psychological and contextual factors that affect users’ privacy decision-making, and then use the most important features associated with these factors to build a model that predicts users’ disclosure behavior on Online Social Networks (OSNs). The prediction results can be used to give users personalized advice on privacy decisions.
Dong et al. present their method in three steps. First, they collect data and investigate the psychological and contextual factors in two common OSN scenarios: Information Requests and Information Sharing. These factors include the sharing tendency of the user, the trustworthiness of the requester, the sensitivity of the information, the appropriateness of the request, and traditional contextual factors. Next, they analyze the impact of these factors on a user’s privacy decisions. Finally, they use these factors to build a privacy decision-making prediction model that predicts users’ sharing behavior.
To analyze the main psychological and contextual factors that affect OSN users’ privacy decision-making, the authors identify behavioral analogs of these factors in users’ data. They collect data on Google+ to investigate the factors related to “information requests” (i.e., a user sends a friend request to another), and data from a location sharing preference study to investigate the contextual factors related to “information sharing”.
- The Google+ dataset consists of a set of tuples, each representing user features (profile features and activity features), relationship features between a requester and a receiver, and a decision label indicating whether the receiver accepts the request. These features correspond to the five factors listed above; for example, the followtendency and conservative features (in user features) correspond to the sharing tendency factor.
- Location sharing preference survey: Participants were randomly assigned ten scenarios and asked whether they would share their location with three different audience groups: Family, Friend, and Colleague. For each audience group, the authors collect a set of sharing records that represent user features, relationship features describing the relationship between user, audience, and location, location features, and audience features. Each feature again corresponds to one of the main factors; for instance, location features that describe the sensitivity of a location correspond to the sensitivity factor.
The authors measure the impact of the factors by ranking the features according to their chi-squared statistic and information gain w.r.t. the decision outcomes, using 10-fold cross-validation in Weka. After inspecting the results and choosing the most important features, they use these features to build one decision-making model for the Google+ dataset and three further models for the three location sharing datasets (sharing with family, friend, and colleague), one per dataset.
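The two ranking criteria can be illustrated with a minimal pure-Python sketch. It is not the authors' Weka pipeline: it assumes categorical features and a binary decision label, and the feature names `trust` and `weekday` and the toy records are invented purely for illustration.

```python
import math
from collections import Counter

def chi_squared(feature, labels):
    """Pearson chi-squared statistic of a categorical feature vs. the labels."""
    n = len(labels)
    f_counts, l_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n            # count expected under independence
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Label entropy minus the label entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for f, l in zip(feature, labels):
        groups.setdefault(f, []).append(l)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(columns, labels, score):
    """Feature names sorted from most to least informative under `score`."""
    return sorted(columns, key=lambda name: score(columns[name], labels),
                  reverse=True)

# Toy data (hypothetical): 1 = request accepted, 0 = request rejected.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "weekday": ["y", "n", "y", "n", "y", "n", "y", "n"],  # unrelated to label
    "trust": ["high"] * 4 + ["low"] * 4,                   # determines label
}
ranking = rank_features(columns, labels, chi_squared)
```

On this toy data both criteria place `trust` ahead of `weekday`, since `trust` fully determines the label while `weekday` is independent of it. Weka's own attribute evaluators additionally handle numeric attributes, which this sketch leaves out.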
To evaluate the prediction models, the authors apply 10-fold cross-validation to split the data into training and testing sets. They use F1 and AUC as evaluation metrics and build the models with several classification algorithms, including Naïve Bayes, J48, Random Tree, etc., and then choose the best-performing one (J48). Moreover, they verify the effectiveness of each factor by testing the models without it, i.e., with all the features belonging to that factor removed. The results show that:
- With all factors included, F1 and AUC are high (nearly 0.9 for the Google+ dataset).
- Removing some of the factors can reduce the prediction performance.
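The evaluation protocol described above can be sketched in a few stdlib-only helpers: a k-fold index split, the two metrics, and a function that drops all features of one factor for the ablation test. This is an illustrative sketch rather than the authors' Weka setup; the interleaved fold assignment, the feature name `requester_trust`, and the toy predictions are assumptions (only `followtendency` and `conservative` are feature names taken from the summary).

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests every k-th item from i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties = 0.5).

    Assumes y_true contains at least one positive and one negative.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def without_factor(factor_features, factor):
    """Feature names left after removing every feature of one factor."""
    return [f for name, feats in factor_features.items()
            if name != factor for f in feats]

# Toy predictions (hypothetical) for five requests.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.4, 0.3, 0.6, 0.8]

# Factor-to-feature mapping; "requester_trust" is a made-up feature name.
factor_features = {
    "sharing tendency": ["followtendency", "conservative"],
    "trustworthiness": ["requester_trust"],
}
ablated = without_factor(factor_features, "sharing tendency")
folds = list(k_fold_indices(10, 5))
```

In an ablation run, each factor would be removed in turn, the classifier retrained on the remaining features within every fold, and the resulting F1 and AUC compared with the all-factors baseline.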
As access to social information grows, the privacy of OSN users must be taken into consideration to protect their interests. Although some efforts have been made to give users ways to control their privacy, it is still difficult for them to balance the social benefits against the potential risks of sharing personal information. Indeed, some studies show that people share more than they expect. Dong et al. show the importance of several psychological and contextual factors that affect users’ decision-making and propose a privacy decision-making prediction model that can be used in a privacy adaptation procedure to help users protect their privacy in OSNs. As stated in the paper, the model is a binary classifier; for privacy recommenders, it could be better to calculate a “privacy risk” score based on the identified factors and let the user make the final decision.
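One way such a “privacy risk” score might look is sketched below. This is not part of the paper: the choice of three factors, the weights, the assumption that each factor value is normalized to [0, 1], and the logistic squashing are all assumptions made for illustration only.

```python
import math

# Hypothetical weights: positive values raise the risk, negative ones lower it.
WEIGHTS = {
    "sensitivity": 2.0,       # more sensitive information -> higher risk
    "trustworthiness": -2.0,  # more trusted requester -> lower risk
    "appropriateness": -1.0,  # more appropriate request -> lower risk
}

def privacy_risk(factors, weights=WEIGHTS, bias=0.0):
    """Map factor values (each assumed to lie in [0, 1]) to a risk in (0, 1).

    A weighted sum is passed through the logistic function so the result
    can be shown to the user as a graded score instead of a yes/no label.
    """
    z = bias + sum(w * factors[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))

# Two made-up requests: a risky one and a benign one.
risky = privacy_risk(
    {"sensitivity": 0.9, "trustworthiness": 0.1, "appropriateness": 0.2})
benign = privacy_risk(
    {"sensitivity": 0.1, "trustworthiness": 0.9, "appropriateness": 0.9})
```

A recommender could display the score next to the request and leave the accept or reject decision to the user. In practice the weights would have to be learned from data rather than set by hand.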