Predicting Privacy Behavior on Online Social Networks
Cailing Dong, Hongxia Jin, and Bart P. Knijnenburg (AAAI-15)
People often share personal details about
themselves, and even their current activity and/or real-time location. All
this public sharing of personal and sometimes private information may increase
security risks or threaten one's personal reputation. Although users are aware
of these issues, they find it difficult to weigh the benefit of social
interaction against the potential risk of each sharing decision. In this paper,
Dong et al. investigate the psychological and contextual factors that affect
users' privacy decision-making practices, and then use the most important
features related to these factors to build a model that predicts users'
disclosure behavior on Online Social Networks (OSNs). The predictions can be
used to give users personalized advice on their privacy decisions.
Dong et al. present their method in three
steps. First, they collect data and investigate the psychological and
contextual factors in light of two common OSN scenarios: Information Requests
and Information Sharing. These factors include the sharing tendency of the user, the trustworthiness of the requester, the sensitivity of the information, the appropriateness of the request, and traditional contextual factors. Next, they
analyze the impact of these factors on a user's privacy decisions.
Finally, they use the factors to build a privacy decision-making model that
predicts users' sharing behavior.
To analyze the main psychological and contextual factors that affect OSN
users' privacy decision making, the authors identify behavioral analogs of
these factors in users' data. They collect data on Google+ to investigate the
factors related to "information requests" (i.e., one user sends a friend
request to another), and use data from a location sharing preference study to
investigate the contextual factors related to "information sharing".
- The Google+ dataset: a set of tuples, each representing user features (including profile features and activity features), relationship features between a requester and a receiver, and a decision label indicating whether the receiver accepts the request. These features correspond to the five factors mentioned above; for example, the followtendency and conservative features (among the user features) correspond to the sharing tendency factor.
- Location sharing preference survey: Participants were randomly assigned ten scenarios and asked whether they would share their location with each of three audience groups: Family, Friend, and Colleague. For each audience group, the authors collect a set of sharing records consisting of user features, relationship features (describing the relationship among the user, the audience, and the location), location features, and audience features. Each feature again corresponds to one of the main factors; for instance, location features that describe the sensitivity of a location correspond to the sensitivity factor.
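To make the data layout concrete, here is a minimal Python sketch of how a labeled decision record and its feature-to-factor mapping might be represented. Only the followtendency and conservative feature names come from the summary; every other feature name, the factor identifiers, the `Record` class, and the `features_by_factor` helper are hypothetical illustrations, not the authors' schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five factors named in the summary (identifiers are hypothetical shorthand).
FACTORS = ("sharing_tendency", "trustworthiness", "sensitivity",
           "appropriateness", "context")

# Feature -> factor mapping. "followtendency" and "conservative" come from the
# summary; the remaining feature names are hypothetical placeholders.
FEATURE_FACTOR = {
    "followtendency": "sharing_tendency",
    "conservative": "sharing_tendency",
    "common_friends": "trustworthiness",     # hypothetical relationship feature
    "location_sensitivity": "sensitivity",   # hypothetical location feature
    "request_relevance": "appropriateness",  # hypothetical
    "time_of_day": "context",                # hypothetical
}


@dataclass
class Record:
    """One labeled decision: feature values plus whether the user shared/accepted."""
    features: dict
    label: bool


def features_by_factor(mapping):
    """Group feature names by the factor they belong to (insertion order kept)."""
    groups = defaultdict(list)
    for feature, factor in mapping.items():
        groups[factor].append(feature)
    return dict(groups)


# A single hypothetical record, e.g. one accepted request.
example = Record(
    features={"followtendency": 0.7, "conservative": 0, "time_of_day": "evening"},
    label=True,
)
```

Grouping features by factor is what makes the later per-factor analysis possible: dropping a factor just means dropping every feature listed under it.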
The authors measure the impact of the
factors by ranking the features by their chi-squared statistic and
information gain with respect to the decision outcomes, using 10-fold
cross-validation in Weka. After inspecting the results and choosing the most
important features, they use these features to build one decision-making model
for the Google+ dataset and three separate models for the three location
sharing datasets (sharing with family, friend, and colleague).
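The ranking step can be illustrated with a small pure-Python re-implementation of the two scores for categorical features: information gain, H(decision) - H(decision | feature), and the Pearson chi-squared statistic of the feature-by-decision contingency table. This is a sketch for intuition, not Weka's code: it scores the whole dataset once, assumes features are already discrete (Weka discretizes numeric attributes first), and omits the cross-validated averaging of ranks. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """H(label) - H(label | feature) for one categorical feature column."""
    n = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(values, labels):
    """Pearson chi-squared statistic of the feature x label contingency table."""
    n = len(labels)
    joint = Counter(zip(values, labels))
    value_totals, label_totals = Counter(values), Counter(labels)
    stat = 0.0
    for v in value_totals:
        for y in label_totals:
            expected = value_totals[v] * label_totals[y] / n
            stat += (joint[(v, y)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score=information_gain):
    """Return feature names sorted from most to least informative."""
    return sorted(columns, key=lambda f: score(columns[f], labels), reverse=True)
```

For example, with a balanced binary decision, a feature whose value perfectly predicts the decision gets an information gain of 1.0, while a feature that is independent of the decision scores 0 on both measures.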
To evaluate the prediction models, the
authors use 10-fold cross-validation to split the data into training and
testing sets. They use F1 and AUC as evaluation metrics, build models with
several classification algorithms, including Naïve Bayes, J48, and Random
Tree, and choose the best-performing one (J48). Moreover, they verify the
contribution of each factor by removing all the features belonging to that
factor and retesting the models. The results show that:
- Using all the factors, both F1 and AUC are high (nearly 0.9 for the Google+ dataset).
- Removing some factors may reduce the prediction performance.
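The factor ablation can be sketched as follows: evaluate with all features under 10-fold cross-validation, then re-evaluate once per factor with that factor's features dropped. Several assumptions keep the block self-contained: J48 is replaced by a deliberately simple hypothetical lookup classifier (`fit_lookup`, which predicts the majority label for each distinct feature tuple), folds are not stratified, F1 is computed on the pooled predictions of all folds, and AUC is omitted because it would require a classifier that outputs scores. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def f1_score(y_true, y_pred):
    """F1 for the positive class (True = shared / accepted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def fit_lookup(rows, labels):
    """Hypothetical stand-in for J48: majority label per distinct feature tuple,
    falling back to the overall majority label for unseen tuples."""
    table = defaultdict(Counter)
    for row, y in zip(rows, labels):
        table[row][y] += 1
    default = Counter(labels).most_common(1)[0][0]

    def predict(row):
        return table[row].most_common(1)[0][0] if row in table else default

    return predict


def cross_val_f1(records, labels, features, k=10, seed=0):
    """Unstratified k-fold CV using only `features`; F1 over pooled predictions."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)

    def project(j):
        return tuple(records[j][f] for f in features)

    y_true, y_pred = [], []
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        model = fit_lookup([project(j) for j in train], [labels[j] for j in train])
        y_true += [labels[j] for j in test]
        y_pred += [model(project(j)) for j in test]
    return f1_score(y_true, y_pred)


def ablation(records, labels, feature_factor, k=10):
    """F1 with all features, then with each factor's features removed in turn."""
    all_features = list(feature_factor)
    results = {"all": cross_val_f1(records, labels, all_features, k)}
    for factor in sorted(set(feature_factor.values())):
        kept = [f for f in all_features if feature_factor[f] != factor]
        results["-" + factor] = cross_val_f1(records, labels, kept, k)
    return results
```

The ablation loop does not depend on the classifier, so replacing `fit_lookup` with a real decision-tree learner would bring the sketch closer to the paper's setup. A factor whose removal lowers the score is one the model relies on.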
As access to social information increases, the privacy of OSN users needs to
be taken into consideration to protect their interests. Although some efforts
have been made to give users ways to control their privacy, it is still
difficult for them to balance the trade-off between social benefits and the
potential risk of sharing personal information. Indeed, some studies show that
people share more than they expect to. Dong et al. show the importance of
several psychological and contextual factors in users' decision making, and
propose a privacy decision-making prediction model that can be used in a
privacy adaptation procedure to help users protect their privacy on OSNs. As
stated in the paper, the model is a binary classifier; for privacy
recommenders, it could be better to calculate a "privacy risk" score based on
the identified factors and let the user make the final decision.
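One possible reading of this suggestion is a logistic score over per-factor values: each factor contributes a weighted term, and the sigmoid maps the total to a risk between 0 and 1 that a recommender could display instead of a yes/no verdict. The weights, the assumption that factor scores are normalized to [0, 1], and the `privacy_risk` function are all hypothetical; in practice the weights would have to be fitted to decision data (for example with logistic regression), and the remaining factors added once their direction of effect is known.

```python
import math

# Hypothetical weights: positive values raise risk, negative values lower it.
# Only three of the five factors are included; the others are left out because
# their direction of effect on risk is not obvious without fitted data.
WEIGHTS = {
    "sensitivity": 2.0,       # more sensitive information -> riskier
    "trustworthiness": -1.5,  # more trusted requester/audience -> less risky
    "appropriateness": -1.0,  # more appropriate request -> less risky
}


def privacy_risk(factor_scores, weights=WEIGHTS, bias=0.0):
    """Return a risk in (0, 1) from per-factor scores assumed to lie in [0, 1].

    Factors missing from `factor_scores` contribute nothing (treated as 0).
    """
    z = bias + sum(w * factor_scores.get(f, 0.0) for f, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


# Highly sensitive information requested by a barely trusted requester.
risk = privacy_risk({"sensitivity": 0.9, "trustworthiness": 0.2})
```

Here `risk` comes out well above 0.5; the recommender would show it alongside the request and leave the final share/decline choice to the user.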