Personalized Social Search Based on the User’s Social Network
David Carmel et al. - IBM Research Lab in Haifa, Israel (CIKM-2009)
Personalizing the search process means taking the searcher’s personal attributes and preferences into account, so that the system can, for example, perform personalized query expansion, re-rank the result list, or filter the presented information. However, few studies consider the preferences of the people related to the user in a social network, which can be used to enrich the user’s own preferences. In the paper “Personalized Social Search Based on the User’s Social Network”, Carmel et al. present a study that leverages both of these information sources to improve social search, that is, search over “social” data (e.g., documents or people) gathered from Web 2.0 applications. In particular, they re-rank search results by considering the searcher’s relationships with other users in the searcher’s social network (SN).
First, Carmel et al. build user profiles from data collected in IBM Lotus Connections, a social software application, using SaND (an aggregation tool for information discovery and analysis over social data). They consider two kinds of relations between entities: direct relations and indirect relations (two entities are indirectly related if both are directly related to the same entity). Moreover, to compare the influence of different SNs, they experiment with three types of SN for personalization, alongside a topic-based profile. Each network maps a user to a list of related users weighted according to relationship strength.
- Familiarity SN: direct familiarity relation and indirect familiarity relation
- Similarity SN: similarity between two individuals according to common activity; for example, co-usage of the same tag and co-tagging of the same document
- Overall SN: contains all related people according to the full relationship model
- Topic-based (a profile rather than an SN): the user’s top related terms serve as the user’s topic-based profile
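The notion of an indirect relation, and of a network that maps a user to a weighted list of related users, can be illustrated with a small sketch. This is not the SaND implementation: the function names, the example data, and the choice to score an indirect relation as the summed product of the two direct weights are all assumptions made here for illustration.

```python
from collections import defaultdict

def indirect_relations(direct):
    """Derive indirect relations from symmetric direct relations.

    direct: dict mapping entity -> {neighbor: weight}.
    Two entities are indirectly related if both are directly related to the
    same entity. The score (sum over shared entities of the product of the
    two direct weights) is an assumption of this sketch, not the paper's.
    """
    indirect = defaultdict(lambda: defaultdict(float))
    for shared, neighbors in direct.items():
        items = list(neighbors.items())
        for a, w_a in items:
            for b, w_b in items:
                if a != b:
                    indirect[a][b] += w_a * w_b
    return {a: dict(rel) for a, rel in indirect.items()}

def ranked_network(relations, user):
    """Return the entities related to `user`, strongest relation first."""
    return sorted(relations.get(user, {}).items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical data: users directly related to the documents they tagged.
direct = {
    "alice": {"doc1": 1.0},
    "bob": {"doc1": 0.5, "doc2": 1.0},
    "carol": {"doc2": 2.0},
    "doc1": {"alice": 1.0, "bob": 0.5},
    "doc2": {"bob": 1.0, "carol": 2.0},
}
print(ranked_network(indirect_relations(direct), "bob"))
# [('carol', 2.0), ('alice', 0.5)]
```

Here bob is indirectly related to alice through doc1 and to carol through doc2, which mirrors the co-tagging example given for the Similarity SN.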
For each user u, the profile includes the ranked list of users related to u and the ranked list of terms related to u. The re-ranking mechanism computes three scores to rank entities: the non-personalized SaND score, the score derived from related users, and the score derived from related tags. The relative influence of the three scores is controlled by two parameters.
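One plausible way to combine three scores with two parameters is a nested linear interpolation, sketched below. The exact formula is not given in this summary, so the functional form, the parameter names `alpha` and `beta`, and all data structures are assumptions rather than the authors' definition.

```python
def personalized_score(entity, sand_score, user_weights, term_weights,
                       user_entity, term_entity, alpha=0.5, beta=0.5):
    """Combine three scores for one entity (assumed linear interpolation).

    sand_score   : non-personalized score of the entity for the query
    user_weights : {related_user: relationship strength to the searcher}
    term_weights : {related_term: strength of the term in the searcher's profile}
    user_entity  : {(user, entity): strength of that user's relation to the entity}
    term_entity  : {(term, entity): strength of that term's relation to the entity}
    alpha        : weight of the non-personalized score (assumed name)
    beta         : weight of the social part versus the topical part (assumed name)
    """
    social = sum(w * user_entity.get((v, entity), 0.0)
                 for v, w in user_weights.items())
    topical = sum(w * term_entity.get((t, entity), 0.0)
                  for t, w in term_weights.items())
    return alpha * sand_score + (1 - alpha) * (beta * social + (1 - beta) * topical)

def rerank(results, user_weights, term_weights, user_entity, term_entity,
           alpha=0.5, beta=0.5):
    """Re-rank {entity: sand_score} by personalized score, best first."""
    scored = [
        (e, personalized_score(e, s, user_weights, term_weights,
                               user_entity, term_entity, alpha, beta))
        for e, s in results.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical example: d2 is related to a user and a term in the profile.
results = {"d1": 0.9, "d2": 0.6}
ranking = rerank(results,
                 user_weights={"bob": 1.0},
                 term_weights={"search": 1.0},
                 user_entity={("bob", "d2"): 1.0},
                 term_entity={("search", "d1"): 0.1, ("search", "d2"): 0.5})
print([entity for entity, _ in ranking])  # ['d2', 'd1']
```

Under this form, setting `alpha=1.0` reduces to the non-personalized ranking, while `beta` shifts the personalized part between the related-users score and the related-terms score.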
For evaluation, they run an offline study and a user study. The offline study shows that the non-personalized method performs worst, while the Similarity-SN significantly outperforms the other two networks. In the user study, they recruit 240 participants with 577 personal queries. The quality of the results returned by the personalized search strategies is measured by NDCG@10 and P@10. This study also shows that the non-personalized method performs worst. However, among the personalized methods, the Overall-SN, which is the worst of them in the offline study, performs best in the user study. The difference between the two studies may be due to the different kinds of queries, or to the offline approach discriminating against authored or commented documents and being biased toward tagged documents.
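For readers unfamiliar with the two measures, the sketch below shows one common way to compute P@k and NDCG@k from a ranked list of relevance judgments. The gain and discount used here (raw relevance divided by log2 of the rank plus one) are a standard textbook choice and an assumption; the paper may use a different variant. The ideal ranking is computed only from the judged list passed in.

```python
import math

def precision_at_k(relevances, k=10):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalized by the DCG of the same judgments in ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments for the top results of one query.
judged = [1, 3, 0, 2]
print(precision_at_k(judged, k=4))  # 0.75
print(ndcg_at_k(judged, k=4))
```

P@k ignores the order within the top k, whereas NDCG@k rewards placing the most relevant results first, which is why the two are often reported together.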