Term Extraction For User Profiling: Evaluation By The User

Suzan Verberne, Maya Sappelli, Wessel Kraaij (UMAP-2013)
There are several methods that social information systems use to recommend people to their users. Among them, content-based people recommendation is one of the most widely used and appears in many different systems (e.g., Twitter). Such systems collect texts written by their users, such as publications, blogs, microblogs, or posts, and build user profiles from them in order to make recommendations later. Building effective user profiles from text is a difficult task. Most studies represent texts as a bag of words; others try to extract more meaningful terms or phrases. If a system has a good representation of each user's profile, it can find similar users more accurately and therefore generate better people recommendations. In this paper, Verberne et al. present a study that compares three popular term-weighting methods for user profiling. The methods are evaluated by the users themselves in two evaluation scenarios: a per-term evaluation and a holistic (term cloud) evaluation.

First, Verberne et al. briefly describe how candidate descriptive terms are collected from a user's self-authored document collection. Candidate generation follows the standard preprocessing steps used in most information retrieval systems: the documents are converted to plain text and split into sentences, and then all occurring n-grams that contain no stopwords or numbers are extracted as candidate terms. The output of this step is the set of candidate terms with their counts.
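The candidate extraction step can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the stopword list is a tiny hypothetical one, sentence splitting and tokenization use simple regular expressions, a token counts as a "number" if it contains any digit, and the maximum n-gram length (`max_n=3`) is a hypothetical choice that the summary does not specify.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a full list.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "or",
             "to", "is", "are", "with", "by"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercased word tokens; hyphenated words stay together.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())

def is_valid_token(tok):
    # Assumption: a token is a "number" if it contains any digit.
    return tok not in STOPWORDS and not any(ch.isdigit() for ch in tok)

def candidate_terms(documents, max_n=3):
    """Count every n-gram (n <= max_n) inside a sentence with no stopword or number."""
    counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            tokens = tokenize(sentence)
            for n in range(1, max_n + 1):
                for i in range(len(tokens) - n + 1):
                    gram = tokens[i:i + n]
                    if all(is_valid_token(t) for t in gram):
                        counts[" ".join(gram)] += 1
    return counts
```

Because n-grams are built within sentences only, a candidate term never spans a sentence boundary.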
To find the most descriptive and important terms in a user’s corpus, they implement three different term scoring methods which include:

  • Parsimonious language model based (PLM) [1]: the term frequency in the personal collection is weighted against the frequency of the term in a background corpus
  • Co-occurrence based (CB) [2]: term relevance is determined by the distribution of co-occurrences of the term with frequent terms in the collection
  • Kullback-Leibler divergence for informativeness and phraseness (KLIP) [3]: term relevance is based on the expected loss between two language models, measured with point-wise Kullback-Leibler divergence
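Two of the three scoring methods can be sketched as below (CB, which relies on co-occurrence statistics with frequent terms, is left out for brevity). Both functions take a foreground Counter (the user's candidate terms) and a background Counter. The PLM part follows the EM re-estimation idea of Hiemstra et al.: the E-step estimates how much of each term's count is explained by the personal model rather than the background, and the M-step renormalizes. The KLIP part follows Tomokiyo and Hurst: phraseness is the pointwise KL divergence between the foreground n-gram model and a foreground unigram model (words treated as independent), informativeness is the pointwise KL divergence between the foreground and background n-gram models, and the two are summed. The mixing weight `lam=0.1`, the iteration count, add-one smoothing of the background, and normalizing probabilities per n-gram length are all our assumptions; the sketch also assumes the foreground counts include the unigram count of every word in every multi-word term.

```python
import math
from collections import Counter

def length_stats(counts, vocab):
    # Total count and vocabulary size per n-gram length n, over `vocab`.
    totals, sizes = Counter(), Counter()
    for t in vocab:
        n = len(t.split())
        totals[n] += counts.get(t, 0)
        sizes[n] += 1
    return totals, sizes

def parsimonious_lm(fg, bg, lam=0.1, iterations=50):
    """PLM sketch: EM re-estimation of the personal model P(t|D)."""
    vocab = set(fg) | set(bg)
    bg_total = sum(bg.values())
    # Add-one smoothed background model (our assumption) so no term has P = 0.
    p_bg = {t: (bg.get(t, 0) + 1) / (bg_total + len(vocab)) for t in fg}
    fg_total = sum(fg.values())
    p_d = {t: c / fg_total for t, c in fg.items()}
    for _ in range(iterations):
        # E-step: expected count of t attributed to the personal model.
        e = {t: fg[t] * lam * p_d[t] / (lam * p_d[t] + (1 - lam) * p_bg[t])
             for t in fg}
        # M-step: renormalise into a probability distribution.
        norm = sum(e.values())
        p_d = {t: v / norm for t, v in e.items()}
    return p_d

def klip(fg, bg):
    """KLIP sketch: phraseness + informativeness via pointwise KL divergence."""
    vocab = set(fg) | set(bg)
    fg_tot, _ = length_stats(fg, vocab)
    bg_tot, bg_size = length_stats(bg, vocab)
    scores = {}
    for term, c in fg.items():
        words = term.split()
        n = len(words)
        p_fg = c / fg_tot[n]
        # Foreground unigram model: the words occurring independently.
        p_indep = math.prod(fg[w] / fg_tot[1] for w in words)
        # Background n-gram model, add-one smoothed per length (our assumption).
        p_bg = (bg.get(term, 0) + 1) / (bg_tot[n] + bg_size[n])
        phraseness = p_fg * math.log(p_fg / p_indep)
        informativeness = p_fg * math.log(p_fg / p_bg)
        scores[term] = phraseness + informativeness
    return scores
```

In this sketch a single word always gets zero phraseness, because its n-gram and unigram probabilities coincide; this is one way such a scorer can favour multi-word terms. The original PLM formulation also discards terms whose probability falls below a threshold during EM, a step this sketch omits.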
For the evaluation set-up, they ask five colleagues to each provide a collection of at least 20 documents that are representative of their work. The collections contain on average 22 documents per user, mainly scientific articles, with an average total of around 537,000 words per collection. They then use PLM, CB and KLIP to generate three lists of 100 terms each.
In Experiment 1, which evaluates individual terms, Verberne et al. calculate for each term the average of the three normalized scores, order the terms by this combined score, and extract the top 150 terms. They ask the users to indicate which of these terms are relevant to their work. Performance is measured with mean average precision, and the three methods are also compared with a TF (term frequency) baseline. All three term extraction methods outperform the TF baseline. Among the three, KLIP performs best, with an average precision of 43%. The authors argue that a larger share of multi-word terms leads to better results.
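The pooling and scoring in Experiment 1 can be sketched as follows. The function names are hypothetical, and several details are our assumptions rather than facts from the paper: min-max normalization (the summary does not say how scores were normalized), a term missing from one method's list contributing 0 for that method, and standard average precision, which divides by the total number of relevant terms.

```python
def min_max_normalise(scores):
    # Rescale one method's scores to [0, 1] (assumed normalization).
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {t: 1.0 for t in scores}
    return {t: (s - lo) / (hi - lo) for t, s in scores.items()}

def combined_top_k(score_lists, k=150):
    """Average the normalized scores of all methods and keep the top k terms."""
    normalised = [min_max_normalise(s) for s in score_lists]
    terms = set().union(*normalised)
    # A term absent from a method's list counts as 0 for that method.
    combined = {t: sum(n.get(t, 0.0) for n in normalised) / len(normalised)
                for t in terms}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]

def average_precision(ranked_terms, relevant):
    """Standard AP: sum of precision@k at each relevant rank, divided by |relevant|."""
    hits, precision_sum = 0, 0.0
    for rank, term in enumerate(ranked_terms, start=1):
        if term in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    # runs: one (ranked_terms, relevant_terms) pair per user.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Each method's (or the TF baseline's) ranked list would be scored with `average_precision` against the user's relevance judgements on the pooled terms, and the per-user values averaged.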
In Experiment 2, which evaluates the terms as term clouds, they generate one term cloud per scoring method and ask users to rank the three clouds from the best to the worst representation of their work. Again, KLIP performs best. However, one user, for whom KLIP had produced the best term ranking, chose the CB cloud as the best representation. The authors therefore suggest that the visualization of a term profile can play a role in how the user perceives the profile.
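The summary does not describe how the clouds are drawn. One common approach, shown here as a hypothetical sketch, is to size each top-scoring term linearly between a minimum and a maximum font size. Since users rank the three clouds, the sketch also computes each method's mean rank across users as one simple way to aggregate the rankings. The cloud size (`top_n`), the point sizes, and the use of mean rank are all our assumptions, not details from the paper.

```python
def font_sizes(scores, top_n=30, min_pt=10, max_pt=36):
    """Map the top_n terms' scores linearly onto [min_pt, max_pt] point sizes."""
    top = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    lo = min(scores[t] for t in top)
    hi = max(scores[t] for t in top)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {t: round(min_pt + (scores[t] - lo) / span * (max_pt - min_pt))
            for t in top}

def mean_rank(rankings):
    # rankings: one list per user, methods ordered from best (rank 1) to worst.
    totals = {}
    for ranking in rankings:
        for rank, method in enumerate(ranking, start=1):
            totals[method] = totals.get(method, 0) + rank
    return {m: total / len(rankings) for m, total in totals.items()}
```

A lower mean rank means users placed that method's cloud closer to the top.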

This paper compares three term-weighting methods for building user profiles, which can in turn support content-based people recommendation. From the comparison, the authors conclude that users tend to prefer a term scoring method that gives higher scores to multi-word terms than to single-word terms, and that users are not always consistent in their judgements of term profiles.

[1] Hiemstra, D., Robertson, S., Zaragoza, H.: Parsimonious language models for information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 178–185. ACM (2004)

[2] Matsuo, Y., Ishizuka, M.: Keyword extraction from a single document using word co-occurrence statistical information. International Journal on Artificial Intelligence Tools 13(01), 157–169 (2004)

[3] Tomokiyo, T., Hurst, M.: A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Volume 18. pp. 33–40. Association for Computational Linguistics (2003)

