Term Extraction For User Profiling: Evaluation By The User
Suzan Verberne, Maya Sappelli, Wessel
Kraaij (UMAP-2013)
There are several methods that social information systems use to recommend people to their users. Among them, content-based people recommendation is the most widely used, appearing in many different systems (e.g., Twitter). These systems collect texts from users, such as publications, blogs, microblogs, or posts, and build user profiles from them in order to make recommendations later. Building effective user profiles from text is a difficult task. Most studies represent texts as a bag of words; others try to extract more meaningful terms or phrases. If a system has a good representation of the user profile, it can find similar users more accurately and thus generate better people recommendations for each user. In this paper, Verberne et al. present a study that compares three popular term weighting methods for user profiling. The results are evaluated by the users themselves in two different scenarios: a per-term evaluation and a holistic (term cloud) evaluation.
First of all, Verberne et al. briefly discuss how descriptive terms are collected from a user's self-authored document collection. This process of generating candidate terms for user profiling follows the same steps applied in most information retrieval systems. The user's document collection is preprocessed by first converting each document to plain text and splitting it into sentences; then all occurring n-grams that contain no stopwords and no numbers are extracted as candidate terms. The output of this process is the set of candidate terms with their counts.
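The candidate generation step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the stopword list and tokenizer here are stand-in assumptions.

```python
import re
from collections import Counter

# Stand-in stopword list; the paper uses a real stopword list, not this toy set.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "from"}

def candidate_ngrams(sentences, max_n=3):
    """Collect all n-grams (n <= max_n) that contain no stopwords and no numbers."""
    counts = Counter()
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z]+|\d+", sentence.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip any n-gram containing a stopword or a number
                if any(t in STOPWORDS or t.isdigit() for t in gram):
                    continue
                counts[" ".join(gram)] += 1
    return counts

counts = candidate_ngrams(["Term extraction builds a user profile from text."])
print(counts["user profile"])  # the bigram survives the stopword filter: 1
```

N-grams are extracted within sentence boundaries, which is why the preprocessing step splits the text into sentences first.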
To find the most descriptive and important
terms in a user’s corpus, they implement three different term scoring methods
which include:
- Parsimonious language model based (PLM) [1]: the term frequency in the personal collection is weighted against the frequency of the term in a background corpus
- Co-occurrence based (CB) [2]: term relevance is determined by the distribution of co-occurrences of the term with frequent terms in the collection
- Kullback-Leibler divergence for informativeness and phraseness (KLIP) [3]: term relevance is based on the expected loss between two language models, measured with point-wise Kullback-Leibler divergence
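To make the KLIP idea concrete, here is a small sketch of its informativeness component: each term's score is its pointwise KL contribution, comparing the term's probability in the foreground (personal) collection against a background corpus. This covers only the informativeness part; the phraseness component and the combination weights from [3] are omitted, and the smoothing choice is an assumption.

```python
import math
from collections import Counter

def klip_informativeness(fg_counts, bg_counts):
    """Pointwise KL contribution per term: p_fg * log(p_fg / p_bg).
    Sketch of the informativeness component of KLIP only."""
    fg_total = sum(fg_counts.values())
    bg_total = sum(bg_counts.values())
    scores = {}
    for term, count in fg_counts.items():
        p_fg = count / fg_total
        # Crude add-one fallback for terms unseen in the background corpus
        p_bg = bg_counts.get(term, 1) / bg_total
        scores[term] = p_fg * math.log(p_fg / p_bg)
    return scores

fg = Counter({"information retrieval": 10, "the": 50})
bg = Counter({"information retrieval": 2, "the": 5000})
scores = klip_informativeness(fg, bg)
print(scores["information retrieval"] > scores["the"])  # True
```

A term that is frequent in the personal collection but rare in the background corpus gets a high score, while common function words score near zero or below.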
For the evaluation set-up, they asked five colleagues to each provide a collection of at least 20 documents representative of their work. They collected 22 documents per user on average, mostly scientific articles, with an average total of around 537,000 words per collection. They then used PLM, CB, and KLIP to generate three term lists of 100 terms each.
In Experiment 1, which evaluates individual terms, Verberne et al. calculate for each term the average of the three normalized scores, order the terms by this combined score, and extract the top 150 terms. They then ask each user to indicate which of these terms are relevant for their work. Performance is measured with mean average precision (MAP), using plain term frequency (TF) scoring as a baseline. The results show that all three term extraction methods outperform the TF baseline. Among the three, KLIP performs best, with an average precision of 43%. The authors attribute this to KLIP extracting more multi-word terms, which users judge to be more relevant.
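The MAP metric used in this experiment can be sketched as below: for each method, average precision is computed over the user-judged relevance of that method's ranked term list, then averaged across users. The relevance judgments here are hypothetical examples, not data from the paper.

```python
def average_precision(ranked_relevance):
    """Average precision over one ranked term list; ranked_relevance is a
    list of booleans (user judged the term relevant), in rank order."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical judgments for one user's top-5 terms from one scoring method
ap = average_precision([True, False, True, True, False])
print(round(ap, 3))  # 0.806

# MAP for a method = mean of its per-user average precisions
map_score = sum(average_precision(r) for r in [[True, True], [False, True]]) / 2
```

Each relevant term contributes the precision at its rank, so a method that places relevant terms near the top of its list scores higher even when the total number of relevant terms is the same.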
In Experiment 2, which evaluates the terms as term clouds, they generate a term cloud for each of the three scoring methods and ask the users to rank the three clouds from best to worst as representations of their work. The results again show that KLIP performs best. However, for one user, KLIP produced the best-ranked term list, yet she chose the CB cloud as the best visual representation. The authors therefore suggest that the visualization of a term profile plays a role in how the user perceives the profile.
In summary, this paper compares three term weighting methods that help to build better user profiles, which in turn improve content-based people recommendation. Through the comparison, the authors conclude that users tend to prefer a term scoring method that gives higher scores to multi-word terms than to single-word terms, and that users are not always consistent in their judgements of term profiles.
[1] Hiemstra, D., Robertson, S., Zaragoza, H.: Parsimonious language models for information retrieval. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. pp. 178–185. ACM (2004)
[2] Matsuo, Y., Ishizuka, M.: Keyword extraction from a single document using word co-occurrence statistical information. International Journal on Artificial Intelligence Tools 13(01), 157–169 (2004)
[3] Tomokiyo, T., Hurst, M.: A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment - Volume 18. pp. 33–40. Association for Computational Linguistics (2003)