Personalized Access to Scientific Publications: from Recommendation to Explanation
Dario De Nart, Felice Ferrara, and Carlo Tasso (UMAP-2013)
In the paper “Personalized Access to
Scientific Publications: from Recommendation to Explanation”, Dario De Nart et
al. present a Recommender and Explanation System (RES) that recommends
scientific publications and explains its recommendations to users. To generate
recommendations, the authors first use the DIKpE keyphrase extraction algorithm
introduced by Pudota (2010) to extract keyphrases (KPs) from papers. Each KP has
a weight, called keyphraseness, that reflects the several lexical and statistical
indicators exploited in the extraction process: the higher the keyphraseness,
the more relevant the KP. Each paper in the collection is then represented as
a conceptual graph (CG). Each node in the graph is a term obtained by splitting
a KP into its words, and two nodes are connected when they appear in the same KP.
For instance, the KP “document retrieval” is split into the two nodes “document”
and “retrieval”, which are linked by an edge. The weight of a node depends on how
many times the term appears across all KPs, and the weight of an edge depends on
the keyphraseness score.
A user profile is likewise represented as a conceptual graph, built from all
the documents the user has bookmarked.
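The graph construction described above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names `build_conceptual_graph` and `build_user_profile` are hypothetical, keyphraseness scores are taken as given inputs, and we assume that an edge's weight is the sum of the keyphraseness scores of the KPs in which its two terms co-occur, and that a user profile merges the CGs of bookmarked documents by summing weights.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_conceptual_graph(keyphrases):
    """Build a CG from (keyphrase, keyphraseness) pairs.

    Node weight: number of times the term appears across all KPs.
    Edge weight (assumption): sum of keyphraseness of the KPs in which
    both terms co-occur. Edges are keyed by sorted term pairs.
    """
    node_weights = Counter()
    edge_weights = defaultdict(float)
    for phrase, score in keyphrases:
        terms = phrase.lower().split()
        node_weights.update(terms)
        # Link every pair of distinct terms that appear in this KP.
        for a, b in combinations(sorted(set(terms)), 2):
            edge_weights[(a, b)] += score
    return dict(node_weights), dict(edge_weights)


def build_user_profile(bookmarked_graphs):
    """Merge the CGs of bookmarked documents by summing weights (assumption)."""
    nodes, edges = Counter(), defaultdict(float)
    for doc_nodes, doc_edges in bookmarked_graphs:
        nodes.update(doc_nodes)  # Counter.update adds counts from a mapping
        for edge, weight in doc_edges.items():
            edges[edge] += weight
    return dict(nodes), dict(edges)
```

For example, the KPs “document retrieval” and “information retrieval” yield three nodes, with “retrieval” weighted 2 and linked to both other terms.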
The recommendation process presented in the paper has three steps. First, a
recommendation score is computed for each document in the collection. This score
consists of three sub-scores:
- Coverage: the count of shared nodes between user and document CG, divided by the number of nodes in the document CG.
- Relevance: the average TF-IDF measure of shared terms.
- Similarity: sum of the weights of shared arcs divided by the sum of the weights of all arcs occurring between shared nodes in the user CG.
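The three sub-scores can be written down directly from their definitions. In this sketch a CG is a pair of dicts (term to weight, sorted term pair to weight), `tfidf` is a hypothetical precomputed term to TF-IDF map, and the similarity score reads both arc-weight sums from the user CG, which is our interpretation of the definition rather than something the summary states.

```python
def coverage(user_nodes, doc_nodes):
    """Shared nodes divided by the number of nodes in the document CG."""
    if not doc_nodes:
        return 0.0
    shared = set(user_nodes) & set(doc_nodes)
    return len(shared) / len(doc_nodes)


def relevance(user_nodes, doc_nodes, tfidf):
    """Average TF-IDF of shared terms (tfidf: hypothetical term -> score map)."""
    shared = set(user_nodes) & set(doc_nodes)
    if not shared:
        return 0.0
    return sum(tfidf.get(term, 0.0) for term in shared) / len(shared)


def similarity(user_nodes, user_edges, doc_nodes, doc_edges):
    """Weight of shared arcs over weight of all user-CG arcs between shared nodes.

    Assumption: both sums use user-CG arc weights; edges are sorted term pairs.
    """
    shared = set(user_nodes) & set(doc_nodes)
    # All user-CG arcs whose two endpoints are shared nodes.
    between_shared = {
        edge: w for edge, w in user_edges.items()
        if edge[0] in shared and edge[1] in shared
    }
    total = sum(between_shared.values())
    if total == 0:
        return 0.0
    # Of those, the arcs that also occur in the document CG.
    shared_arcs = sum(w for edge, w in between_shared.items() if edge in doc_edges)
    return shared_arcs / total
```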
Second, all documents are ranked according to these three scores in a
three-dimensional space, where each dimension ranges over the levels DISCARDED,
FAIR, GOOD, and EXCELLENT. Finally, the system presents the recommended
documents to the user as a ranked list. Each recommended document is accompanied
by two KP lists that explain the recommendation:
- a list of KPs that appear in both the user profile and the document, and
- a list of KPs that appear in the document but not in the user profile.
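The ranking and explanation steps might look roughly like the following. The summary does not say how scores are mapped to the four levels or how the three-dimensional space is ordered, so the `THRESHOLDS` values and the ranking policy (drop any document that is DISCARDED in some dimension, then sort by total level) are hypothetical placeholders; only the split into shared and document-only KPs follows the text directly.

```python
from enum import IntEnum


class Level(IntEnum):
    DISCARDED = 0
    FAIR = 1
    GOOD = 2
    EXCELLENT = 3


# Hypothetical cut-offs for FAIR, GOOD, EXCELLENT; not taken from the paper.
THRESHOLDS = (0.2, 0.5, 0.8)


def to_level(score):
    """Map a score in [0, 1] to a Level using the hypothetical THRESHOLDS."""
    level = Level.DISCARDED
    for i, cut in enumerate(THRESHOLDS, start=1):
        if score >= cut:
            level = Level(i)
    return level


def rank(documents):
    """documents: {doc_id: (coverage, relevance, similarity)}.

    Hypothetical policy: drop documents DISCARDED in any dimension, then
    sort by total level, breaking ties by the sum of raw scores.
    """
    ranked = []
    for doc_id, scores in documents.items():
        levels = [to_level(s) for s in scores]
        if Level.DISCARDED in levels:
            continue
        ranked.append((sum(levels), sum(scores), doc_id))
    ranked.sort(reverse=True)
    return [doc_id for _, _, doc_id in ranked]


def explain(user_kps, doc_kps):
    """Split a document's KPs into those shared with the profile and the rest."""
    user = set(user_kps)
    shared = [kp for kp in doc_kps if kp in user]
    document_only = [kp for kp in doc_kps if kp not in user]
    return shared, document_only
```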
To evaluate their system, Dario De Nart et al. collect 300 scientific papers,
each categorized into one of 16 predefined topics. They also add 200
uncategorized documents to introduce noise into the collection. They then use
groups of 2, 4, 6, and 10 seed documents from each topic to generate 250 user
profiles. For each profile, they recommend 10 items and compute the accuracy of
RES; an item is considered a good recommendation if it belongs to the same topic
that was used to generate the user profile. They also compare their system with
a TF-IDF baseline, and RES outperforms the baseline in all cases.
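The evaluation criterion can be expressed as a simple hit rate over the top 10 recommendations. Here we read “accuracy” as the fraction of recommended items whose topic matches the topic used to build the profile, averaged over profiles; this is an interpretation, and `accuracy_at_k` and `mean_accuracy` are hypothetical names. Uncategorized noise documents are simply absent from `topic_of`, so they never count as hits.

```python
def accuracy_at_k(recommended, topic_of, profile_topic, k=10):
    """Fraction of the top-k recommended items whose topic matches the profile's.

    topic_of maps doc id -> topic; noise documents are absent (get() -> None).
    """
    top = recommended[:k]
    if not top:
        return 0.0
    hits = sum(1 for doc in top if topic_of.get(doc) == profile_topic)
    return hits / len(top)


def mean_accuracy(runs, topic_of, k=10):
    """Average accuracy_at_k over (recommended_list, profile_topic) pairs."""
    if not runs:
        return 0.0
    total = sum(accuracy_at_k(recs, topic_of, topic, k) for recs, topic in runs)
    return total / len(runs)
```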
Although this work presents a coherent method for explaining recommendations,
it could be improved by visualizing the keyphrases and their weights, which
would give users a better summary of each document. The work could also
leverage the tags attached to bookmarks to improve RES.