Personalized Access to Scientific Publications: from Recommendation to Explanation
Dario De Nart, Felice Ferrara, and Carlo Tasso (UMAP-2013)
In the paper “Personalized Access to Scientific Publications: from Recommendation to Explanation”, Dario De Nart et al. present a Recommender and Explanation System (RES) that recommends scientific publications and explains its recommendations to users. To produce recommendations, Dario De Nart et al. first use the Dikpe keyphrase extraction algorithm introduced by Pudota (2010) to extract keyphrases (KPs) from papers. Each KP has a weight, called keyphraseness, that summarizes the lexical and statistical indicators exploited during extraction: the higher the keyphraseness, the more relevant the KP. Each paper in the collection is then represented as a conceptual graph (CG). Each node is a term obtained by splitting a KP into its words, and two nodes are connected when they appear in the same KP. For instance, the KP “document retrieval” is split into the nodes “document” and “retrieval”, which are linked by an edge. The weight of a node depends on how many times the term appears across all KPs, and the weight of an edge depends on the keyphraseness scores of the KPs. Similarly, a user profile is represented as a CG built from all the documents the user has bookmarked.
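The graph construction described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The functions `build_cg` and `build_profile` are hypothetical, and the input is assumed to be a dict mapping each KP to its keyphraseness. Two weighting choices are also assumptions, since the summary does not state them. First, an edge's weight is taken as the sum of the keyphraseness of every KP that contains both terms. Second, a user CG is taken as the union of the bookmarked documents' CGs, with weights summed.

```python
from collections import defaultdict
from itertools import combinations


def build_cg(keyphrases):
    """Build a conceptual graph (CG) from {keyphrase: keyphraseness}.

    Assumptions (hypothetical, not from the paper): terms are the
    lowercase whitespace-separated words of a KP; a node's weight is the
    number of times the term occurs across all KPs; an edge's weight is
    the sum of the keyphraseness of the KPs in which both terms co-occur.
    """
    nodes = defaultdict(int)    # term -> occurrence count across KPs
    edges = defaultdict(float)  # frozenset({t1, t2}) -> accumulated weight
    for kp, keyphraseness in keyphrases.items():
        terms = kp.lower().split()
        for term in terms:
            nodes[term] += 1
        # Link every pair of distinct terms that share this KP.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[frozenset((a, b))] += keyphraseness
    return dict(nodes), dict(edges)


def build_profile(documents):
    """Merge the CGs of all bookmarked documents into one user CG.

    `documents` is a list of {keyphrase: keyphraseness} dicts.
    Assumption: node and edge weights are summed across documents.
    """
    nodes, edges = defaultdict(int), defaultdict(float)
    for kps in documents:
        doc_nodes, doc_edges = build_cg(kps)
        for term, weight in doc_nodes.items():
            nodes[term] += weight
        for edge, weight in doc_edges.items():
            edges[edge] += weight
    return dict(nodes), dict(edges)
```

For example, `build_cg({"document retrieval": 0.8, "information retrieval": 0.6})` yields the nodes `document` (1), `retrieval` (2) and `information` (1). The shared term `retrieval` connects the two KPs in the graph.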
The recommendation process presented in the paper has three steps. First, a recommendation score is computed for each document in the collection. This score consists of three sub-scores:
- Coverage: the number of nodes shared by the user CG and the document CG, divided by the number of nodes in the document CG.
- Relevance: the average TF-IDF score of the shared terms.
- Similarity: the sum of the weights of the shared arcs, divided by the sum of the weights of all arcs occurring between shared nodes in the user CG.
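The three sub-scores can be sketched as follows. The sketch assumes a dict-based CG representation, with nodes as `{term: weight}` and edges as `{frozenset({a, b}): weight}`. The summary does not specify the TF-IDF formula used for Relevance, so the one here is an assumption: the term's weight normalized by the document's total node weight, multiplied by `log(n_docs / doc_freq[term])`. Taking both Similarity sums from the user CG's arc weights is also an assumption. The names `doc_freq` and `n_docs` are hypothetical, and `doc_freq` must contain every shared term.

```python
import math


def coverage(user_nodes, doc_nodes):
    """Shared nodes divided by the number of nodes in the document CG."""
    if not doc_nodes:
        return 0.0
    shared = user_nodes.keys() & doc_nodes.keys()
    return len(shared) / len(doc_nodes)


def relevance(user_nodes, doc_nodes, doc_freq, n_docs):
    """Average TF-IDF of the shared terms.

    Assumed TF-IDF: tf = term weight / total node weight of the document
    CG; idf = log(n_docs / doc_freq[term]).
    """
    shared = user_nodes.keys() & doc_nodes.keys()
    if not shared:
        return 0.0
    total = sum(doc_nodes.values())
    scores = [(doc_nodes[t] / total) * math.log(n_docs / doc_freq[t])
              for t in shared]
    return sum(scores) / len(scores)


def similarity(user_nodes, user_edges, doc_nodes, doc_edges):
    """Weight of shared arcs / weight of all user arcs between shared nodes.

    Assumption: both sums use arc weights taken from the user CG.
    """
    shared = user_nodes.keys() & doc_nodes.keys()
    # User arcs whose two endpoints are both shared nodes.
    candidate = {e: w for e, w in user_edges.items() if e <= shared}
    denom = sum(candidate.values())
    if denom == 0:
        return 0.0
    # Of those, the arcs that also occur in the document CG.
    num = sum(w for e, w in candidate.items() if e in doc_edges)
    return num / denom
```

Because Coverage divides by the document's node count, a short document whose few terms all appear in the profile scores highly on that dimension. The other two sub-scores account for how relevant the shared terms are and for how the terms are connected.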
Second, all documents are ranked according to these three scores in a three-dimensional space. Each dimension is graded on an ordinal scale: DISCARDED, FAIR, GOOD, EXCELLENT. Finally, the system presents the recommended documents to the user as a ranked list. Each recommended document is accompanied by two KP lists:
- the KPs that appear in both the user profile and the document, and
- the KPs that appear in the document but not in the user profile.
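The ranking and explanation steps might look like the sketch below. The summary does not say how scores are mapped to levels or how the three dimensions are combined, so everything here is hypothetical. The scores are assumed to be normalized to [0, 1], and `THRESHOLDS` are made-up cut-offs. A document graded DISCARDED on any dimension is dropped. The remaining documents are ordered by the sum of their level indices, with ties broken by the sum of their raw scores. `explain` builds the two KP lists shown with each recommended document.

```python
LEVELS = ["DISCARDED", "FAIR", "GOOD", "EXCELLENT"]

# Hypothetical cut-offs; the paper's actual thresholds are not given here.
THRESHOLDS = (0.2, 0.4, 0.6)


def level(score, thresholds=THRESHOLDS):
    """Map a normalized score in [0, 1] to an index into LEVELS."""
    return sum(score >= t for t in thresholds)


def rank(candidates):
    """candidates: {doc_id: (coverage, relevance, similarity)}.

    Hypothetical policy: drop a document that is DISCARDED on any
    dimension; order the rest by summed level index, then raw score sum.
    """
    kept = []
    for doc_id, scores in candidates.items():
        levels = [level(s) for s in scores]
        if min(levels) == 0:  # DISCARDED on at least one dimension
            continue
        kept.append((sum(levels), sum(scores), doc_id))
    kept.sort(reverse=True)
    return [doc_id for _, _, doc_id in kept]


def explain(user_kps, doc_kps):
    """Return (KPs shared with the profile, KPs new to the user)."""
    shared = [kp for kp in doc_kps if kp in user_kps]
    new = [kp for kp in doc_kps if kp not in user_kps]
    return shared, new
```

Grading each dimension on an ordinal scale, rather than collapsing the three scores into a single number, lets a document that is strong on only one dimension be discarded.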
To evaluate their system, Dario De Nart et al. collect 300 scientific papers, each categorized into one of 16 predefined topics, and add 200 uncategorized documents to the collection as noise. They then generate 250 user profiles from groups of 2, 4, 6, and 10 seed documents drawn from each topic. For each profile, RES recommends 10 items, and its accuracy is computed. An item counts as a good recommendation if it belongs to the topic used to generate the user profile. The authors also compare RES with a TF-IDF baseline, and the results show that RES outperforms the baseline in all cases. Although this work presents a coherent method for explaining recommendations, it could be improved by visualizing the keyphrases and their weights, because such a visualization could give users a better summary of each document. Moreover, the work could leverage the tags attached to bookmarks to improve RES.
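The accuracy criterion from the evaluation can be read as the fraction of the 10 recommended items that match the profile's topic. The sketch below follows that reading. The function names are hypothetical. Uncategorized noise documents are simply absent from `topic_of` and therefore never count as good. Averaging the per-profile scores into a single figure is an assumption about how the results were aggregated.

```python
def accuracy_at_k(recommended, topic_of, profile_topic, k=10):
    """Fraction of the top-k items whose topic matches the profile topic.

    Divides by the number of items actually returned (at most k).
    Documents missing from `topic_of` (noise) never count as good.
    """
    top = recommended[:k]
    if not top:
        return 0.0
    good = sum(1 for doc in top if topic_of.get(doc) == profile_topic)
    return good / len(top)


def mean_accuracy(runs, topic_of, k=10):
    """runs: list of (recommended_ids, profile_topic), one per profile."""
    if not runs:
        return 0.0
    return sum(accuracy_at_k(r, topic_of, t, k) for r, t in runs) / len(runs)
```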