Evaluating Link-Based Recommendations for Wikipedia

From Wikipedia Quality
Revision as of 23:42, 16 February 2021 by Messalina (talk | contribs) (+ cat.)


Evaluating Link-Based Recommendations for Wikipedia
Authors: Malte Schwarzer, Moritz Schubotz, Norman Meuschke, Corinna Breitinger, Volker Markl, Bela Gipp
Publication date: 2016
DOI: 10.1145/2910896.2910908
Links: Original

Evaluating Link-Based Recommendations for Wikipedia is a scientific work related to Wikipedia quality, published in 2016 and written by Malte Schwarzer, Moritz Schubotz, Norman Meuschke, Corinna Breitinger, Volker Markl and Bela Gipp.

Overview

Literature recommender systems support users in filtering the vast and increasing number of documents in digital libraries and on the Web. For academic literature, research has shown that citation-based document similarity measures, such as Co-Citation (CoCit) or Co-Citation Proximity Analysis (CPA), can improve recommendation quality. In this paper, the authors report on the first large-scale investigation of the performance of the CPA approach in generating literature recommendations for Wikipedia, which is fundamentally different from the academic literature domain. The authors analyze links instead of citations to generate article recommendations. They evaluate CPA, CoCit, and the Apache Lucene MoreLikeThis (MLT) function, which represents a traditional text-based similarity measure. They use two datasets of 779,716 and 2.57 million Wikipedia articles, the Big Data processing framework Apache Flink, and a ten-node computing cluster. To enable large-scale evaluation, the authors derive two quasi-gold standards from the links in Wikipedia's "See also" sections and a comprehensive Wikipedia clickstream dataset. The results show that the citation-based measures CPA and CoCit have complementary strengths compared to the text-based MLT measure. While MLT performs well in identifying narrowly similar articles that share similar words and structure, the citation-based measures are better able to identify topically related information, such as information on the city of a certain university or other technical universities in the region. The CPA approach, which consistently outperformed CoCit, is better suited for identifying a broader spectrum of related articles, as well as popular articles that typically exhibit a higher quality. Additional benefits of the CPA approach are its lower runtime requirements and its language independence, which allows for cross-language retrieval of articles.
The authors present a manual analysis of exemplary articles to demonstrate and discuss the findings. The raw data and source code of the study, together with a manual on how to use them, are openly available at: https://github.com/wikimedia/citolytics
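
To make the difference between the two link-based measures in the overview concrete, the following is a minimal Python sketch, not the study's Apache Flink implementation. It rests on several assumptions that are not taken from the paper: the position of a link is approximated by its index in the article's ordered link list, the proximity weight is taken to be the link distance raised to the power of minus alpha, and the names link_similarities, recommend, outlinks and alpha are hypothetical. CoCit counts how many articles link to both members of a pair, while the CPA-style score additionally gives more weight to links that appear close together.

```python
from collections import defaultdict
from itertools import combinations


def link_similarities(outlinks, alpha=1.0):
    """Compute CoCit counts and CPA-style scores for pairs of linked articles.

    outlinks maps a source article title to the ordered list of article
    titles it links to (order of appearance in the text).
    The weighting distance ** -alpha is an illustrative assumption.
    """
    cocit = defaultdict(int)
    cpa = defaultdict(float)
    for targets in outlinks.values():
        # Keep only the first position at which each target is linked.
        first_pos = {}
        for pos, target in enumerate(targets):
            first_pos.setdefault(target, pos)
        # Every pair of distinct targets is co-linked once by this source.
        for (a, pos_a), (b, pos_b) in combinations(sorted(first_pos.items()), 2):
            pair = (a, b)  # titles are in alphabetical order
            cocit[pair] += 1
            cpa[pair] += abs(pos_a - pos_b) ** -alpha
    return dict(cocit), dict(cpa)


def recommend(scores, article, k=3):
    """Return up to k titles most strongly paired with the given article."""
    related = [
        (b if a == article else a, score)
        for (a, b), score in scores.items()
        if article in (a, b)
    ]
    # Highest score first; ties are broken alphabetically by title.
    related.sort(key=lambda item: (-item[1], item[0]))
    return [title for title, _ in related[:k]]


if __name__ == "__main__":
    # Hypothetical toy link lists, for illustration only.
    example = {
        "Article X": ["Berlin", "Humboldt University", "Germany"],
        "Article Y": ["Berlin", "Germany", "Humboldt University"],
    }
    cocit_counts, cpa_scores = link_similarities(example)
    print(recommend(cpa_scores, "Germany"))
```

In this toy input every pair is co-linked by both source articles, so CoCit cannot rank them, whereas the proximity-weighted score separates pairs whose links sit next to each other from pairs that are further apart.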

Embed

Wikipedia Quality

Schwarzer, Malte; Schubotz, Moritz; Meuschke, Norman; Breitinger, Corinna; Markl, Volker; Gipp, Bela. (2016). "[[Evaluating Link-Based Recommendations for Wikipedia]]". DOI: 10.1145/2910896.2910908.

English Wikipedia

{{cite journal |last1=Schwarzer |first1=Malte |last2=Schubotz |first2=Moritz |last3=Meuschke |first3=Norman |last4=Breitinger |first4=Corinna |last5=Markl |first5=Volker |last6=Gipp |first6=Bela |title=Evaluating Link-Based Recommendations for Wikipedia |date=2016 |doi=10.1145/2910896.2910908 |url=https://wikipediaquality.com/wiki/Evaluating_Link-Based_Recommendations_for_Wikipedia}}

HTML

Schwarzer, Malte; Schubotz, Moritz; Meuschke, Norman; Breitinger, Corinna; Markl, Volker; Gipp, Bela. (2016). &quot;<a href="https://wikipediaquality.com/wiki/Evaluating_Link-Based_Recommendations_for_Wikipedia">Evaluating Link-Based Recommendations for Wikipedia</a>&quot;. DOI: 10.1145/2910896.2910908.