Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia

Authors
Maik Anderka
Benno Maria Stein
Nedim Lipka
Publication date
2012
ISBN
978-145031658-3
DOI
10.1145/2348283.2348413

Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia is a scientific work about Wikipedia quality, published in 2012 and written by Maik Anderka, Benno Maria Stein and Nedim Lipka.

Overview

The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content classifies content as either high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, and thereby indicates specifically in which respects low-quality content needs improvement.

The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to mark content that has some shortcomings. The authors apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. They present an automatic mining approach to identify the existing cleanup tags, which yields a training corpus of labeled Wikipedia articles. They argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. They develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws.

Since acquiring significant test data is intricate in the Wikipedia setting, the authors analyze the effects of a biased sample selection. In this regard they illustrate the classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: given test data with little noise, four flaws can be detected with a precision close to 1.
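
To illustrate why classifier effectiveness depends on the flaw distribution, the following minimal sketch computes the expected precision of a flaw detector from its true positive rate, its false positive rate, and the share of articles that actually have the flaw. It uses only the standard definition of precision; the function name `precision_at_prior` and the rates 0.9 and 0.05 are hypothetical values chosen for illustration and are not taken from the paper.

```python
def precision_at_prior(tpr: float, fpr: float, flaw_prior: float) -> float:
    """Expected precision when a fraction `flaw_prior` of all articles has the flaw.

    tpr: share of flawed articles that the classifier flags.
    fpr: share of unflawed articles that the classifier flags by mistake.
    """
    for name, value in (("tpr", tpr), ("fpr", fpr), ("flaw_prior", flaw_prior)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    true_positives = tpr * flaw_prior
    false_positives = fpr * (1.0 - flaw_prior)
    flagged = true_positives + false_positives
    if flagged == 0.0:
        return 0.0  # nothing is flagged, so precision is reported as 0 by convention
    return true_positives / flagged


if __name__ == "__main__":
    # Hypothetical classifier quality, for illustration only.
    TPR, FPR = 0.9, 0.05
    for prior in (0.5, 0.1, 0.01):
        precision = precision_at_prior(TPR, FPR, prior)
        print(f"flaw prior {prior:>5}: precision {precision:.3f}")
```

With the rates held fixed, precision falls as the flaw becomes rarer, which is why a result measured on a balanced test sample may overstate the precision to be expected when the real-world class imbalance is unknown.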

Embed

Wikipedia Quality

Anderka, Maik; Stein, Benno Maria; Lipka, Nedim. (2012). "[[Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia]]". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 7448 LNCS, 2012, pp. 378-389. ISBN: 978-145031658-3. DOI: 10.1145/2348283.2348413.

English Wikipedia

{{cite journal |last1=Anderka |first1=Maik |last2=Stein |first2=Benno Maria |last3=Lipka |first3=Nedim |title=Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia |date=2012 |isbn=978-145031658-3 |doi=10.1145/2348283.2348413 |url=https://wikipediaquality.com/wiki/Predicting_Quality_Flaws_in_User-Generated_Content:_The_Case_of_Wikipedia |journal=Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 7448 LNCS, 2012, pp. 378-389}}

HTML

Anderka, Maik; Stein, Benno Maria; Lipka, Nedim. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Predicting_Quality_Flaws_in_User-Generated_Content:_The_Case_of_Wikipedia">Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia</a>&quot;. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 7448 LNCS, 2012, pp. 378-389. ISBN: 978-145031658-3. DOI: 10.1145/2348283.2348413.