Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia

Authors	Maik Anderka Benno Maria Stein Nedim Lipka
Publication date	2012
ISBN	978-145031658-3
DOI	10.1145/2348283.2348413

Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia is a scientific work about Wikipedia quality, published in 2012 and written by Maik Anderka, Benno Maria Stein and Nedim Lipka.

Overview

The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with classifying content as either high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, thereby indicating the specific respects in which low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. The authors apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. They present an automatic mining approach to identify the existing cleanup tags, which provides a training corpus of labeled Wikipedia articles. They argue that common binary or multiclass classification approaches are ineffective for the prediction of quality flaws and hence cast quality flaw prediction as a one-class classification problem. They develop a quality flaw model and employ a dedicated machine learning approach to predict Wikipedia's most important quality flaws. Since the acquisition of significant test data is intricate in the Wikipedia setting, the authors analyze the effects of a biased sample selection. In this regard, they illustrate classifier effectiveness as a function of the flaw distribution in order to cope with the unknown (real-world) flaw-specific class imbalances. The flaw prediction performance is evaluated with 10,000 Wikipedia articles that have been tagged with the ten most frequent quality flaws: provided that the test data contains little noise, four flaws can be detected with a precision close to 1.
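The two ideas in the overview, one-class classification and effectiveness as a function of the flaw distribution, can be illustrated with a minimal sketch. This is not the authors' quality flaw model or their learning approach: the centroid-and-radius classifier, the two toy features (reference count and length in thousands of words), and all function names are assumptions chosen only for illustration. The classifier is trained solely on articles that carry a given cleanup tag, and the precision function applies Bayes' rule to show how the same classifier behaves under different flaw ratios.

```python
import math
from statistics import mean


def fit_one_class(flawed_vectors, quantile=0.9):
    """Fit a centroid model using only articles tagged with the flaw.

    Hypothetical stand-in for a one-class learner: no counter-examples
    (untagged articles) are needed for training.
    """
    dims = len(flawed_vectors[0])
    centroid = [mean(v[i] for v in flawed_vectors) for i in range(dims)]
    distances = sorted(math.dist(v, centroid) for v in flawed_vectors)
    # Acceptance radius: distance that covers the chosen share of training articles.
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]


def predict_flaw(model, vector):
    """Return True if the article lies inside the learned flaw region."""
    centroid, radius = model
    return math.dist(vector, centroid) <= radius


def precision_at_prior(tpr, fpr, flaw_ratio):
    """Expected precision when a fraction `flaw_ratio` of articles has the flaw."""
    true_pos = tpr * flaw_ratio
    false_pos = fpr * (1.0 - flaw_ratio)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else 0.0


# Toy training data (assumed): [reference count, length in thousands of words]
# for articles tagged as lacking references.
flawed = [[0.0, 1.2], [1.0, 0.8], [0.0, 2.0], [1.0, 1.5], [0.0, 0.9]]
model = fit_one_class(flawed)

print(predict_flaw(model, [0.0, 1.0]))   # article resembling the tagged ones
print(predict_flaw(model, [25.0, 3.0]))  # well-referenced article

# Same classifier (assumed rates), different assumed flaw ratios.
for ratio in (0.5, 0.1, 0.01):
    print(ratio, round(precision_at_prior(0.8, 0.1, ratio), 3))
```

With fixed true-positive and false-positive rates, precision drops as the flaw becomes rarer, which is why reporting effectiveness across a range of flaw ratios is informative when the real-world ratio is unknown.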

Embed

Wikipedia Quality

Anderka, Maik; Stein, Benno Maria; Lipka, Nedim. (2012). "[[Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia]]". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 7448 LNCS, 2012, pp. 378-389. ISBN: 978-145031658-3. DOI: 10.1145/2348283.2348413.

English Wikipedia

{{cite journal |last1=Anderka |first1=Maik |last2=Stein |first2=Benno Maria |last3=Lipka |first3=Nedim |title=Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia |date=2012 |isbn=978-145031658-3 |doi=10.1145/2348283.2348413 |url=https://wikipediaquality.com/wiki/Predicting_Quality_Flaws_in_User-Generated_Content:_The_Case_of_Wikipedia |journal=Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 7448 LNCS, 2012, pp. 378-389}}

HTML

Anderka, Maik; Stein, Benno Maria; Lipka, Nedim. (2012). "<a href="https://wikipediaquality.com/wiki/Predicting_Quality_Flaws_in_User-Generated_Content:_The_Case_of_Wikipedia">Predicting Quality Flaws in User-Generated Content: The Case of Wikipedia</a>". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 7448 LNCS, 2012, pp. 378-389. ISBN: 978-145031658-3. DOI: 10.1145/2348283.2348413.