Difference between revisions of "The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia"

Revision as of 08:39, 19 November 2019

The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Authors	Oliver Ferschke Iryna Gurevych Marc Rittberger
Publication date	2013
Links	Original

The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia is a scientific work related to Wikipedia quality, published in 2013 and written by Oliver Ferschke, Iryna Gurevych and Marc Rittberger.

Overview

With the increasing amount of user-generated reference texts on the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large number of texts annotated with cleanup templates, which identify quality flaws. The authors show that the distribution of these labels is topically biased, since the templates cannot be applied freely to any arbitrary article. They argue that the topical restrictions of each label must be considered in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. They factor out the topic bias by extracting reliable training instances from the revision history, chosen so that their topic distribution is similar to that of the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
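
The summary does not spell out how instances are taken from the revision history. As a minimal sketch of one plausible reading (an assumption, not the authors' documented procedure), the code below pairs a flagged revision of an article with the first later revision of the same article in which the cleanup template is gone. Both instances come from the same article, so they share its topic. The function names `has_template` and `extract_pair` and the example template name `Advert` are hypothetical and chosen for illustration only.

```python
import re
from typing import List, Optional, Tuple


def has_template(wikitext: str, template: str) -> bool:
    """Return True if the wikitext contains the given cleanup template,
    e.g. {{Advert}} or {{Advert|date=...}} (case-insensitive)."""
    pattern = r"\{\{\s*" + re.escape(template) + r"\s*(\||\}\})"
    return re.search(pattern, wikitext, flags=re.IGNORECASE) is not None


def extract_pair(revisions: List[str], template: str) -> Optional[Tuple[str, str]]:
    """Given one article's revisions ordered oldest to newest, return
    (flawed, repaired) or None if the template was never removed.

    flawed:   the last revision that still carries the template
    repaired: the next revision, in which the template is absent
    Assumption: removal of the template signals that the flaw was fixed.
    """
    for i in range(1, len(revisions)):
        before, after = revisions[i - 1], revisions[i]
        if has_template(before, template) and not has_template(after, template):
            return before, after
    return None


# Hypothetical revision history of a single article.
history = [
    "Intro.",
    "{{Advert|date=May 2012}} Buy now!",
    "{{advert}} Buy now, best product.",
    "A neutral description.",
]
pair = extract_pair(history, "Advert")
```

Here `pair` holds one flawed and one repaired text from the same article, which keeps topic constant across the two classes. A real pipeline would have to fetch revisions from Wikipedia and guard against templates removed by vandalism or by mistake, which this sketch does not attempt.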

@@ Line 1: / Line 1: @@
+{{Infobox work
+| title = The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
+| date = 2013
+| authors = [[Oliver Ferschke]]<br />[[Iryna Gurevych]]<br />[[Marc Rittberger]]
+| link = http://www.aclweb.org/anthology/P13-1071
+}}
 '''The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2013 and written by [[Oliver Ferschke]], [[Iryna Gurevych]] and [[Marc Rittberger]].
 == Overview ==
 With the increasing amount of user-generated reference texts on the web, automatic [[quality assessment]] has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. [[Wikipedia]] contains a large number of texts annotated with cleanup templates, which identify quality flaws. The authors show that the distribution of these labels is topically biased, since the templates cannot be applied freely to any arbitrary article. They argue that the topical restrictions of each label must be considered in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. They factor out the topic bias by extracting reliable training instances from the revision history, chosen so that their topic distribution is similar to that of the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.