Difference between revisions of "Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models"

Latest revision as of 13:11, 12 November 2020

Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models
Authors	Si-Chi Chin, W. Nick Street, Padmini Srinivasan, David Eichmann
Publication date	2010
DOI	10.1145/1772938.1772942
Links	Original

Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models is a scientific work related to Wikipedia quality, published in 2010 and written by Si-Chi Chin, W. Nick Street, Padmini Srinivasan and David Eichmann.

Overview

This paper proposes an active learning approach that uses language model statistics to detect Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of its authoring, together with the high visibility of its content, has exposed Wikipedia articles to vandalism, defined here as malicious editing intended to compromise the integrity of article content. Extensive manual effort goes into combating vandalism, and an automated approach is needed to ease this laborious process. The paper builds statistical language models by constructing word distributions from the revision history of Wikipedia articles. Because vandalism often uses unexpected words to draw attention, how well (or poorly) a new edit fits language models built from previous versions may indicate whether the edit is an instance of vandalism. In addition, the paper adopts an active learning model to address the noisy and incomplete labeling of Wikipedia vandalism. The Wikipedia domain, with its revision histories, offers a novel context in which to explore the potential of language models for characterizing author intention. According to the experimental results presented in the paper, these models hold promise for vandalism detection.
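
The idea in the overview can be illustrated with a minimal sketch. It is not the paper's implementation: the whitespace tokenizer, the unigram model with add-one smoothing, the threshold-based notion of uncertainty, and all function names (`build_unigram_model`, `avg_neg_log_prob`, `most_uncertain`) are assumptions made here for illustration only. The sketch builds a word distribution from earlier revisions, scores how poorly a new edit fits it, and selects the edits whose scores are least decisive as candidates for manual labeling.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def build_unigram_model(revisions):
    """Count word occurrences across the prior revisions of an article."""
    counts = Counter()
    for revision in revisions:
        counts.update(tokenize(revision))
    return counts


def avg_neg_log_prob(counts, edit_text):
    """Average per-word negative log-probability of an edit under the model.

    Add-one (Laplace) smoothing gives unseen words a small nonzero
    probability. Higher values mean the edit fits the history less well.
    """
    words = tokenize(edit_text)
    if not words:
        return 0.0
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    nll = 0.0
    for word in words:
        prob = (counts[word] + 1) / (total + vocab)
        nll -= math.log(prob)
    return nll / len(words)


def most_uncertain(scored_edits, threshold, k=1):
    """Uncertainty sampling (assumed strategy): return the k (edit, score)
    pairs whose score lies closest to the decision threshold."""
    ranked = sorted(scored_edits, key=lambda pair: abs(pair[1] - threshold))
    return ranked[:k]
```

Under these assumptions, an edit made of words that never occurred in the article's history receives a higher score than an edit that reuses the article's usual vocabulary. Edits whose scores fall near the chosen threshold are the ones an annotator would be asked to label first, which is one simple way to spend labeling effort where existing labels are noisy or missing.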

Embed

Wikipedia Quality

Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini; Eichmann, David. (2010). "[[Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models]]". DOI: 10.1145/1772938.1772942.

English Wikipedia

{{cite journal |last1=Chin |first1=Si-Chi |last2=Street |first2=W. Nick |last3=Srinivasan |first3=Padmini |last4=Eichmann |first4=David |title=Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models |date=2010 |doi=10.1145/1772938.1772942 |url=https://wikipediaquality.com/wiki/Detecting_Wikipedia_Vandalism_with_Active_Learning_and_Statistical_Language_Models}}

HTML

Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini; Eichmann, David. (2010). "<a href="https://wikipediaquality.com/wiki/Detecting_Wikipedia_Vandalism_with_Active_Learning_and_Statistical_Language_Models">Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models</a>". DOI: 10.1145/1772938.1772942.

