Difference between revisions of "Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models"

From Wikipedia Quality

Latest revision as of 13:11, 12 November 2020


Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models
Authors: Si-Chi Chin, W. Nick Street, Padmini Srinivasan, David Eichmann
Publication date: 2010
DOI: 10.1145/1772938.1772942
Links: Original

Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models is a scientific work related to Wikipedia quality, published in 2010 and written by Si-Chi Chin, W. Nick Street, Padmini Srinivasan and David Eichmann.

Overview

This paper proposes an active learning approach that uses language model statistics to detect Wikipedia vandalism. Wikipedia is a popular and influential collaborative information system. The collaborative nature of authoring and the high visibility of its content have exposed Wikipedia articles to vandalism, defined as malicious editing intended to compromise the integrity of article content. Extensive manual effort goes into combating vandalism, and an automated approach is needed to ease this laborious process. The paper builds statistical language models by constructing word distributions from the revision history of Wikipedia articles. Because vandalism often uses unexpected words to draw attention, how well (or poorly) a new edit fits the language models built from previous versions may indicate whether the edit is vandalism. In addition, the paper adopts an active learning model to address the noisy and incomplete labeling of Wikipedia vandalism. The Wikipedia domain, with its revision histories, offers a novel context in which to explore the potential of language models for characterizing author intention. The experimental results presented in the paper suggest that these models hold promise for vandalism detection.
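
The idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the overview does not say which language model, features, or query strategy the authors used, so the add-one smoothed unigram model, the perplexity score, and the uncertainty-sampling helper below are all assumptions chosen for illustration, as are the names UnigramModel and most_uncertain and the sample texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class UnigramModel:
    """Add-one smoothed unigram model built from earlier revisions.

    Assumption: a unigram model is used here for brevity; the paper's
    actual language models may differ.
    """

    def __init__(self, revisions):
        self.counts = Counter()
        for revision in revisions:
            self.counts.update(tokenize(revision))
        self.total = sum(self.counts.values())
        # One extra vocabulary slot stands in for every unseen word.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, word):
        return math.log((self.counts[word] + 1) / (self.total + self.vocab_size))

    def perplexity(self, text):
        """Higher perplexity means the text fits the revision history worse."""
        words = tokenize(text)
        if not words:
            return 1.0
        avg_log_prob = sum(self.log_prob(w) for w in words) / len(words)
        return math.exp(-avg_log_prob)


def most_uncertain(vandalism_probs, k=1):
    """Uncertainty sampling (an assumed active-learning strategy):
    return the indices of the k edits whose predicted vandalism
    probability is closest to 0.5, i.e. the ones to label next."""
    order = sorted(range(len(vandalism_probs)),
                   key=lambda i: abs(vandalism_probs[i] - 0.5))
    return order[:k]


# Hypothetical revision history and candidate edits.
history = [
    "The river flows through the city and supplies drinking water.",
    "The river flows through the old city and supplies most drinking water.",
]
model = UnigramModel(history)
normal_edit = "The river supplies water to the city."
odd_edit = "lol this page is stupid haha"

print(model.perplexity(normal_edit))
print(model.perplexity(odd_edit))
print(most_uncertain([0.9, 0.48, 0.1, 0.55], k=2))
```

In this sketch the edit made of words never seen in the history receives the higher perplexity, which is the "lack of fitness" signal the overview describes. In a fuller pipeline such scores would be features for a classifier, and the edits the classifier is least sure about would be sent to a human for labeling.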

Embed

Wikipedia Quality

Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini; Eichmann, David. (2010). "[[Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models]]". DOI: 10.1145/1772938.1772942.

English Wikipedia

{{cite journal |last1=Chin |first1=Si-Chi |last2=Street |first2=W. Nick |last3=Srinivasan |first3=Padmini |last4=Eichmann |first4=David |title=Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models |date=2010 |doi=10.1145/1772938.1772942 |url=https://wikipediaquality.com/wiki/Detecting_Wikipedia_Vandalism_with_Active_Learning_and_Statistical_Language_Models}}

HTML

Chin, Si-Chi; Street, W. Nick; Srinivasan, Padmini; Eichmann, David. (2010). &quot;<a href="https://wikipediaquality.com/wiki/Detecting_Wikipedia_Vandalism_with_Active_Learning_and_Statistical_Language_Models">Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models</a>&quot;. DOI: 10.1145/1772938.1772942.