Difference between revisions of "Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages"

(Adding wikilinks)
(+ Infobox work)
{{Infobox work
| title = Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages
| date = 2015
| authors = [[Khoi-Nguyen Tran]]<br />[[Peter Christen]]<br />[[Scott Sanner]]<br />[[Lexing Xie]]
| doi = 10.1007/978-3-319-18038-0_30
| link = https://link.springer.com/chapter/10.1007/978-3-319-18038-0_30
}}
 
'''Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages''' is a scientific work related to [[Wikipedia quality]], published in 2015 and written by [[Khoi-Nguyen Tran]], [[Peter Christen]], [[Scott Sanner]] and [[Lexing Xie]].
 
  
 
== Overview ==
 
 
The malicious modification of articles, termed vandalism, is a serious problem for open-access encyclopedias such as [[Wikipedia]]. Wikipedia’s counter-vandalism bots and past vandalism detection research have greatly reduced the exposure and damage of common and obvious types of vandalism. However, an increasing number of sneakier types of vandalism remain, in which the inserted words are clearly out of context in the sentence or article. In this paper, the authors propose a novel context-aware and cross-language vandalism detection technique that scales to the size of the full Wikipedia and extends the types of vandalism detectable beyond past feature-based approaches. The authors' technique uses word dependencies to identify vandal words in sentences by combining part-of-speech tagging with a conditional random fields classifier. The authors evaluate the technique on two Wikipedia data sets: the PAN data sets with over 62,000 edits, commonly used by related research; and their own vandalism repairs data sets with over 500 million edits of over 9 million articles from five languages. As a comparison, the authors implement a feature-based classifier to analyse the quality of each classification technique and the trade-offs of each type of classifier. The authors' results show how context-aware detection techniques can become a new counter-vandalism tool for Wikipedia that complements current feature-based techniques.
 

Revision as of 16:18, 21 June 2020


Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages
Authors: Khoi-Nguyen Tran, Peter Christen, Scott Sanner, Lexing Xie
Publication date: 2015
DOI: 10.1007/978-3-319-18038-0_30
Links: Original (https://link.springer.com/chapter/10.1007/978-3-319-18038-0_30)

Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages is a scientific work related to Wikipedia quality, published in 2015 and written by Khoi-Nguyen Tran, Peter Christen, Scott Sanner and Lexing Xie.

Overview

The malicious modification of articles, termed vandalism, is a serious problem for open-access encyclopedias such as Wikipedia. Wikipedia’s counter-vandalism bots and past vandalism detection research have greatly reduced the exposure and damage of common and obvious types of vandalism. However, an increasing number of sneakier types of vandalism remain, in which the inserted words are clearly out of context in the sentence or article. In this paper, the authors propose a novel context-aware and cross-language vandalism detection technique that scales to the size of the full Wikipedia and extends the types of vandalism detectable beyond past feature-based approaches. The authors' technique uses word dependencies to identify vandal words in sentences by combining part-of-speech tagging with a conditional random fields classifier. The authors evaluate the technique on two Wikipedia data sets: the PAN data sets with over 62,000 edits, commonly used by related research; and their own vandalism repairs data sets with over 500 million edits of over 9 million articles from five languages. As a comparison, the authors implement a feature-based classifier to analyse the quality of each classification technique and the trade-offs of each type of classifier. The authors' results show how context-aware detection techniques can become a new counter-vandalism tool for Wikipedia that complements current feature-based techniques.
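
The overview says the technique combines part-of-speech tagging with a conditional random fields classifier to judge words in their sentence context. As a rough illustration only, the sketch below shows one common way to prepare input for a linear-chain sequence classifier: each token is described by its own word and tag plus those of its immediate neighbours. The function names, the feature set, the window of one token on each side, and the example tags are all assumptions made for this sketch and are not taken from the paper; the tags are assumed to come from an external part-of-speech tagger.

```python
from typing import Dict, List, Tuple

# A sentence is a list of (word, POS tag) pairs. The tags are assumed to be
# produced by an external tagger; this sketch does not perform tagging itself.
TaggedSentence = List[Tuple[str, str]]


def token_features(sentence: TaggedSentence, i: int) -> Dict[str, str]:
    """Describe token i by its own word/tag and those of its direct neighbours.

    Hypothetical feature set chosen for illustration, not the paper's.
    """
    word, tag = sentence[i]
    features = {"word": word.lower(), "pos": tag}

    if i > 0:
        prev_word, prev_tag = sentence[i - 1]
        features["prev_word"] = prev_word.lower()
        features["prev_pos"] = prev_tag
    else:
        features["BOS"] = "true"  # token starts the sentence

    if i < len(sentence) - 1:
        next_word, next_tag = sentence[i + 1]
        features["next_word"] = next_word.lower()
        features["next_pos"] = next_tag
    else:
        features["EOS"] = "true"  # token ends the sentence

    return features


def sentence_features(sentence: TaggedSentence) -> List[Dict[str, str]]:
    """Build one feature dictionary per token, in sentence order."""
    return [token_features(sentence, i) for i in range(len(sentence))]


if __name__ == "__main__":
    example = [("The", "DT"), ("capital", "NN"), ("is", "VBZ"), ("stinky", "JJ")]
    for feats in sentence_features(example):
        print(feats)
```

A sequence classifier trained on such per-token dictionaries, paired with per-token labels such as "vandal" or "regular", could then take the neighbouring words and tags into account when scoring each word, which is the general idea behind context-aware detection.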