Learning to Detect Vandalism in Social Content Systems: a Study on Wikipedia


Learning to Detect Vandalism in Social Content Systems: a Study on Wikipedia - a scientific work related to Wikipedia quality, published in 2013 and written by Sara Javanmardi, David W. McDonald, Rich Caruana, Sholeh Forouzan and Cristina Videira Lopes.

Overview

A challenge facing user generated content systems is vandalism, i.e. edits that damage content quality. The high visibility of and easy access to social networks make them popular targets for vandals, and detecting and removing vandalism is critical for these user generated content systems. Because vandalism can take many forms, there are many different kinds of features that are potentially useful for detecting it. The complex nature of vandalism, and the large number of potential features, make vandalism detection difficult and time-consuming for human editors. Machine learning techniques hold promise for developing accurate, tunable, and maintainable models that can be incorporated into vandalism detection tools. The authors describe a method for training classifiers for vandalism detection that yields classifiers more accurate on the PAN 2010 corpus than those previously developed. Because of the high turnaround in social network systems, it is important for vandalism detection tools to run in real time. To this end, the authors use feature selection to find the minimal set of features consistent with high accuracy. In addition, because some features are more costly to compute than others, they use cost-sensitive feature selection to reduce the total computational cost of executing the models. Beyond the features previously used for spam detection, the authors introduce new features based on user action histories, and these user history features contribute significantly to classifier performance. The approach is general and can easily be applied to other user generated content systems.
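The cost-sensitive feature selection mentioned above can be pictured as a greedy forward search that trades validation accuracy against per-feature computation cost. The sketch below is illustrative only: the feature names, cost values, cost_weight parameter, and logistic regression classifier are hypothetical assumptions chosen for brevity, not the features, costs, or learning method used in the paper.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-feature computation costs: simple text metrics are cheap,
# user-history lookups against the revision database are more expensive.
FEATURE_COSTS = {
    "upper_case_ratio": 1.0,
    "size_delta": 1.0,
    "is_anonymous_edit": 1.0,
    "user_past_reverts": 5.0,
    "user_edit_count": 5.0,
}

def cost_aware_forward_selection(X, y, costs=FEATURE_COSTS, cost_weight=0.01):
    """Greedy forward selection: at each step add the feature whose
    cross-validated AUC gain, penalized by its computation cost, is largest.
    Stop when no remaining feature improves the raw AUC."""
    names = list(costs)
    selected, best_auc = [], 0.0
    remaining = names[:]
    while remaining:
        candidates = []
        for feat in remaining:
            cols = [names.index(f) for f in selected + [feat]]
            auc = cross_val_score(
                LogisticRegression(max_iter=1000),
                X[:, cols], y, cv=3, scoring="roc_auc",
            ).mean()
            candidates.append((auc - cost_weight * costs[feat], auc, feat))
        _, auc, feat = max(candidates)  # best cost-penalized candidate
        if auc <= best_auc:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_auc = auc
    return selected, best_auc

In this toy setup the cost penalty simply biases the search toward cheap features when two candidates give similar accuracy; tuning cost_weight trades classification accuracy against the real-time budget of the detection tool.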