'''Snooping Wikipedia Vandals with Mapreduce''' - scientific work related to [[Wikipedia quality]] published in 2015, written by [[Michele Spina]], [[Dario Rossi]], [[Mauro Sozio]], [[Silviu Maniu]] and [[Bogdan Cautis]].
  
 
== Overview ==
In this paper, the authors present and validate an algorithm that accurately identifies anomalous behavior in online collaborative [[social network]]s, based on users' interactions with their fellows. The authors focus on [[Wikipedia]], where accurate ground truth for classifying vandals can be reliably gathered by manual inspection of page edit histories. They develop distributed crawler and classifier tasks, both implemented in MapReduce, with which they explore a very large dataset consisting of over 5 million articles collaboratively edited by 14 million authors, yielding over 8 billion pairwise interactions. The authors represent Wikipedia as a signed network, in which positive arcs denote constructive interaction between editors. They then isolate a set of high-[[reputation]] editors (i.e., nodes with many positive incoming links) and classify the remaining editors based on their interactions with these high-reputation editors. They demonstrate that the approach is not only practically relevant (given the size of the dataset) but also feasible (it requires only a few MapReduce iterations) and accurate (over a 95% true positive rate). At the same time, they are able to classify only about half of the editors in the dataset (a recall of 50%), a limitation for which they outline solutions under study.
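The two-round classification idea (first select high-reputation seeds from positive incoming arcs, then label the remaining editors by the sign of the arcs they receive from those seeds) can be sketched compactly. The following is a minimal, self-contained Python illustration of that scheme; the toy edge list, the in-memory map_reduce helper, and the seed threshold are hypothetical stand-ins for exposition, not the authors' actual MapReduce jobs or data.

<syntaxhighlight lang="python">
from collections import defaultdict

# Signed interaction graph as (source, target, sign) arcs. A positive
# sign means the source interacted constructively with the target's
# edits; a negative sign means a destructive interaction (e.g. a revert).
# This toy edge list is illustrative only, not data from the paper.
edges = [
    ("alice", "bob", +1), ("carol", "bob", +1), ("dave", "bob", +1),
    ("alice", "carol", +1), ("bob", "carol", +1),
    ("bob", "eve", -1), ("carol", "eve", -1),
    ("bob", "mallory", -1),
]

def map_reduce(records, mapper, reducer):
    """Minimal in-memory stand-in for a single MapReduce round."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Round 1: count positive incoming arcs per editor and keep editors with
# many of them as the high-reputation seed set. The threshold of 2 is an
# arbitrary choice for this toy graph.
positive_in = map_reduce(
    edges,
    mapper=lambda arc: [(arc[1], 1)] if arc[2] > 0 else [],
    reducer=lambda editor, ones: sum(ones),
)
high_reputation = {e for e, n in positive_in.items() if n >= 2}

# Round 2: classify every remaining editor by the net sign of the arcs
# it receives from high-reputation editors only.
labels = map_reduce(
    [a for a in edges if a[0] in high_reputation and a[1] not in high_reputation],
    mapper=lambda arc: [(arc[1], arc[2])],
    reducer=lambda editor, signs: "benign" if sum(signs) > 0 else "suspected vandal",
)

print(sorted(high_reputation))  # ['bob', 'carol']
print(labels)                   # {'eve': 'suspected vandal', 'mallory': 'suspected vandal'}
</syntaxhighlight>

Note that alice and dave end up unlabeled in this sketch: they neither qualify as seeds nor receive arcs from seeds. On a toy scale, this mirrors why roughly half of the editors remain unclassified after a few iterations.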
