Difference between revisions of "Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011"

From Wikipedia Quality
{{Infobox work
| title = Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011
| date = 2011
| authors = [[Cristian-Alexandru Dragusanu]]<br />[[Marina Cufliuc]]<br />[[Adrian Iftene]]
| link = http://ceur-ws.org/Vol-1177/CLEF2011wn-PAN-DragusanuEt2011.pdf
}}
 
'''Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011''' is a scientific work related to [[Wikipedia quality]], published in 2011 and written by [[Cristian-Alexandru Dragusanu]], [[Marina Cufliuc]] and [[Adrian Iftene]].
 
  
 
== Overview ==
 
 
Identifying vandalism on Wikipedia is a complex problem that is currently handled mostly by hand, by volunteers. This paper presents the main components of a system built by the authors' group to automatically identify vandalized [[Wikipedia]] articles. The core of the system is a machine learning component that uses [[features]] grouped into three classes: Metadata, Text and Language. In addition to the features used in previous approaches, the authors consider four new features related to vulgar, biased, sexual and miscellaneous bad words. The system obtained an area under the precision-recall curve (PR-AUC) of 0.42464 and an area under the ROC curve (ROC-AUC) of 0.82963.
 

Revision as of 11:02, 25 January 2020


Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011
Authors: Cristian-Alexandru Dragusanu, Marina Cufliuc, Adrian Iftene
Publication date: 2011
Link: http://ceur-ws.org/Vol-1177/CLEF2011wn-PAN-DragusanuEt2011.pdf

Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011 is a scientific work related to Wikipedia quality, published in 2011 and written by Cristian-Alexandru Dragusanu, Marina Cufliuc and Adrian Iftene.

Overview

Identifying vandalism on Wikipedia is a complex problem that is currently handled mostly by hand, by volunteers. This paper presents the main components of a system built by the authors' group to automatically identify vandalized Wikipedia articles. The core of the system is a machine learning component that uses features grouped into three classes: Metadata, Text and Language. In addition to the features used in previous approaches, the authors consider four new features related to vulgar, biased, sexual and miscellaneous bad words. The system obtained an area under the precision-recall curve (PR-AUC) of 0.42464 and an area under the ROC curve (ROC-AUC) of 0.82963.
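
The four bad-word features mentioned in the overview can be pictured as per-category word counts computed over the text of an edit. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `bad_word_features`, the category keys and the tiny word lists are all hypothetical placeholders, since this summary does not reproduce the paper's lexicons or its exact feature definitions.

```python
import re
from typing import Dict, Set

# Hypothetical, deliberately tiny lexicons for illustration only.
# The paper's actual word lists are not given in this summary.
LEXICONS: Dict[str, Set[str]] = {
    "vulgar": {"crap", "damn"},
    "biased": {"stupid", "worst", "greatest"},
    "sexual": {"sex", "porn"},
    "misc_bad": {"lol", "haha"},
}


def bad_word_features(edit_text: str) -> Dict[str, int]:
    """Count, per category, how many tokens of the edit appear in that lexicon."""
    # Lowercase and split into simple alphabetic tokens (an assumed tokenization).
    tokens = re.findall(r"[a-z']+", edit_text.lower())
    return {
        category: sum(1 for token in tokens if token in words)
        for category, words in LEXICONS.items()
    }


if __name__ == "__main__":
    print(bad_word_features("This article is STUPID lol, damn it"))
```

In a full system, counts like these would be concatenated with the Metadata and other Text and Language features to form the vector passed to the classifier; whether the authors used raw counts, ratios or binary flags is not stated here.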