Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011
Authors | Cristian-Alexandru Dragusanu, Marina Cufliuc, Adrian Iftene |
---|---|
Publication date | 2011 |
Links | Original |
"Detecting Wikipedia Vandalism Using Machine Learning - Notebook for Pan at Clef 2011" is a scientific work related to Wikipedia quality, published in 2011 and written by Cristian-Alexandru Dragusanu, Marina Cufliuc and Adrian Iftene.
Overview
Identifying vandalism on Wikipedia is a complex problem that is currently handled mostly manually by volunteers. This paper presents the main components of a system built by the authors' group to automatically identify vandalized Wikipedia articles. The core of the system is a machine learning component that uses features grouped into three classes: Metadata, Text and Language. In addition to the features used in previous approaches, the authors consider four new features related to vulgar, biased, sexual and miscellaneous bad words. The system obtained a PR-AUC (area under the precision-recall curve) of 0.42464 and a ROC-AUC (area under the ROC curve) of 0.82963.
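To make the four bad-word features more concrete, the following is a minimal Python sketch of how such per-category counts might be computed for a single edit. It is not the authors' implementation: the lexicons, the tokenizer, the function name `bad_word_features` and the choice to count only words added by the edit are all assumptions made for illustration, since the overview does not specify them.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicons (assumption): the word lists actually
# used in the paper are not given in this overview.
LEXICONS = {
    "vulgar": {"crap", "damn"},
    "biased": {"best", "worst", "stupid"},
    "sexual": {"sex", "porn"},
    "misc": {"lol", "haha"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def bad_word_features(old_text, new_text):
    """Count, per category, the lexicon words that the edit added.

    Assumption: only words introduced by the edit are counted, so bad
    words already present in the old revision do not affect the result.
    """
    old_counts = Counter(tokenize(old_text))
    new_counts = Counter(tokenize(new_text))
    # Counter subtraction keeps only positive differences (added words).
    added = new_counts - old_counts
    return {
        name: sum(added[word] for word in words)
        for name, words in LEXICONS.items()
    }


features = bad_word_features(
    "The river flows north.",
    "The river is stupid lol lol and flows north.",
)
print(features)
```

In a full system, counts like these would be only a few columns of the feature vector, alongside the Metadata, Text and Language features, passed to whichever classifier is trained on labelled edits.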