Detecting Vandalism on English Wikipedia Using Lnsmote Resampling and Cascaded Random Forest Classifier

From Wikipedia Quality


Authors: Muhammad Shulhan, Dwi H. Widyantoro
Publication date: 2016
DOI: 10.1109/ICAICTA.2016.7803106

Detecting Vandalism on English Wikipedia Using Lnsmote Resampling and Cascaded Random Forest Classifier is a scientific work related to Wikipedia quality, published in 2016 and written by Muhammad Shulhan and Dwi H. Widyantoro.

Overview

Wikipedia.org is an online encyclopedia that anyone can edit. This openness lets Wikipedia articles grow rapidly and be corrected later, but it also makes them prone to vandalism in the form of invalid information, deletions, advertisements, or meaningless content. This paper proposes a framework for detecting vandalism on English Wikipedia using machine learning. A Cascaded Random Forest (CRF) classifier is trained on the English part of the PAN Wikipedia Vandalism Corpus 2010 (PAN-WVC-10), which has been resampled with the Local Neighbourhood Synthetic Minority Oversampling Technique (LNSMOTE). These two techniques are then compared with Random Forest (RF) as the classifier and the Synthetic Minority Oversampling Technique (SMOTE) as the resampling method. Tests on the English part of the PAN Wikipedia Vandalism Corpus 2011 (PAN-WVC-11) show that, for both classifiers, resampling with LNSMOTE increased the true-positive rate (TPR) more than SMOTE did. CRF on SMOTE with 200 stages and 1 tree gave the best result among the configurations tested, with a TPR of 0.9904. In terms of training time, CRF was 1.6 times faster than RF on the resampled dataset.
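The overview compares LNSMOTE against plain SMOTE and reports results as a true-positive rate. As a rough illustration only, the sketch below implements the textbook SMOTE baseline and the TPR metric in plain Python. It does not reproduce LNSMOTE, whose local-neighbourhood adjustment of the interpolation step is not described here. It also does not implement the paper's cascaded classifier. The function names, the `k=5` default and the toy data are assumptions for illustration and do not come from the paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Textbook SMOTE (hypothetical helper, NOT the paper's LNSMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def true_positive_rate(y_true, y_pred, positive=1):
    """TPR (recall) = TP / (TP + FN) for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Toy minority class (for example, feature vectors of vandal edits).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(len(smote(minority, 10, k=2, seed=42)))
    # 3 vandal edits, 2 of them caught: TPR = 2 / 3
    print(true_positive_rate([1, 1, 1, 0], [1, 0, 1, 0]))
```

In a vandalism setting, `minority` would hold feature vectors of vandalistic edits, and the synthetic points would be added to the training set before a classifier is fitted. Feature extraction and the classifier itself are outside this sketch.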

Embed

Wikipedia Quality

Shulhan, Muhammad; Widyantoro, Dwi H. (2016). "[[Detecting Vandalism on English Wikipedia Using Lnsmote Resampling and Cascaded Random Forest Classifier]]". DOI: 10.1109/ICAICTA.2016.7803106.

English Wikipedia

{{cite journal |last1=Shulhan |first1=Muhammad |last2=Widyantoro |first2=Dwi H. |title=Detecting Vandalism on English Wikipedia Using Lnsmote Resampling and Cascaded Random Forest Classifier |date=2016 |doi=10.1109/ICAICTA.2016.7803106 |url=https://wikipediaquality.com/wiki/Detecting_Vandalism_on_English_Wikipedia_Using_Lnsmote_Resampling_and_Cascaded_Random_Forest_Classifier}}

HTML

Shulhan, Muhammad; Widyantoro, Dwi H. (2016). &quot;<a href="https://wikipediaquality.com/wiki/Detecting_Vandalism_on_English_Wikipedia_Using_Lnsmote_Resampling_and_Cascaded_Random_Forest_Classifier">Detecting Vandalism on English Wikipedia Using Lnsmote Resampling and Cascaded Random Forest Classifier</a>&quot;. DOI: 10.1109/ICAICTA.2016.7803106.