Comparative Analysis of Classification Models for Quality Assessment of Wikipedia Articles

Authors: Włodzimierz Lewoniewski, Krzysztof Węcel, Witold Abramowicz
Publication date: 2017
ISBN: 9788374179386
Links: Preprint

Comparative analysis of classification models for quality assessment of Wikipedia articles is a scientific work about Wikipedia quality, published in 2017 and written by Włodzimierz Lewoniewski, Krzysztof Węcel and Witold Abramowicz.

Overview

In this paper the authors compare the suitability of various classification models (including CART, random forest, boosting trees, C4.5, C5.0, SVM and neural networks) for automatic assessment of article quality in seven language editions of Wikipedia (Belarusian, German, English, French, Polish, Russian and Ukrainian). The authors employed models available in STATISTICA, WEKA and RStudio. For the classification task they used over 80 different article features, derived from an analysis of the state of the art and from their own experience. The authors also carried out a comparative analysis of the significance of the parameters that affect article quality in each language.
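
The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the summary does not state the evaluation protocol, so 5-fold cross-validation is assumed here, the data is synthetic, the two features (text length and reference count) are hypothetical examples of article features, and the three classifiers are simple pure-Python stand-ins rather than the models named in the paper.

```python
import random
from collections import Counter


def make_toy_articles(n=200, seed=0):
    """Synthetic stand-in data: ([length, refs], label), label 1 = 'high quality'."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Hypothetical features: text length (thousands of characters), reference count.
        length = rng.gauss(12 if label else 5, 3)
        refs = rng.gauss(30 if label else 8, 6)
        data.append(([length, refs], label))
    return data


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


class DecisionStump:
    """One-split tree chosen by training accuracy (a much simplified
    relative of tree models such as CART, not CART itself)."""

    def fit(self, X, y):
        majority = Counter(y).most_common(1)[0][0]
        # Fallback if no valid split exists: always predict the majority label.
        self.rule = (0, float("inf"), majority, majority)
        best_correct = -1
        for f in range(len(X[0])):
            for threshold in sorted({x[f] for x in X}):
                left = [t for x, t in zip(X, y) if x[f] <= threshold]
                right = [t for x, t in zip(X, y) if x[f] > threshold]
                if not right:  # threshold is the maximum value: nothing to split off
                    continue
                l_label, l_hits = Counter(left).most_common(1)[0]
                r_label, r_hits = Counter(right).most_common(1)[0]
                if l_hits + r_hits > best_correct:
                    best_correct = l_hits + r_hits
                    self.rule = (f, threshold, l_label, r_label)
        return self

    def predict(self, X):
        f, threshold, l_label, r_label = self.rule
        return [l_label if x[f] <= threshold else r_label for x in X]


def cross_val_accuracy(model_factory, data, k=5, seed=0):
    """Mean accuracy over k folds; each fold serves once as the test set."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = model_factory().fit([x for x, _ in train], [t for _, t in train])
        preds = model.predict([x for x, _ in test])
        scores.append(sum(p == t for p, (_, t) in zip(preds, test)) / len(test))
    return sum(scores) / k


articles = make_toy_articles()
models = {
    "majority baseline": MajorityBaseline,
    "nearest centroid": NearestCentroid,
    "decision stump": DecisionStump,
}
results = {name: cross_val_accuracy(cls, articles) for name, cls in models.items()}
for name, accuracy in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {accuracy:.3f}")
```

Evaluating every model on the same folds keeps the comparison fair, and the majority baseline gives a floor that any useful classifier should exceed. In a real study the stand-in classes would be replaced by the actual models and the synthetic rows by feature vectors extracted from articles in each language edition.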