Vandalism Detection in Wikipedia: a High-Performing, Feature-Rich Model and Its Reduction Through Lasso

From Wikipedia Quality
Revision as of 10:00, 14 December 2019 by Zoe


Authors
Sara Javanmardi
David W. McDonald
Cristina Videira Lopes
Publication date
2011
DOI
10.1145/2038558.2038573

Vandalism Detection in Wikipedia: a High-Performing, Feature-Rich Model and Its Reduction Through Lasso is a scientific work related to Wikipedia quality, published in 2011 and written by Sara Javanmardi, David W. McDonald and Cristina Videira Lopes.

Overview

User-generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. On such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. Machine learning techniques hold promise for developing efficient online algorithms that power better tools to assist users in vandalism detection. The authors describe an efficient and accurate classifier that performs vandalism detection on UGC sites and report its results on the PAN Wikipedia dataset. They explore the effectiveness of a combination of 66 individual features, which produces an AUC of 0.9553 on a test dataset, the best result to the authors' knowledge. Using Lasso optimization, they then reduce this feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well: the drop in AUC is only 0.005. The authors also describe how this approach can be generalized to other user-generated content systems and outline several applications of the classifier that help users identify potential vandalism.
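The idea of shrinking a feature-rich model with Lasso and checking how much AUC is lost can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain squared-error Lasso solved by coordinate descent, synthetic Gaussian data with 10 made-up features (3 informative) instead of the PAN Wikipedia dataset, and a hypothetical penalty value `lam=0.08`. All function names are chosen for illustration only.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive example gets the higher score."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w

def make_data(n=400, d=10, seed=0):
    """Synthetic stand-in data: only features 0, 1 and 2 drive the label."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    labels = [1 if row[0] + row[1] - row[2] + rng.gauss(0, 0.5) > 0 else 0
              for row in X]
    return X, labels

def predict(w, X):
    return [sum(wj * xj for wj, xj in zip(w, row)) for row in X]

X, labels = make_data()
mean_y = sum(labels) / len(labels)
y_centered = [y - mean_y for y in labels]

w_full = lasso_cd(X, y_centered, lam=0.0)   # no penalty: every feature is used
w = lasso_cd(X, y_centered, lam=0.08)       # L1 penalty: weak features drop to 0
selected = [j for j, wj in enumerate(w) if wj != 0.0]
scores = predict(w, X)

print("features kept:", selected)
print("in-sample AUC, full model:   ", round(auc(predict(w_full, X), labels), 4))
print("in-sample AUC, reduced model:", round(auc(scores, labels), 4))
```

The comparison between the two printed AUC values mirrors, on toy data, the trade-off the overview reports: a smaller feature set in exchange for a small loss in ranking quality. In practice the AUC would be measured on held-out data rather than in-sample, and the penalty strength would be tuned rather than fixed.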

Embed

Wikipedia Quality

Javanmardi, Sara; McDonald, David W.; Lopes, Cristina Videira. (2011). "[[Vandalism Detection in Wikipedia: a High-Performing, Feature-Rich Model and Its Reduction Through Lasso]]". DOI: 10.1145/2038558.2038573.

English Wikipedia

{{cite journal |last1=Javanmardi |first1=Sara |last2=McDonald |first2=David W. |last3=Lopes |first3=Cristina Videira |title=Vandalism Detection in Wikipedia: a High-Performing, Feature-Rich Model and Its Reduction Through Lasso |date=2011 |doi=10.1145/2038558.2038573 |url=https://wikipediaquality.com/wiki/Vandalism_Detection_in_Wikipedia:_a_High-Performing,_Feature-Rich_Model_and_Its_Reduction_Through_Lasso}}

HTML

Javanmardi, Sara; McDonald, David W.; Lopes, Cristina Videira. (2011). &quot;<a href="https://wikipediaquality.com/wiki/Vandalism_Detection_in_Wikipedia:_a_High-Performing,_Feature-Rich_Model_and_Its_Reduction_Through_Lasso">Vandalism Detection in Wikipedia: a High-Performing, Feature-Rich Model and Its Reduction Through Lasso</a>&quot;. DOI: 10.1145/2038558.2038573.