WHAD: Wikipedia Historical Attributes Data

Authors: Enrique Alfonseca, Guillermo Garrido, Jean-Yves Delort, Anselmo Peñas
Publication date: 2013
WHAD: Wikipedia historical attributes data is a scientific work related to Wikipedia quality, published in 2013 and written by Enrique Alfonseca, Guillermo Garrido, Jean-Yves Delort and Anselmo Peñas.


This paper describes the generation of temporally anchored infobox attribute data from the Wikipedia revision history. By mining (attribute, value) pairs from the revision history of the English Wikipedia, the authors collect a comprehensive knowledge base that records how attributes change over time. When dealing with the Wikipedia edit history, vandalism and erroneous edits are a concern for data quality. The authors present a study of vandalism identification in Wikipedia edits that uses only features from the infoboxes, and show that on this dataset it reaches an accuracy comparable to a state-of-the-art vandalism identification method based on the whole article. Finally, the authors discuss different characteristics of the extracted dataset, which they make available for further study.
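
The idea of temporally anchoring (attribute, value) pairs can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes infoboxes have already been parsed into dictionaries and that timestamps are ISO 8601 strings in a uniform format, so that they sort chronologically as text. The function name `build_attribute_timeline` and the data layout are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (ISO timestamp, {infobox attribute: value})
Revision = Tuple[str, Dict[str, str]]
# (value, first seen at, replaced/removed at or None if still current)
Interval = Tuple[str, str, Optional[str]]


def build_attribute_timeline(revisions: List[Revision]) -> Dict[str, List[Interval]]:
    """Turn a page's revisions into per-attribute value intervals."""
    timeline: Dict[str, List[Interval]] = {}
    # Uniform ISO 8601 strings sort chronologically (assumption).
    for timestamp, infobox in sorted(revisions, key=lambda r: r[0]):
        # Close any open interval whose value changed or disappeared.
        for attr, intervals in timeline.items():
            value, start, end = intervals[-1]
            if end is None and infobox.get(attr) != value:
                intervals[-1] = (value, start, timestamp)
        # Open a new interval for every value not already open.
        for attr, value in infobox.items():
            intervals = timeline.setdefault(attr, [])
            if not intervals or intervals[-1][2] is not None:
                intervals.append((value, timestamp, None))
    return timeline
```

A sketch like this keeps every edit, including vandalism and erroneous ones, as its own interval; filtering such edits is the separate problem the paper addresses with infobox-based features.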



