Wikiautocat: Information Retrieval System for Automatic Categorization of Wikipedia Articles

Revision as of 20:37, 17 July 2019 by Paisley (talk | contribs) (Infobox)


Authors: Nesma Refaei, Elsayed E. Hemayed, Riham Mansour
Publication date: 2018
DOI: 10.1007/s13369-018-3244-9
Links: Original

Wikiautocat: Information Retrieval System for Automatic Categorization of Wikipedia Articles is a scientific work related to Wikipedia quality, published in 2018 and written by Nesma Refaei, Elsayed E. Hemayed and Riham Mansour.

Overview

Document categorization has become a crucial task for organizing the massive amount of data on the web. Many web repositories classify their articles into hierarchies of topics, a structure that makes it easier to connect related topics and to reach articles. Wikipedia organizes its articles in a category hierarchy, but so far the categorization is done manually by human editors, which is a confusing, tiring and time-consuming task. In this work, the authors propose WikiAutoCat, a system for automatic categorization of Wikipedia articles. It is an information retrieval system that suggests the most relevant set of categories to the article editor in order to simplify the categorization process. Empirical evaluation demonstrates that the system is scalable enough to categorize a dataset as large as Wikipedia, and that it achieves large accuracy improvements over the state of the art in Wikipedia categorization: 41.65% over the WikiCat-Word system and 26.83% over the WikiCat-Link system. The system was also evaluated on a benchmark dataset, where it improved accuracy by 8.1% over the baseline.
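
The summary above does not describe the retrieval model that WikiAutoCat uses, so the following is only a minimal sketch of the general idea of retrieval-based category suggestion, not the authors' method. It assumes a TF-IDF bag-of-words representation, cosine similarity, and a similarity-weighted vote over the categories of the nearest already-categorized articles. The toy corpus and all names (`CORPUS`, `build_index`, `suggest_categories`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of already-categorized articles: (text, categories).
CORPUS = [
    ("the striker scored a goal in the football match", ["Sports", "Football"]),
    ("the team won the football league title", ["Sports", "Football"]),
    ("the telescope observed a distant galaxy", ["Astronomy"]),
    ("the probe orbited the planet and observed its moons", ["Astronomy", "Spaceflight"]),
]


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [word for word in text.lower().split() if word.isalpha()]


def build_index(corpus):
    """Return (TF-IDF vectors, IDF table) for the categorized articles."""
    docs = [Counter(tokenize(text)) for text, _ in corpus]
    doc_freq = Counter(term for doc in docs for term in doc)
    n_docs = len(docs)
    # Smoothed IDF, so terms that occur in every document keep a small weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: f * idf[t] for t, f in doc.items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_categories(text, corpus, vectors, idf, k=3, top_n=2):
    """Suggest up to top_n categories by voting over the k most similar articles."""
    counts = Counter(tokenize(text))
    # Terms unseen in the corpus carry no information for retrieval; drop them.
    query = {t: f * idf[t] for t, f in counts.items() if t in idf}
    scored = sorted(
        ((cosine(query, vec), i) for i, vec in enumerate(vectors)), reverse=True
    )[:k]
    votes = defaultdict(float)
    for score, i in scored:
        for category in corpus[i][1]:
            votes[category] += score  # similarity-weighted vote
    # Highest vote first; ties are broken alphabetically for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, score in ranked[:top_n] if score > 0]


if __name__ == "__main__":
    vectors, idf = build_index(CORPUS)
    print(suggest_categories("a football goal decided the match", CORPUS, vectors, idf))
```

In a setting like the one described, the editor would be shown the ranked suggestions and would confirm or reject them. A system at Wikipedia scale would presumably replace the linear scan over `vectors` with an inverted index or a search engine, which this sketch omits.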