An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article

From Wikipedia Quality
Revision as of 08:20, 29 May 2019 by Elizabeth

An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article - scientific work related to Wikipedia quality published in 2015, written by Hanif Bhuiyan, Kyeong-Jin Oh, Myung-Duk Hong and Geun-Sik Jo.

Overview

Wikipedia infoboxes are an important source of structured information on the web. To author an infobox for a particular article, volunteers must spend considerable manual effort identifying the appropriate infobox template. An automatic process for identifying the template could therefore benefit Wikipedia contributors. In this paper, the authors present a Natural Language Processing (NLP)-based approach that identifies the infobox template automatically and in an unsupervised fashion. The approach relies on semantic relations (hyponymy and holonymy) and on word features of Wikipedia articles. It works in three steps: first, it processes the raw text of the article to generate sets of words; next, it applies the proposed algorithm to identify the infobox type; finally, it selects the matching infobox template from a large pool of candidate templates. A data-driven experiment demonstrates the effectiveness of the approach in terms of autonomy and accuracy.
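The three-step pipeline described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' algorithm: the small `HYPERNYMS` and `HOLONYMS` dictionaries are assumed stand-ins for a real lexical resource, the infobox type is chosen by a simple majority vote, and the function names, stopword list and template pool are all invented for illustration.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real lexical database.
# HYPERNYMS: word -> more general concept (the word is a hyponym of it).
HYPERNYMS = {
    "footballer": "person",
    "singer": "person",
    "politician": "person",
    "city": "settlement",
    "village": "settlement",
    "river": "body_of_water",
}
# HOLONYMS: word -> a whole that the word is a part of.
HOLONYMS = {
    "mayor": "settlement",
    "downtown": "settlement",
    "estuary": "body_of_water",
}
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "in", "who", "by", "for", "on"}

# Hypothetical pool of candidate template names.
TEMPLATES = ["Infobox person", "Infobox settlement", "Infobox body of water", "Infobox film"]


def word_set(raw_text: str) -> set[str]:
    """Step 1: turn raw article text into a set of lowercase content words."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def infer_type(words: set[str]) -> str | None:
    """Step 2: vote for an infobox type using hyponym and holonym links."""
    votes = Counter()
    for w in words:
        if w in HYPERNYMS:
            votes[HYPERNYMS[w]] += 1
        if w in HOLONYMS:
            votes[HOLONYMS[w]] += 1
    if not votes:
        return None
    # Highest vote count wins; ties are broken alphabetically for determinism.
    return max(sorted(votes), key=lambda t: votes[t])


def pick_template(infobox_type: str | None, templates: list[str]) -> str | None:
    """Step 3: select the template whose name matches the inferred type."""
    if infobox_type is None:
        return None
    wanted = "infobox " + infobox_type.replace("_", " ")
    for name in templates:
        if name.lower() == wanted:
            return name
    return None


def identify_template(raw_text: str, templates: list[str]) -> str | None:
    """Run all three steps on one article's raw text."""
    return pick_template(infer_type(word_set(raw_text)), templates)


if __name__ == "__main__":
    text = "Jane Doe is a singer and politician who was born in a small village."
    print(identify_template(text, TEMPLATES))  # prints "Infobox person"
```

Counting votes from both relation types lets a word such as "mayor", which is part of a settlement rather than a kind of one, still contribute evidence for the type. A realistic system would replace the toy dictionaries with a full lexical database and a much larger template pool.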