Difference between revisions of "An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article"

From Wikipedia Quality
(An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article - basic info)
 
(+ wikilinks)
 
Line 1: Line 1:
'''An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article''' - scientific work related to Wikipedia quality published in 2015, written by Hanif Bhuiyan, Kyeong-Jin Oh, Myung-Duk Hong and Geun-Sik Jo.
+
'''An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article''' - scientific work related to [[Wikipedia quality]] published in 2015, written by [[Hanif Bhuiyan]], [[Kyeong-Jin Oh]], [[Myung-Duk Hong]] and [[Geun-Sik Jo]].
  
 
== Overview ==
 
== Overview ==
Wikipedia infoboxes serve as important structured information source in the web. To author infobox for a particular article, volunteers required a considerable amount of manual effort to identify the respective infobox template. Thus, an automatic process to mark infobox template might be useful and beneficial for the Wikipedia contributors. In this paper, authors present a Natural Language Processing (NLP)-based automated approach to identify the infobox template in an unsupervised fashion. The proposed approach has been developed by using semantic relations (hyponym and holonym) and word features of Wikipedia articles. Authors approach works in three steps: first it processes the raw text of the article to generate sets of words, next it apply the proposed algorithm to identify the infobox type and finally point out the infobox template from the large pool of template list. The effectiveness of the proposed approach has been proved in terms of autonomous and accuracy, by a data-driven experiment.
+
Wikipedia [[infoboxes]] serve as important [[structured information]] source in the web. To author infobox for a particular article, volunteers required a considerable amount of manual effort to identify the respective infobox template. Thus, an automatic process to mark infobox template might be useful and beneficial for the [[Wikipedia]] contributors. In this paper, authors present a [[Natural Language Processing]] (NLP)-based automated approach to identify the infobox template in an unsupervised fashion. The proposed approach has been developed by using semantic relations (hyponym and holonym) and word [[features]] of Wikipedia articles. Authors approach works in three steps: first it processes the raw text of the article to generate sets of words, next it apply the proposed algorithm to identify the infobox type and finally point out the infobox template from the large pool of template list. The effectiveness of the proposed approach has been proved in terms of autonomous and accuracy, by a data-driven experiment.

Latest revision as of 10:11, 12 August 2019

An Unsupervised Approach for Identifying the Infobox Template of Wikipedia Article is a scientific work related to Wikipedia quality, published in 2015 and written by Hanif Bhuiyan, Kyeong-Jin Oh, Myung-Duk Hong and Geun-Sik Jo.

Overview

Wikipedia infoboxes serve as an important source of structured information on the web. To author an infobox for a particular article, volunteers need a considerable amount of manual effort to identify the appropriate infobox template. An automatic process for identifying the infobox template could therefore be useful to Wikipedia contributors. In this paper, the authors present a Natural Language Processing (NLP)-based automated approach that identifies the infobox template in an unsupervised fashion. The proposed approach uses semantic relations (hyponym and holonym) and word features of Wikipedia articles. The authors' approach works in three steps: first, it processes the raw text of the article to generate sets of words; next, it applies the proposed algorithm to identify the infobox type; finally, it selects the infobox template from the large pool of available templates. The authors report that a data-driven experiment demonstrates the effectiveness of the approach in terms of autonomy and accuracy.
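
The three steps described above can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the overview does not specify in detail. It only mirrors the outline: words are extracted from raw text, mapped to broader categories, and the winning category is matched against a template pool. The lexicon `TOY_BROADER_TERM`, the pool `TEMPLATE_POOL`, the stopword list and all function names are hypothetical, introduced here for illustration; a real system would presumably query a lexical database for hyponym and holonym relations instead of a hand-written dictionary.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> broader category. It stands in for
# hyponym/holonym lookups in a real lexical resource (an assumption).
TOY_BROADER_TERM = {
    "striker": "football biography",
    "goalkeeper": "football biography",
    "club": "football biography",
    "guitarist": "musical artist",
    "album": "musical artist",
    "band": "musical artist",
    "river": "river",
    "tributary": "river",
    "estuary": "river",
}

# Hypothetical, tiny template pool used only for this illustration.
TEMPLATE_POOL = [
    "Infobox football biography",
    "Infobox musical artist",
    "Infobox river",
    "Infobox person",
]

STOPWORDS = {"the", "a", "an", "in", "and", "as", "of", "is", "was", "has"}


def extract_words(raw_text):
    """Step 1: turn raw article text into a bag of lowercase content words."""
    tokens = re.findall(r"[a-z]+", raw_text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def identify_type(word_counts):
    """Step 2: let each known word vote for its broader category."""
    votes = Counter()
    for word, count in word_counts.items():
        category = TOY_BROADER_TERM.get(word)
        if category is not None:
            votes[category] += count
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def select_template(infobox_type, pool):
    """Step 3: pick the template in the pool named after the inferred type."""
    if infobox_type is None:
        return None
    wanted = f"infobox {infobox_type}".lower()
    for name in pool:
        if name.lower() == wanted:
            return name
    return None


def identify_infobox_template(raw_text, pool=TEMPLATE_POOL):
    """Run the three steps in sequence; returns None if nothing matches."""
    return select_template(identify_type(extract_words(raw_text)), pool)


if __name__ == "__main__":
    sample = "The striker joined the club in 2010 and later played as a goalkeeper."
    print(identify_infobox_template(sample))
```

No labelled training data is involved, which is the sense in which the sketch is unsupervised: the decision comes entirely from the lexicon and the words in the article. Text with no recognised words yields `None` rather than a guessed template.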