Difference between revisions of "Filling the Gaps: Improving Wikipedia Stubs"

From Wikipedia Quality
(Creating a page: Filling the Gaps: Improving Wikipedia Stubs)
 
(Int.links)

Revision as of 07:22, 22 June 2019

Filling the Gaps: Improving Wikipedia Stubs is a scientific work related to Wikipedia quality, published in 2015 and written by Siddhartha Banerjee and Prasenjit Mitra.

Overview

A limited number of contributors on Wikipedia cannot ensure consistent growth and improvement of the online encyclopedia. With information scattered across the web, the goal is to automate the generation of content for Wikipedia. In this work, the authors propose a technique for improving Wikipedia stubs, that is, articles that do not contain comprehensive information. A classifier learns features from existing comprehensive articles on Wikipedia and recommends content that can be added to stubs to improve their completeness. The authors conduct experiments using several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning based architecture (deep belief network), and a TF-IDF based classifier. The experiments reveal that the LDA-based model outperforms the other models (by about 6% in F-score). The generation approach shows that the technique is capable of generating comprehensive articles. The articles generated by the system achieve higher ROUGE-2 scores than those generated using the baseline. Content generated by the system has been appended to several stubs and successfully retained in Wikipedia.
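
The overview reports ROUGE-2 scores but does not describe how they were computed. As a rough illustration only, ROUGE-2 measures bigram overlap between a generated text and a reference text. The sketch below assumes a single reference, lowercase whitespace tokenization, and no stemming or stopword handling; the function names `bigrams` and `rouge_2` are hypothetical and not taken from the original work, whose evaluation setup may differ.

```python
from collections import Counter


def bigrams(text):
    """Return a multiset of word bigrams (assumption: lowercase whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(candidate, reference):
    """Return (recall, precision, f1) from clipped bigram overlap.

    Simplified single-reference variant; not the evaluation code of the paper.
    """
    cand = bigrams(candidate)
    ref = bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if overlap == 0:
        return recall, precision, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_2(generated, reference))
```

In this toy pair, three of the five reference bigrams also occur in the generated text, so recall and precision are both about 0.6. A higher score for system output than for a baseline, measured against the same reference, is the kind of comparison the overview describes.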