Wikilda: Towards More Effective Knowledge Acquisition in Topic Models Using Wikipedia

Wikilda: Towards More Effective Knowledge Acquisition in Topic Models Using Wikipedia is a scientific work related to Wikipedia quality, published in 2017 and written by Swapnil Hingmire, Sutanu Chakraborti, Girish Keshav Palshikar and Abhay Sodani.

Overview

Towards the goal of enhancing the interpretability of Latent Dirichlet Allocation (LDA) topics, the authors propose WikiLDA, an enhancement of LDA that uses Wikipedia concepts. In WikiLDA, each document in the corpus is first "sprinkled" with (i.e., appended with) its most relevant Wikipedia concepts. The authors then use a Generalized Polya Urn (GPU) model to incorporate word-word, word-concept, and concept-concept semantic relatedness into the generative process of LDA. Since the most probable concepts of the inferred topics can be looked up on Wikipedia, the topics are likely to become more interpretable and hence more usable for acquiring domain knowledge from humans for various text mining tasks (e.g. eliciting topic labels for text classification). Empirical results show that projecting documents by WikiLDA into a semantically enriched and coherent topic space improves performance on text-classification-like tasks, especially in domains where the classes are hard to separate.
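
The following is a minimal Python sketch, not the authors' exact formulation, of the two ideas described above: "sprinkling" a document with its most related Wikipedia concepts, and a GPU-style count update in which assigning one term to a topic also promotes semantically related terms and concepts in that topic. The relatedness scores, the number of sprinkled concepts, and the promotion weight are illustrative assumptions.

```python
from collections import defaultdict


def sprinkle(doc_tokens, concept_relatedness, top_k=3):
    """Append the top_k most related Wikipedia concepts to a document.

    concept_relatedness: dict mapping a Wikipedia concept label to a
    relatedness score for this document (assumed to be precomputed).
    """
    top_concepts = sorted(concept_relatedness,
                          key=concept_relatedness.get, reverse=True)[:top_k]
    return doc_tokens + top_concepts


def gpu_update(topic_counts, topic, term, related_terms, promotion=0.3):
    """GPU-style update: the sampled term receives a full count, and each
    semantically related term/concept receives a fractional count in the
    same topic (the promotion weight here is an illustrative assumption)."""
    topic_counts[topic][term] += 1.0
    for other in related_terms.get(term, []):
        topic_counts[topic][other] += promotion


# Toy usage: sprinkle one document, then apply one GPU count update.
doc = ["bank", "loan", "interest"]
relatedness = {"Bank (finance)": 0.9, "Credit": 0.7, "River": 0.1}
doc = sprinkle(doc, relatedness, top_k=2)

topic_counts = defaultdict(lambda: defaultdict(float))
related = {"loan": ["Credit", "interest"]}
gpu_update(topic_counts, topic=0, term="loan", related_terms=related)
```

In a full sampler, an update of this kind would replace the standard LDA count increment inside Gibbs sampling, so that topic-term counts reflect semantic relatedness rather than exact word matches alone.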