Mining Semantic Relationships Between Concepts Across Documents Using Wikipedia Knowledge

From Wikipedia Quality

Mining Semantic Relationships Between Concepts Across Documents Using Wikipedia Knowledge is a scientific work related to Wikipedia quality, published in 2013 and written by Wei Jin and Peng Yan.

Overview

The rapid, ongoing growth of text data has created a strong need for fast and efficient text mining algorithms. However, the sparsity and high dimensionality of text data present great challenges for representing the semantics of natural language text. Traditional approaches to document representation are mostly based on the Vector Space Model (VSM), which treats a document as an unordered collection of words and records only document-level statistical information (e.g., document frequency, inverse document frequency). Because it does not capture the semantics of text, VSM shows inherent limitations for certain tasks, especially fine-grained information discovery applications such as mining relationships between concepts: it computes relatedness between words based only on statistical information collected from the documents themselves. In this dissertation, the authors present a new framework that attempts to address these problems by using background knowledge to provide a better semantic representation of any text. This is accomplished by leveraging Wikipedia, currently the world's largest human-built encyclopedia. This integration also complements the information already contained in the text corpus and facilitates the construction of a more comprehensive representation and retrieval framework.
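The VSM limitation described above can be illustrated with a minimal sketch. This is not the authors' framework; it is only a plain bag-of-words TF-IDF model using the Python standard library. The function names (`tfidf_vectors`, `cosine`), the smoothed IDF formula, and the three sample sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Hypothetical helper: whitespace tokenization and a smoothed IDF are
    illustrative choices, not taken from the summarized work.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts; word order is discarded
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Sample sentences (assumed): the first two are close in meaning but share no words.
docs = [
    "physician prescribed new drug",
    "doctor recommended different medication",
    "physician prescribed medication",
]
vecs = tfidf_vectors(docs)

sim_01 = cosine(vecs[0], vecs[1])  # no shared terms, so the score is exactly 0.0
sim_02 = cosine(vecs[0], vecs[2])  # shared surface terms give a positive score

print(f"related meaning, no shared words: {sim_01:.3f}")
print(f"shared words:                     {sim_02:.3f}")
```

In this sketch the first two sentences receive a similarity of zero despite describing nearly the same event, because the model sees only term statistics from the documents themselves. This is the kind of gap that background knowledge such as Wikipedia is meant to close.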