Wikipedia-Assisted Concept Thesaurus for Better Web Media Understanding



Authors: Huan Wang, Liang-Tien Chia, Shenghua Gao
Publication date: 2010
DOI: 10.1145/1743384.1743441

Wikipedia-Assisted Concept Thesaurus for Better Web Media Understanding is a scientific work related to Wikipedia quality, published in 2010 and written by Huan Wang, Liang-Tien Chia and Shenghua Gao.

Overview

Concept ontologies have been used in artificial intelligence, biomedical informatics, and library science, and have been shown to be an effective approach to better understanding data in the respective domains. One main difficulty that hinders the development of ontology approaches is the extra work required for ontology construction and annotation. With the emergence of lexical dictionaries and encyclopedias such as WordNet and Wikipedia, innovations from different directions have been proposed to extract concept ontologies automatically. Unfortunately, many of the proposed ontologies do not fully exploit general human knowledge. The authors study the various knowledge sources and aim to build, from Wikipedia, a scalable concept thesaurus suitable for better understanding of media on the World Wide Web. With its wide concept coverage, finely organized categories, diverse concept relations, and up-to-date information, the collaborative encyclopedia Wikipedia has almost all the requisite attributes to contribute to a well-defined concept ontology. Besides explicit concept relations such as disambiguation and synonymy, Wikipedia also provides implicit concept relations through cross-references between articles.
In previous work, the authors built an ontology with explicit relations from Wikipedia page contents. Even though the method works, mining explicit semantic relations from the content of every Wikipedia concept page has an unsolved scalability issue when more concepts are involved. This paper describes an attempt to automatically build a concept thesaurus that encodes both explicit and implicit semantic relations for a large number of concepts from Wikipedia. The authors' proposed thesaurus construction takes advantage of both structure and content features of the downloaded Wikipedia database, and defines each concept entry with its related concepts and relations.
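To make the idea of a concept entry with related concepts and relations concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the class name ConceptEntry, the helper build_thesaurus, the relation labels, and the toy triples are all assumptions made for illustration. The source names only disambiguation and synonymy as explicit relations and cross-references as the implicit one.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Relation labels are illustrative assumptions, not names from the paper.
EXPLICIT = {"synonym", "disambiguation"}
IMPLICIT = {"cross_reference"}


@dataclass
class ConceptEntry:
    """One thesaurus entry: a concept and its related concepts, by relation."""
    name: str
    relations: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, relation: str, target: str) -> None:
        """Record that `target` is related to this concept via `relation`."""
        if relation not in (EXPLICIT | IMPLICIT):
            raise ValueError(f"unknown relation: {relation}")
        self.relations[relation].add(target)

    def related(self, include_implicit: bool = True) -> set:
        """Return related concepts, optionally restricted to explicit ones."""
        allowed = (EXPLICIT | IMPLICIT) if include_implicit else EXPLICIT
        result = set()
        for relation, targets in self.relations.items():
            if relation in allowed:
                result |= targets
        return result


def build_thesaurus(triples):
    """Build a {concept name: ConceptEntry} map from (source, relation, target)."""
    thesaurus = {}
    for source, relation, target in triples:
        entry = thesaurus.setdefault(source, ConceptEntry(source))
        entry.add(relation, target)
    return thesaurus


# Toy data, made up for this example only.
toy_triples = [
    ("Jaguar", "disambiguation", "Jaguar (car)"),
    ("Jaguar", "synonym", "Panthera onca"),
    ("Jaguar", "cross_reference", "Felidae"),
]
toy_thesaurus = build_thesaurus(toy_triples)
```

Keeping explicit and implicit relations under separate labels lets a consumer choose whether to include the cross-reference links, which is one simple way to compare the two kinds of relation.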
This thesaurus is further used to exploit semantics from web page context to build a more semantically meaningful space. The authors go a step further and combine it with the similarity distance from the image feature space to boost performance. The authors evaluate the approach by applying the constructed concept thesaurus to web image retrieval. The results show that it is possible to use implicit semantic relations to improve retrieval performance.
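The summary does not say how the semantic space and the image feature space are combined. As a hedged illustration only, the sketch below assumes a weighted sum of two cosine distances, with a hypothetical weight alpha. The function names, the vector representation, and the combination rule are all assumptions, not the method from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; treat a zero vector as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def combined_distance(query, candidate, alpha=0.5):
    """Weighted sum of semantic and image distances (assumed fusion rule).

    `query` and `candidate` are dicts with "semantic" and "image" vectors;
    `alpha` is a hypothetical weight on the semantic part.
    """
    semantic = cosine_distance(query["semantic"], candidate["semantic"])
    image = cosine_distance(query["image"], candidate["image"])
    return alpha * semantic + (1.0 - alpha) * image


def rank(query, candidates, alpha=0.5):
    """Return candidate names sorted from closest to farthest."""
    return sorted(
        candidates,
        key=lambda name: combined_distance(query, candidates[name], alpha),
    )


# Toy vectors, made up for this example only.
toy_query = {"semantic": [1.0, 0.0], "image": [1.0, 0.0]}
toy_candidates = {
    "a": {"semantic": [1.0, 0.0], "image": [0.0, 1.0]},
    "b": {"semantic": [0.0, 1.0], "image": [0.0, 1.0]},
    "c": {"semantic": [1.0, 0.0], "image": [1.0, 0.0]},
}
toy_ranking = rank(toy_query, toy_candidates)
```

With alpha set to 1.0 the ranking uses only the semantic space, and with 0.0 only the image features, so the same function can serve as a baseline for either signal alone.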