Collective Annotation of Wikipedia Entities in Web Text

From Wikipedia Quality
Revision as of 21:29, 1 June 2019 by Sofia (Overview - Collective Annotation of Wikipedia Entities in Web Text)

Collective Annotation of Wikipedia Entities in Web Text is a scientific work related to Wikipedia quality, published in 2009 and written by Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan and Soumen Chakrabarti.

Overview

To move beyond keyword-based search toward entity-based search, suitable token spans ("spots") in documents must first be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They rely largely on local compatibility between the text around the spot and textual metadata associated with the entity. Two recent systems exploit inter-label dependencies, but only in limited ways. The authors propose a general collective disambiguation approach. Their premise is that coherent documents refer to entities from one or a few related topics or domains. They give formulations for the trade-off between local spot-to-entity compatibility and measures of global coherence between entities. Optimizing the overall entity assignment is NP-hard, so the authors investigate practical solutions based on local hill-climbing, rounding integer linear programs, and pre-clustering entities followed by local optimization within clusters. In experiments involving over a hundred manually annotated Web pages and tens of thousands of spots, the proposed approaches significantly outperform recently proposed algorithms.
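
The trade-off between local compatibility and global coherence, and the local hill-climbing strategy, can be illustrated with a minimal sketch. It is not the authors' formulation: the objective here is simply the sum of per-spot local scores plus pairwise coherence scores between the chosen entities, and the function names (`objective`, `hill_climb`), the score tables, and all numeric values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def objective(assignment, local_score, coherence):
    """Assumed toy objective: local compatibility plus pairwise coherence."""
    local = sum(local_score[spot][ent] for spot, ent in assignment.items())
    pairwise = sum(
        coherence.get(frozenset((a, b)), 0.0)
        for a, b in combinations(assignment.values(), 2)
    )
    return local + pairwise


def hill_climb(local_score, coherence):
    """Greedy local search: change one spot's entity while the objective improves."""
    # Start from the locally best candidate for every spot.
    assignment = {s: max(c, key=c.get) for s, c in local_score.items()}
    best = objective(assignment, local_score, coherence)
    improved = True
    while improved:
        improved = False
        for spot, candidates in local_score.items():
            for ent in candidates:
                if ent == assignment[spot]:
                    continue
                trial = dict(assignment)
                trial[spot] = ent
                score = objective(trial, local_score, coherence)
                if score > best:  # accept only strict improvements
                    assignment, best = trial, score
                    improved = True
    return assignment, best


# Hypothetical scores: each spot maps candidate entities to local compatibility.
local_score = {
    "Python": {"Python (language)": 0.5, "Python (snake)": 0.6},
    "Java": {"Java (language)": 0.7, "Java (island)": 0.4},
}
# Hypothetical coherence between entity pairs (unlisted pairs count as 0).
coherence = {frozenset(("Python (language)", "Java (language)")): 1.0}

assignment, best = hill_climb(local_score, coherence)
print(assignment)
```

In this toy input, the locally preferred entity for "Python" is the snake, but the coherence term pulls the joint assignment toward the two programming languages. Hill-climbing of this kind only guarantees a local optimum, which is consistent with the NP-hardness of the exact problem noted above.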