Disambiguation to Wikipedia: a Language and Domain Independent Approach


Disambiguation to Wikipedia: a Language and Domain Independent Approach - a scientific work related to Wikipedia quality, published in 2013 and written by Truc-Vien T. Nguyen.

Overview

Disambiguation to Wikipedia (D2W) is the task of linking mentions of concepts in text to their corresponding Wikipedia articles. Traditional approaches to D2W have focused either on a single language (e.g. English) or on formal texts (e.g. news articles). In this paper, the authors present a multilingual framework with a set of new features that can be obtained purely from the online encyclopedia, without the need for any language-specific natural language processing tool. The authors analyze these features across different languages and different domains. The approach proves to be fully language-independent and has been applied successfully to English, Italian, and Polish, with a consistent improvement. The authors show that the only requirement for training is a sufficient number of Wikipedia articles. When trained on real-world data sets for English, the new features yield substantial improvement over current local and global disambiguation algorithms. Finally, the adaptation to the Bridgeman query logs in digital libraries shows the robustness of the approach even when disambiguation context is lacking. Because no language-specific tool is needed, the method can be applied to other languages in a similar manner with little adaptation.
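
To make the D2W task itself concrete, the following is a minimal sketch of a common baseline: ranking candidate articles by how often a mention's surface form is used as anchor text for each article (often called "commonness"). It is not the feature set or method of the paper, which this summary does not detail. The toy link data and the function names `build_anchor_index`, `commonness`, and `disambiguate` are assumptions made for illustration; in practice the anchor statistics would be harvested from Wikipedia's own internal links, which is what allows such a signal to be computed without any language-specific tool.

```python
from collections import Counter, defaultdict

# Toy (anchor text, target article) pairs standing in for internal links
# harvested from Wikipedia. All data here is invented for illustration.
ANCHOR_LINKS = [
    ("jaguar", "Jaguar"),
    ("jaguar", "Jaguar"),
    ("jaguar", "Jaguar_Cars"),
    ("python", "Python_(programming_language)"),
    ("python", "Python_(programming_language)"),
    ("python", "Pythonidae"),
]


def build_anchor_index(links):
    """Count, for each lowercased anchor text, how often it links to each article."""
    index = defaultdict(Counter)
    for anchor, target in links:
        index[anchor.lower()][target] += 1
    return index


def commonness(index, mention):
    """Return {article: P(article | mention)} estimated from anchor counts."""
    counts = index.get(mention.lower())
    if not counts:
        return {}
    total = sum(counts.values())
    return {target: count / total for target, count in counts.items()}


def disambiguate(index, mention):
    """Pick the most frequently linked article for a mention, or None if unseen."""
    scores = commonness(index, mention)
    if not scores:
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    index = build_anchor_index(ANCHOR_LINKS)
    for mention in ["Jaguar", "python", "Bridgeman"]:
        print(mention, "->", disambiguate(index, mention))
```

This baseline uses no surrounding context at all, so it only illustrates candidate generation and a prior over candidates; the local and global disambiguation algorithms mentioned in the overview go beyond it by taking context into account.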