A Multilingual Approach to Discover Cross-Language Links in Wikipedia

From Wikipedia Quality
Revision as of 11:00, 17 June 2020 by Sofia

A Multilingual Approach to Discover Cross-Language Links in Wikipedia is a scientific work related to Wikipedia quality, published in 2015 and written by Nacéra Bennacer, Mia Johnson Vioulès, Maximiliano Ariel López and Gianluca Quercini.

Overview

Wikipedia is a well-known public and collaborative encyclopaedia consisting of millions of articles. Initially available only in English, the popular website has grown to include versions in over 288 languages. These versions and their articles are interconnected via cross-language links, which not only facilitate navigation and the understanding of concepts in multiple languages, but have also been used in natural language processing applications, in developments in linked open data, and in the expansion of minor Wikipedia language versions. These applications motivate an automatic, robust, and accurate technique to identify cross-language links. In this paper, the authors present a multilingual approach called EurekaCL to automatically identify missing cross-language links in Wikipedia. More precisely, given a Wikipedia article (the source), EurekaCL uses the multilingual and semantic features of BabelNet 2.0 to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. The authors' evaluation, carried out on 42,000 pairs of articles in eight language versions of Wikipedia, shows that the candidate selection and pruning procedures allow an effective selection of candidates, which significantly helps determine the correct article in the target language version.
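The prune-and-rank step described above can be illustrated with a minimal Python sketch. It is not the EurekaCL implementation: the overview does not give the scoring formula, so the sketch assumes a simple stand-in score, namely the number of the source's linked articles whose already known cross-language links land among a candidate's linked articles. The function name `rank_candidates`, the `min_overlap` threshold, and the toy data are all hypothetical, and the candidate set is passed in directly rather than obtained from BabelNet.

```python
from typing import Dict, List, Set, Tuple


def rank_candidates(
    source: str,
    candidates: Set[str],
    source_links: Dict[str, Set[str]],
    target_links: Dict[str, Set[str]],
    known_cross_links: Dict[str, str],
    min_overlap: int = 1,
) -> List[Tuple[str, int]]:
    """Prune and rank target-language candidates for one source article.

    Assumed score (not taken from the paper): the number of the source's
    neighbours that, once mapped through known cross-language links,
    also appear among the candidate's neighbours.
    """
    # Map the source's neighbours into the target language where possible.
    translated = {
        known_cross_links[n]
        for n in source_links.get(source, set())
        if n in known_cross_links
    }

    scored = []
    for cand in candidates:
        overlap = len(translated & target_links.get(cand, set()))
        # Pruning: drop candidates with too little structural support.
        if overlap >= min_overlap:
            scored.append((cand, overlap))

    # Ranking: highest overlap first, ties broken by title for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Toy data, invented for illustration only.
source_links = {"en:Paris": {"en:France", "en:Seine", "en:Louvre"}}
target_links = {
    "fr:Paris": {"fr:France", "fr:Seine", "fr:Louvre"},
    "fr:Île-de-France": {"fr:France", "fr:Seine"},
    "fr:Paris (Texas)": {"fr:Texas"},
}
known_cross_links = {
    "en:France": "fr:France",
    "en:Seine": "fr:Seine",
    "en:Louvre": "fr:Louvre",
}
candidates = {"fr:Paris", "fr:Île-de-France", "fr:Paris (Texas)"}

ranked = rank_candidates(
    "en:Paris", candidates, source_links, target_links, known_cross_links
)
print(ranked)
```

In this toy example the candidate with no shared neighbours is pruned and the remaining two are ordered by overlap. A real system would need a richer score and a principled threshold; the sketch only shows how link structure can serve both purposes at once.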