Pairing Wikipedia Articles Across Languages

From Wikipedia Quality
Revision as of 10:38, 18 December 2019 by Isabelle (talk | contribs) (Overview - Pairing Wikipedia Articles Across Languages)

Pairing Wikipedia Articles Across Languages is a scientific work related to Wikipedia quality, published in 2016 and written by Marcus Klang and Pierre Nugues.

Overview

Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on the same entity or concept can have from one to more than 200 different language versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still carried out manually by thousands of editors across the globe, errors may creep into the assignment of entities. In this paper, the authors describe an optimization technique to automatically match language versions of articles, and hence entities, based only on bags of words and anchors. The authors created a dataset of all the articles on persons extracted from Wikipedia in six languages: English, French, German, Russian, Spanish, and Swedish. They report a correct match rate of at least 94.3% on each language pair.
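
To make the idea of matching articles from bags of anchors more concrete, the following is a minimal Python sketch, not the authors' actual method. It assumes that anchors in both languages have already been mapped to a shared vocabulary (an assumption not stated in the overview), scores each candidate pair by cosine similarity, and picks the one-to-one pairing with the highest total score by brute force. The function names `cosine` and `pair_articles`, the anchor labels, and the toy counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags (token -> count)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pair_articles(source: dict, target: dict) -> dict:
    """Return the one-to-one pairing that maximizes total similarity.

    Brute force over permutations, so only viable for toy inputs.
    Assumes both sides contain the same number of articles.
    """
    src_titles = list(source)
    tgt_titles = list(target)
    best_score, best_perm = -1.0, None
    for perm in permutations(tgt_titles):
        score = sum(cosine(source[s], target[t])
                    for s, t in zip(src_titles, perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return dict(zip(src_titles, best_perm))


# Toy bags of anchors (hypothetical labels and counts, shared vocabulary assumed).
english = {
    "Marie Curie": Counter({"physics": 3, "chemistry": 2, "Warsaw": 1}),
    "Astrid Lindgren": Counter({"literature": 3, "Sweden": 2}),
}
swedish = {
    "Astrid Lindgren (sv)": Counter({"literature": 2, "Sweden": 3}),
    "Marie Curie (sv)": Counter({"physics": 2, "chemistry": 1, "Paris": 1}),
}

pairs = pair_articles(english, swedish)
print(pairs)
```

At the scale of a real dataset, the exhaustive search would have to be replaced by a proper assignment solver; the brute-force loop is used here only to keep the sketch self-contained.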