Enriching Multilingual Language Resources by Discovering Missing Cross-Language Links in Wikipedia

Revision as of 10:28, 4 December 2019 by Sophia (talk | contribs) (New work - Enriching Multilingual Language Resources by Discovering Missing Cross-Language Links in Wikipedia)

Enriching Multilingual Language Resources by Discovering Missing Cross-Language Links in Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Jong-Hoon Oh, Daisuke Kawahara, Kiyotaka Uchimoto, Jun’ichi Kazama and Kentaro Torisawa.

Overview

The authors present a novel method for discovering missing cross-language links between English and Japanese Wikipedia articles. They first collect candidates for missing cross-language links, that is, pairs of English and Japanese Wikipedia articles that could be connected by a cross-language link. They then select the correct links among the candidates using a classifier trained with various types of features. The method has three desirable characteristics for discovering missing links. First, it discovers cross-language links with high accuracy (92% precision at 78% recall). Second, the features used in the classifier are language-independent. Third, the features are generated from resources obtained automatically from Wikipedia, without relying on any external knowledge. In this work, the authors discover approximately $10^5$ missing cross-language links, almost two-thirds as many as the cross-language links that already exist in Wikipedia.
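The two-step idea described above (collect candidate pairs, then keep the ones a classifier accepts) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary does not list the actual features or the classifier, so the single feature used here (`link_overlap`, the overlap between an English article's outgoing links, mapped through already-known cross-language links, and a Japanese article's outgoing links) and the fixed threshold standing in for a trained classifier are assumptions made for illustration. All data and function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (English title, Japanese title)


def link_overlap(en_title: str, ja_title: str,
                 en_outlinks: Dict[str, Set[str]],
                 ja_outlinks: Dict[str, Set[str]],
                 known_links: Dict[str, str]) -> float:
    """Hypothetical language-independent feature: Jaccard overlap between
    the English article's outlinks (mapped to Japanese titles through
    already-known cross-language links) and the Japanese article's outlinks."""
    mapped = {known_links[t] for t in en_outlinks.get(en_title, set())
              if t in known_links}
    ja = ja_outlinks.get(ja_title, set())
    union = mapped | ja
    return len(mapped & ja) / len(union) if union else 0.0


def select_links(candidates: Iterable[Pair], threshold: float,
                 en_outlinks: Dict[str, Set[str]],
                 ja_outlinks: Dict[str, Set[str]],
                 known_links: Dict[str, str]) -> Set[Pair]:
    """Stand-in for the trained classifier: accept a candidate pair when
    its single feature value reaches the threshold."""
    return {(en, ja) for en, ja in candidates
            if link_overlap(en, ja, en_outlinks, ja_outlinks,
                            known_links) >= threshold}


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Precision and recall of predicted links against a gold-standard set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for this example, not taken from the paper).
known_links = {"Tokyo": "東京", "Japan": "日本", "Sushi": "寿司"}
en_outlinks = {"Kyoto": {"Japan", "Tokyo"}, "Ramen": {"Japan", "Sushi"}}
ja_outlinks = {"京都": {"日本", "東京"}, "ラーメン": {"日本", "寿司"}}
candidates = [("Kyoto", "京都"), ("Kyoto", "ラーメン"),
              ("Ramen", "ラーメン"), ("Ramen", "京都")]
gold = {("Kyoto", "京都"), ("Ramen", "ラーメン")}

selected = select_links(candidates, 0.5, en_outlinks, ja_outlinks, known_links)
precision, recall = precision_recall(selected, gold)
```

In a setting closer to the one described, the threshold rule would be replaced by a classifier trained on labeled pairs with several features, and precision and recall would be measured on held-out pairs, which is how figures such as 92% precision at 78% recall are normally obtained.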