Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles



Authors: Monica Lestari Paramita, Paul D. Clough, Robert J. Gaizauskas
Publication date: 2017
DOI: 10.1007/978-3-319-56608-5_59
Links: Original

Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles is a scientific work related to Wikipedia quality, published in 2017 and written by Monica Lestari Paramita, Paul D. Clough and Robert J. Gaizauskas.

Overview

Measuring the similarity of interlanguage-linked Wikipedia articles often requires suitable language resources (e.g., dictionaries and machine translation systems), which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also make computing similarity computationally demanding. This paper presents a ‘lightweight’ approach to measuring cross-lingual similarity in Wikipedia that uses section headings rather than the entire Wikipedia article, and language resources derived from Wikipedia and Wiktionary to perform translation. Using an existing dataset, the authors evaluate the approach for 7 language pairs. Results show that performance using section headings is comparable to using all article content, that dictionaries derived from Wikipedia and Wiktionary are sufficient to compute cross-lingual similarity, and that combinations of features can further improve results.
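
The general idea of comparing articles through their section headings can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard overlap measure, the function names, and the tiny German-English dictionary below are all assumptions made for illustration. The features and resources actually used in the paper are not described on this page.

```python
def normalize(heading):
    """Lowercase and trim a heading so that comparisons ignore case and padding."""
    return heading.strip().lower()


def translate_headings(headings, dictionary):
    """Map source-language headings to the target language.

    `dictionary` is a hypothetical bilingual lookup (normalized source
    heading -> target heading). Headings without an entry are dropped.
    """
    translated = set()
    for heading in headings:
        key = normalize(heading)
        if key in dictionary:
            translated.add(normalize(dictionary[key]))
    return translated


def heading_similarity(source_headings, target_headings, dictionary):
    """Jaccard overlap between translated source headings and target headings."""
    translated = translate_headings(source_headings, dictionary)
    target = {normalize(h) for h in target_headings}
    union = translated | target
    if not union:
        return 0.0
    return len(translated & target) / len(union)


# Illustrative data only (assumed, not taken from the paper's dataset).
de_en = {
    "geschichte": "History",
    "geographie": "Geography",
    "weblinks": "External links",
}
german = ["Geschichte", "Geographie", "Weblinks"]
english = ["History", "Geography", "Economy", "External links"]

print(heading_similarity(german, english, de_en))
```

In this sketch, headings with no dictionary entry are simply ignored, so the score reflects only the headings that the dictionary covers. A real system would need a policy for such out-of-vocabulary headings.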