Difference between revisions of "Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles"

From Wikipedia Quality
(New work - Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles)
 
(Int.links)
Line 1: Line 1:
'''Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles''' - scientific work related to Wikipedia quality published in 2017, written by Monica Lestari Paramita, Paul D. Clough and Robert J. Gaizauskas.
+
'''Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles''' - scientific work related to [[Wikipedia quality]] published in 2017, written by [[Monica Lestari Paramita]], [[Paul D. Clough]] and [[Robert J. Gaizauskas]].
  
 
== Overview ==
 
== Overview ==
Measuring the similarity of interlanguage-linked Wikipedia articles often requires the use of suitable language resources (e.g., dictionaries and MT systems) which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also present computational demands when computing similarity. This paper presents a ‘lightweight’ approach to measure cross-lingual similarity in Wikipedia using section headings rather than the entire Wikipedia article, and language resources derived from Wikipedia and Wiktionary to perform translation. Using an existing dataset authors evaluate the approach for 7 language pairs. Results show that the performance using section headings is comparable to using all article content, dictionaries derived from Wikipedia and Wiktionary are sufficient to compute cross-lingual similarity and combinations of features can further improve results.
+
Measuring the similarity of interlanguage-linked [[Wikipedia]] articles often requires the use of suitable language resources (e.g., dictionaries and MT systems) which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also present computational demands when computing similarity. This paper presents a ‘lightweight’ approach to measure [[cross-lingual]] similarity in Wikipedia using section headings rather than the entire Wikipedia article, and language resources derived from Wikipedia and Wiktionary to perform translation. Using an existing dataset authors evaluate the approach for 7 language pairs. Results show that the performance using section headings is comparable to using all article content, dictionaries derived from Wikipedia and Wiktionary are sufficient to compute cross-lingual similarity and combinations of [[features]] can further improve results.

Revision as of 09:04, 10 October 2019

Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles is a scientific work related to Wikipedia quality, published in 2017 and written by Monica Lestari Paramita, Paul D. Clough and Robert J. Gaizauskas.

Overview

Measuring the similarity of interlanguage-linked Wikipedia articles often requires suitable language resources (e.g., dictionaries and MT systems), which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also present computational demands when computing similarity. This paper presents a ‘lightweight’ approach to measuring cross-lingual similarity in Wikipedia that uses section headings rather than the entire Wikipedia article, and language resources derived from Wikipedia and Wiktionary to perform translation. Using an existing dataset, the authors evaluate the approach for 7 language pairs. Results show that performance using section headings is comparable to using all article content, that dictionaries derived from Wikipedia and Wiktionary are sufficient to compute cross-lingual similarity, and that combinations of features can further improve results.
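
The general idea of comparing articles through their section headings can be illustrated with a minimal sketch. It is not the authors' implementation: the overview does not state which similarity features the paper uses, so Jaccard overlap is assumed here purely for illustration. The function names, the tiny German-to-English dictionary (standing in for a resource derived from Wikipedia or Wiktionary), and the example headings are all hypothetical.

```python
def normalize(heading):
    """Lowercase a heading and trim surrounding whitespace."""
    return heading.strip().lower()


def translate_headings(headings, dictionary):
    """Translate headings with a bilingual dictionary.

    Headings missing from the dictionary are kept untranslated
    (an assumption made for this sketch).
    """
    translated = set()
    for heading in headings:
        key = normalize(heading)
        translated.add(dictionary.get(key, key))
    return translated


def heading_similarity(source_headings, target_headings, dictionary):
    """Jaccard overlap between translated source headings and target headings."""
    source = translate_headings(source_headings, dictionary)
    target = {normalize(h) for h in target_headings}
    if not source and not target:
        return 0.0
    return len(source & target) / len(source | target)


# Hypothetical toy dictionary; a real one would be far larger.
de_en = {
    "geschichte": "history",
    "geographie": "geography",
    "weblinks": "external links",
}

german = ["Geschichte", "Geographie", "Politik", "Weblinks"]
english = ["History", "Geography", "Economy", "External links"]

score = heading_similarity(german, english, de_en)
print(score)  # 3 shared headings out of 5 distinct ones -> 0.6
```

Because only a handful of headings per article are compared, a measure of this kind is much cheaper to compute than one over full article text, which is the motivation the overview gives for a lightweight approach.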