Correlation Between Similarity Measures for Inter-Language Linked Wikipedia Articles



Authors
Monica Lestari Paramita
Paul D. Clough
Ahmet Aker
Robert J. Gaizauskas
Publication date
2012
Links
Original

Correlation Between Similarity Measures for Inter-Language Linked Wikipedia Articles is a scientific work related to Wikipedia quality, published in 2012 and written by Monica Lestari Paramita, Paul D. Clough, Ahmet Aker and Robert J. Gaizauskas.

Overview

Wikipedia articles in different languages have been mined to support various tasks, such as Cross-Language Information Retrieval (CLIR) and Statistical Machine Translation (SMT). Articles on the same topic in different languages are often connected by inter-language links, which can be used to identify similar or comparable content. In this work, the authors investigate how well similarity measures based on language-independent and language-dependent features correlate with human judgments. A collection of 800 Wikipedia article pairs from 8 different language pairs was collected and judged for similarity by two assessors. The authors report the development of this corpus and the inter-assessor agreement across the languages. Results show that similarity measured using language-independent features is comparable to that obtained with an approach based on translating non-English documents. In both cases the correlation with human judgments is low and also depends on the language pair. The results and corpus generated from this work also provide insights into the measurement of cross-language similarity.
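
The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard overlap used as a language-independent measure, the choice of Spearman's rank correlation, and all data values are assumptions made purely for illustration, and the function names are hypothetical.

```python
from statistics import mean


def jaccard(tokens_a, tokens_b):
    """Jaccard overlap of two token collections (assumed similarity measure)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: tokens that need no translation (numbers, names)
# taken from each article in a linked pair, plus an invented human score.
article_pairs = [
    (["1969", "Apollo", "NASA"], ["1969", "Apollo", "NASA", "Armstrong"]),
    (["1889", "Paris", "Eiffel"], ["Paris", "1930"]),
    (["Bach", "1685"], ["Mozart", "1756"]),
    (["Everest", "8848", "Nepal"], ["Everest", "8848", "Nepal"]),
]
human_scores = [4, 3, 1, 5]  # invented judgments on an assumed 1-5 scale

measured = [jaccard(a, b) for a, b in article_pairs]
print("similarity scores:", measured)
print("Spearman correlation:", spearman(measured, human_scores))
```

A rank-based coefficient is used here because human judgments are typically ordinal. Running the same calculation separately for each language pair would show whether the correlation varies with the language pair, as the overview reports.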

Embed

Wikipedia Quality

Paramita, Monica Lestari; Clough, Paul D.; Aker, Ahmet; Gaizauskas, Robert J. (2012). "[[Correlation Between Similarity Measures for Inter-Language Linked Wikipedia Articles]]". European Language Resources Association.

English Wikipedia

{{cite journal |last1=Paramita |first1=Monica Lestari |last2=Clough |first2=Paul D. |last3=Aker |first3=Ahmet |last4=Gaizauskas |first4=Robert J. |title=Correlation Between Similarity Measures for Inter-Language Linked Wikipedia Articles |date=2012 |url=https://wikipediaquality.com/wiki/Correlation_Between_Similarity_Measures_for_Inter-Language_Linked_Wikipedia_Articles |journal=European Language Resources Association}}

HTML

Paramita, Monica Lestari; Clough, Paul D.; Aker, Ahmet; Gaizauskas, Robert J. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Correlation_Between_Similarity_Measures_for_Inter-Language_Linked_Wikipedia_Articles">Correlation Between Similarity Measures for Inter-Language Linked Wikipedia Articles</a>&quot;. European Language Resources Association.