Difference between revisions of "Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles"
(Int.links) |
(Infobox) |
||
Line 1: | Line 1: | ||
+ | {{Infobox work | ||
+ | | title = Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles | ||
+ | | date = 2017 | ||
+ | | authors = [[Monica Lestari Paramita]]<br />[[Paul D. Clough]]<br />[[Robert J. Gaizauskas]] | ||
+ | | doi = 10.1007/978-3-319-56608-5_59 | ||
+ | | link = https://link.springer.com/content/pdf/10.1007%2F978-3-319-56608-5_59.pdf | ||
+ | }} | ||
Revision as of 09:20, 5 December 2019
Field | Value |
---|---|
Authors | Monica Lestari Paramita, Paul D. Clough, Robert J. Gaizauskas |
Publication date | 2017 |
DOI | 10.1007/978-3-319-56608-5_59 |
Links | Original (https://link.springer.com/content/pdf/10.1007%2F978-3-319-56608-5_59.pdf) |
Using Section Headings to Compute Cross-Lingual Similarity of Wikipedia Articles is a scientific work related to Wikipedia quality, published in 2017 and written by Monica Lestari Paramita, Paul D. Clough and Robert J. Gaizauskas.
Overview
Measuring the similarity of interlanguage-linked Wikipedia articles often requires suitable language resources (e.g., dictionaries and MT systems), which can be problematic for languages with limited or poor translation resources. The size of Wikipedia can also make computing similarity computationally demanding. This paper presents a ‘lightweight’ approach to measuring cross-lingual similarity in Wikipedia that uses section headings rather than the entire article, and performs translation with language resources derived from Wikipedia and Wiktionary. Using an existing dataset, the authors evaluate the approach for 7 language pairs. Results show that performance using section headings is comparable to using all article content, that dictionaries derived from Wikipedia and Wiktionary are sufficient to compute cross-lingual similarity, and that combinations of features can further improve results.
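The general idea of comparing articles by their section headings can be illustrated with a minimal sketch. It is not the authors' implementation: the overview does not specify which features or similarity measures the paper uses, so the Jaccard overlap, the function names (`translate_headings`, `heading_similarity`), the German-English example and the toy dictionary below are all assumptions made for illustration.

```python
# Illustrative sketch only: the dictionary, headings and the choice of
# Jaccard overlap are assumptions, not taken from the paper.

def translate_headings(headings, dictionary):
    """Translate heading tokens word by word with a bilingual dictionary.

    Tokens without a dictionary entry are dropped.
    """
    translated = set()
    for heading in headings:
        for token in heading.lower().split():
            translated.update(dictionary.get(token, ()))
    return translated


def heading_similarity(source_headings, target_headings, dictionary):
    """Jaccard overlap between translated source terms and target terms."""
    source_terms = translate_headings(source_headings, dictionary)
    target_terms = {
        token for heading in target_headings for token in heading.lower().split()
    }
    union = source_terms | target_terms
    if not union:
        return 0.0
    return len(source_terms & target_terms) / len(union)


# Hypothetical dictionary standing in for one derived from Wikipedia or Wiktionary.
toy_dictionary = {
    "geschichte": {"history"},
    "geographie": {"geography"},
    "literatur": {"literature", "references"},
}

german_headings = ["Geschichte", "Geographie", "Literatur"]
english_headings = ["History", "Geography", "Economy", "References"]

score = heading_similarity(german_headings, english_headings, toy_dictionary)
print(f"heading similarity: {score:.2f}")
```

Because only headings are translated and compared, the amount of text processed per article pair stays small, which reflects the 'lightweight' motivation described in the overview.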