Building Bilingual Parallel Corpora based on Wikipedia



Building Bilingual Parallel Corpora based on Wikipedia
Authors: Mehdi Mohammadi, Nasser GhasemAghaee
Publication date: 2010
DOI: 10.1109/ICCEA.2010.203
Links: Original

Building Bilingual Parallel Corpora based on Wikipedia is a scientific work related to Wikipedia quality, published in 2010 and written by Mehdi Mohammadi and Nasser GhasemAghaee.

Overview

Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper, the authors present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. They propose a method for extracting sentence-level alignments that uses an extended link-based bilingual lexicon. Experimental results show that the method increases precision while reducing the total number of generated candidate pairs.
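
The general idea of lexicon-based candidate selection can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small bilingual lexicon of the kind that might be collected from interlanguage links between article titles, and it scores each sentence pair by simple term overlap. The function names, the overlap score, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon type: source-language term -> target-language term,
# e.g. gathered from interlanguage links between article titles.
Lexicon = Dict[str, str]


def translate_terms(sentence: str, lexicon: Lexicon) -> Set[str]:
    """Return target-language terms for the lexicon entries found in the sentence."""
    tokens = sentence.lower().split()
    return {lexicon[t] for t in tokens if t in lexicon}


def score_pair(src: str, tgt: str, lexicon: Lexicon) -> float:
    """Fraction of the translated source terms that also occur in the target sentence."""
    translated = translate_terms(src, lexicon)
    if not translated:
        return 0.0
    tgt_tokens = set(tgt.lower().split())
    return len(translated & tgt_tokens) / len(translated)


def candidate_pairs(
    src_sents: List[str],
    tgt_sents: List[str],
    lexicon: Lexicon,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Keep only sentence pairs whose lexicon overlap reaches the threshold."""
    pairs = []
    for s in src_sents:
        for t in tgt_sents:
            score = score_pair(s, t, lexicon)
            if score >= threshold:
                pairs.append((s, t, score))
    return pairs


# Toy data for illustration only.
lexicon = {"تهران": "tehran", "ایران": "iran", "پایتخت": "capital"}
persian = ["تهران پایتخت ایران است"]
english = ["tehran is the capital of iran", "wikipedia is a free encyclopedia"]
pairs = candidate_pairs(persian, english, lexicon)
```

In this sketch, a higher threshold discards more sentence pairs before any further alignment step, which is one simple way a lexicon can cut the number of candidate pairs. The sketch matches single whitespace-separated tokens only; multi-word titles would need additional handling.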