Difference between revisions of "Building Bilingual Parallel Corpora based on Wikipedia"

From Wikipedia Quality
(New work - Building Bilingual Parallel Corpora based on Wikipedia)
 
(Adding wikilinks)
Line 1: Line 1:
'''Building Bilingual Parallel Corpora based on Wikipedia''' - scientific work related to Wikipedia quality published in 2010, written by Mehdi Mohammadi and Nasser GhasemAghaee.
+
'''Building Bilingual Parallel Corpora based on Wikipedia''' - scientific work related to [[Wikipedia quality]] published in 2010, written by [[Mehdi Mohammadi]] and [[Nasser GhasemAghaee]].
  
 
== Overview ==
 
== Overview ==
Aligned parallel corpora are an important resource for a wide range of multilingual research, in particular corpus-based machine translation. In this paper, the authors present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. They propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon. Experimental results show that the method increases precision while reducing the total number of generated candidate pairs.
+
Aligned parallel corpora are an important resource for a wide range of [[multilingual]] research, in particular corpus-based [[machine translation]]. In this paper, the authors present a Persian-English sentence-aligned parallel corpus built by mining [[Wikipedia]]. They propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon. Experimental results show that the method increases precision while reducing the total number of generated candidate pairs.

Revision as of 07:44, 3 June 2020

Building Bilingual Parallel Corpora based on Wikipedia - scientific work related to Wikipedia quality published in 2010, written by Mehdi Mohammadi and Nasser GhasemAghaee.

Overview

Aligned parallel corpora are an important resource for a wide range of multilingual research, in particular corpus-based machine translation. In this paper, the authors present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. They propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon. Experimental results show that the method increases precision while reducing the total number of generated candidate pairs.
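
The overview does not spell out how the link-based lexicon is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that titles of articles connected by interlanguage links can serve as a seed bilingual lexicon, and that a sentence pair becomes a candidate when enough lexicon entries found in the Persian sentence have their English counterparts in the English sentence. The function names (`build_lexicon`, `overlap_score`, `candidate_pairs`), the substring matching, and the `threshold` value are all hypothetical choices made for illustration.

```python
def build_lexicon(title_pairs):
    """Map each source-language article title to its linked target-language title.

    `title_pairs` is assumed to come from interlanguage links; here it is
    just a list of (source_title, target_title) tuples.
    """
    return {src.lower(): tgt.lower() for src, tgt in title_pairs}


def overlap_score(src_sentence, tgt_sentence, lexicon):
    """Share of lexicon entries in the source sentence whose translation
    also occurs in the target sentence (0.0 if no entry is found).

    Plain substring matching is used for brevity; a real system would
    tokenize and normalize both languages first.
    """
    src_text = src_sentence.lower()
    tgt_text = tgt_sentence.lower()
    expected = [tgt for src, tgt in lexicon.items() if src in src_text]
    if not expected:
        return 0.0
    matched = sum(1 for tgt in expected if tgt in tgt_text)
    return matched / len(expected)


def candidate_pairs(src_sentences, tgt_sentences, lexicon, threshold=0.5):
    """Return (source, target, score) triples whose score reaches the threshold."""
    pairs = []
    for src in src_sentences:
        for tgt in tgt_sentences:
            score = overlap_score(src, tgt, lexicon)
            if score >= threshold:
                pairs.append((src, tgt, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: two linked article titles and three sentences.
    lexicon = build_lexicon([("تهران", "Tehran"), ("ایران", "Iran")])
    persian = ["تهران پایتخت ایران است."]
    english = ["Tehran is the capital of Iran.", "Paris is the capital of France."]
    for src, tgt, score in candidate_pairs(persian, english, lexicon):
        print(f"{score:.2f}\t{tgt}")
```

Raising `threshold` in this sketch discards weakly supported pairs, which illustrates the trade-off mentioned in the overview: fewer candidate pairs in exchange for higher precision.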