Method for Building Sentence-Aligned Corpus from Wikipedia



Authors: Keiji Yasuda, Eiichiro Sumita
Publication date: 2008

"Method for Building Sentence-Aligned Corpus from Wikipedia" is a scientific work related to Wikipedia quality, published in 2008 and written by Keiji Yasuda and Eiichiro Sumita.

Overview

The authors propose a framework for bootstrapping machine translation (MT) from multilingual Wikipedia articles. The method simultaneously produces a statistical machine translation (SMT) system and a sentence-aligned corpus. The study reports two types of experiments. The first verifies sentence alignment performance by comparing the proposed method with a conventional sentence alignment approach; it uses JENAAD, a sentence-aligned corpus built with the conventional method. The second applies sentence alignment to actual English and Japanese Wikipedia articles. In the first experiment, the proposed method performs comparably to the conventional sentence alignment method. In the second, the authors obtain English translations for 10% of the Japanese sentences while maintaining high alignment quality (rank-A ratio of over 0.8).
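The overview does not describe the alignment procedure in detail, so the following is only a minimal sketch of one alignment pass in a bootstrapping loop of this general kind. It assumes that each source sentence is machine-translated, compared against every candidate target sentence, and kept only if the best match clears a threshold. The function names, the toy lexicon, the romanized example sentences, the Jaccard token overlap, and the threshold value are all illustrative assumptions, not details taken from the paper.

```python
from typing import Callable, List, Tuple


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens.

    Assumption: a stand-in similarity measure, not the one used in the paper.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align_sentences(
    src_sents: List[str],
    tgt_sents: List[str],
    translate: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Return (source index, target index, score) for confident matches."""
    aligned = []
    if not tgt_sents:
        return aligned
    for i, src in enumerate(src_sents):
        hypothesis = translate(src)  # MT output for the source sentence
        scores = [token_overlap(hypothesis, tgt) for tgt in tgt_sents]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:  # keep only high-confidence pairs
            aligned.append((i, j, scores[j]))
    return aligned


# Hypothetical word-for-word "translator" standing in for an SMT system.
TOY_LEXICON = {
    "wa": "the", "neko": "cat", "inu": "dog",
    "nemuru": "sleeps", "hashiru": "runs",
}


def toy_translate(sentence: str) -> str:
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in sentence.split())


# Made-up toy data; word order is simplified so the lexicon lookup suffices.
ja = ["wa neko nemuru", "wa inu hashiru", "sora aoi"]
en = ["the dog runs", "the cat sleeps", "stock prices fell"]

pairs = align_sentences(ja, en, toy_translate, threshold=0.8)
print(pairs)
```

In a full bootstrapping loop, the pairs kept by such a pass would presumably be used to retrain the translation system before the pass is repeated; that retraining step is omitted here. Raising the threshold trades coverage for alignment quality, which is the kind of trade-off the reported figures (10% of sentences at a rank-A ratio above 0.8) reflect.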

Embed

Wikipedia Quality

Yasuda, Keiji; Sumita, Eiichiro. (2008). "[[Method for Building Sentence-Aligned Corpus from Wikipedia]]".

English Wikipedia

{{cite journal |last1=Yasuda |first1=Keiji |last2=Sumita |first2=Eiichiro |title=Method for Building Sentence-Aligned Corpus from Wikipedia |date=2008 |url=https://wikipediaquality.com/wiki/Method_for_Building_Sentence-Aligned_Corpus_from_Wikipedia}}

HTML

Yasuda, Keiji; Sumita, Eiichiro. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Method_for_Building_Sentence-Aligned_Corpus_from_Wikipedia">Method for Building Sentence-Aligned Corpus from Wikipedia</a>&quot;.