Method for Building Sentence-Aligned Corpus from Wikipedia
Authors | Keiji Yasuda, Eiichiro Sumita |
---|---|
Publication date | 2008 |
Links | Original |
Method for Building Sentence-Aligned Corpus from Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Keiji Yasuda and Eiichiro Sumita.
Overview
The authors propose a framework for a machine translation (MT) bootstrapping method that uses multilingual Wikipedia articles. The method simultaneously produces a statistical machine translation (SMT) system and a sentence-aligned corpus. The study reports two types of experiments. The first verifies sentence alignment performance by comparing the proposed method with a conventional sentence alignment approach; it uses JENAAD, a sentence-aligned corpus built with the conventional sentence alignment method. The second applies sentence alignment to actual English and Japanese Wikipedia articles. The first experiment shows that the proposed method performs comparably to the conventional sentence alignment method. The second shows that English translations can be obtained for 10% of the Japanese sentences while maintaining high alignment quality (a rank-A ratio above 0.8).
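The bootstrapping idea described above (translate, align, retrain, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the overview does not specify the translation model, the similarity measure, or the threshold, so everything below is an assumption made for illustration. A toy word lexicon stands in for the SMT system, a Dice token overlap stands in for the alignment score, and the romanized, pre-tokenized Japanese sentences are invented examples. The names `translate`, `dice`, `align_once`, `update_lexicon`, and `bootstrap` are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str, float]  # (source sentence, target sentence, score)


def translate(sentence: str, lexicon: Dict[str, str]) -> List[str]:
    """Toy stand-in for SMT: word-by-word lookup, unknown words pass through."""
    return [lexicon.get(tok, tok) for tok in sentence.lower().split()]


def dice(hyp: List[str], ref: str) -> float:
    """Dice coefficient over token sets (an assumed similarity measure)."""
    h, r = set(hyp), set(ref.lower().split())
    if not h or not r:
        return 0.0
    return 2 * len(h & r) / (len(h) + len(r))


def align_once(src: List[str], tgt: List[str],
               lexicon: Dict[str, str], threshold: float) -> List[Pair]:
    """Greedy one-to-one alignment of translated source sentences to targets."""
    candidates = []
    for i, s in enumerate(src):
        hyp = translate(s, lexicon)
        for j, t in enumerate(tgt):
            score = dice(hyp, t)
            if score >= threshold:
                candidates.append((score, i, j))
    candidates.sort(key=lambda c: -c[0])  # best-scoring candidates first
    used_src, used_tgt, pairs = set(), set(), []
    for score, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            used_src.add(i)
            used_tgt.add(j)
            pairs.append((src[i], tgt[j], score))
    return pairs


def update_lexicon(lexicon: Dict[str, str], pairs: List[Pair]) -> Dict[str, str]:
    """Stand-in for retraining: learn a word pair when exactly one source
    word and one target word in an aligned pair are still unexplained."""
    new = dict(lexicon)
    for s, t, _ in pairs:
        s_toks, t_toks = s.lower().split(), t.lower().split()
        unknown_src = [w for w in s_toks if w not in new]
        covered = {new[w] for w in s_toks if w in new}
        unknown_tgt = [w for w in t_toks if w not in covered]
        if len(unknown_src) == 1 and len(unknown_tgt) == 1:
            new[unknown_src[0]] = unknown_tgt[0]
    return new


def bootstrap(src: List[str], tgt: List[str], lexicon: Dict[str, str],
              threshold: float = 0.6, max_rounds: int = 5):
    """Alternate alignment and lexicon updates until the lexicon stops changing."""
    pairs: List[Pair] = []
    for _ in range(max_rounds):
        pairs = align_once(src, tgt, lexicon, threshold)
        updated = update_lexicon(lexicon, pairs)
        if updated == lexicon:
            break
        lexicon = updated
    return lexicon, pairs


# Invented toy data: romanized, pre-tokenized Japanese and English sentences.
seed_lexicon = {"neko": "cat", "taberu": "eats", "sakana": "fish"}
ja_sentences = ["neko taberu sakana", "inu taberu sakana", "inu taberu"]
en_sentences = ["cat eats fish", "dog eats fish", "dog eats", "birds fly south"]

final_lexicon, aligned = bootstrap(ja_sentences, en_sentences, seed_lexicon)
for ja, en, score in aligned:
    print(f"{score:.2f}  {ja}  <->  {en}")
```

In this toy run, the sentence "inu taberu" falls below the threshold in the first round and is aligned only after the lexicon has learned a translation for "inu" from another aligned pair. That is the sense in which the translation model and the aligned corpus improve together. A real system would replace the lexicon update with actual SMT training and would need proper Japanese word segmentation.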
Embed
Wikipedia Quality
Yasuda, Keiji; Sumita, Eiichiro. (2008). "[[Method for Building Sentence-Aligned Corpus from Wikipedia]]".
English Wikipedia
{{cite journal |last1=Yasuda |first1=Keiji |last2=Sumita |first2=Eiichiro |title=Method for Building Sentence-Aligned Corpus from Wikipedia |date=2008 |url=https://wikipediaquality.com/wiki/Method_for_Building_Sentence-Aligned_Corpus_from_Wikipedia}}
HTML
Yasuda, Keiji; Sumita, Eiichiro. (2008). "<a href="https://wikipediaquality.com/wiki/Method_for_Building_Sentence-Aligned_Corpus_from_Wikipedia">Method for Building Sentence-Aligned Corpus from Wikipedia</a>".