Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation

Authors: Gorka Labaka, Iñaki Alegría, Kepa Sarasola
Publication date: 2016
ISBN: 978-295174089-1

Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation is a scientific work about Wikipedia quality, published in 2016 and written by Gorka Labaka, Iñaki Alegría and Kepa Sarasola.

Overview

This paper shows how a state-of-the-art Statistical Machine Translation (SMT) system can be enriched with extra in-domain parallel corpora extracted from Wikipedia. The authors collect corpora from parallel titles and from parallel fragments in comparable articles from the English, Spanish and Basque editions of Wikipedia. Their evaluation has two objectives: to assess the quality of the extracted data and to measure the improvement obtained through domain adaptation. The authors argue that this enrichment method can be very useful for languages with a limited amount of parallel corpora, where in-domain data is crucial to improving the performance of MT systems. In experiments on the Spanish-English language pair, the approach improves on a baseline trained on the Europarl corpus by more than 2 BLEU points when translating texts from the Computer Science domain.
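To make the "BLEU points" figure concrete, the following is a minimal sketch of corpus-level BLEU on a 0-100 scale. It is not the evaluation tooling used in the paper, which this page does not describe. The function names `ngrams` and `corpus_bleu` are hypothetical, the sketch assumes whitespace tokenization, one reference per sentence and no smoothing, and the example sentences are invented for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis.

    Simplified sketch: whitespace tokenization, no smoothing.
    """
    matches = [0] * max_n   # clipped n-gram matches, per n-gram order
    totals = [0] * max_n    # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, an order with no matches gives BLEU 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Invented toy data: one reference and two hypothetical system outputs.
refs = ["the compiler translates source code into machine code"]
baseline_out = ["the compiler translate the source code in machine code"]
adapted_out = ["the compiler translates source code into the machine code"]

print("baseline:", corpus_bleu(baseline_out, refs))
print("adapted: ", corpus_bleu(adapted_out, refs))
```

An improvement of "more than 2 BLEU points" then means that the score of the domain-adapted system on the test set exceeds the score of the Europarl-only baseline by more than 2 on this 0-100 scale. Real evaluations are run on full test sets with standardized tokenization, so scores from this sketch are not comparable with published figures.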

Embed

Wikipedia Quality

Labaka, Gorka; Alegría, Iñaki; Sarasola, Kepa. (2016). "[[Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation]]". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9982 LNCS, 2016, pp. 177-185. ISBN: 978-295174089-1.
