Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation
Authors | Gorka Labaka, Iñaki Alegría, Kepa Sarasola |
---|---|
Publication date | 2016 |
ISBN | 978-295174089-1 |
Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation is a scientific work about Wikipedia quality published in 2016, written by Gorka Labaka, Iñaki Alegría and Kepa Sarasola.
Overview
This paper shows how a state-of-the-art Statistical Machine Translation (SMT) system can be enriched with additional in-domain parallel corpora extracted from Wikipedia. The authors collect corpora from parallel titles and from parallel fragments in comparable articles across the English, Spanish and Basque editions of Wikipedia. Their evaluation has a double objective: to assess the quality of the extracted data and to measure the improvement obtained through domain adaptation. The authors argue that this enrichment method can be very useful for languages with a limited amount of parallel corpora, where in-domain data is crucial to improve the performance of MT systems. In experiments on the Spanish-English language pair, the enriched system improves on a baseline trained on the Europarl corpus by more than 2 BLEU points when translating texts from the Computer Science domain.
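The parallel-title resource can be pictured with a minimal Python sketch. It assumes that title pairs come from interlanguage links between Wikipedia editions and that those links are already loaded into a dictionary. The function name `build_title_pairs`, the article keys and the `in_domain` filter are hypothetical illustrations, not the authors' actual pipeline. Extracting parallel fragments from comparable articles is a separate, more involved step and is not shown.

```python
from typing import Dict, Iterable, List, Tuple


def build_title_pairs(
    langlinks: Dict[str, Dict[str, str]],
    src_lang: str,
    tgt_lang: str,
    in_domain: Iterable[str] = (),
) -> List[Tuple[str, str]]:
    """Pair article titles across two Wikipedia language editions.

    `langlinks` maps a hypothetical language-neutral article key to
    {language_code: title}, i.e. the titles joined by interlanguage links.
    If `in_domain` is non-empty, only those keys are kept; this is a
    hypothetical stand-in for a domain (e.g. category-based) filter.
    """
    domain = set(in_domain)
    pairs: List[Tuple[str, str]] = []
    for key, titles in langlinks.items():
        if domain and key not in domain:
            continue  # article lies outside the selected domain
        src = titles.get(src_lang)
        tgt = titles.get(tgt_lang)
        # Keep only articles that exist in both editions.
        if src and tgt:
            pairs.append((src.strip(), tgt.strip()))
    return pairs


if __name__ == "__main__":
    # Toy data for illustration only.
    links = {
        "article-1": {"en": "Compiler", "es": "Compilador"},
        "article-2": {"en": "Operating system", "es": "Sistema operativo"},
        "article-3": {"en": "Football"},
    }
    print(build_title_pairs(links, "en", "es"))
    # [('Compiler', 'Compilador'), ('Operating system', 'Sistema operativo')]
    print(build_title_pairs(links, "en", "es", in_domain={"article-1"}))
    # [('Compiler', 'Compilador')]
```

Restricting the keys to an in-domain subset mirrors, in a simplified way, the goal of gathering Computer Science material rather than the whole encyclopedia.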
Embed
Wikipedia Quality
Labaka, Gorka; Alegría, Iñaki; Sarasola, Kepa. (2016). "[[Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation]]". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9982 LNCS, 2016, pp. 177-185. ISBN: 978-295174089-1.
English Wikipedia
{{cite journal |last1=Labaka |first1=Gorka |last2=Alegría |first2=Iñaki |last3=Sarasola |first3=Kepa |title=Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation |date=2016 |isbn=978-295174089-1 |url=https://wikipediaquality.com/wiki/Domain_Adaptation_in_MT_Using_Wikipedia_as_a_Parallel_Corpus:_Resources_and_Evaluation |journal=Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9982 LNCS, 2016, pp. 177-185}}
HTML
Labaka, Gorka; Alegría, Iñaki; Sarasola, Kepa. (2016). "<a href="https://wikipediaquality.com/wiki/Domain_Adaptation_in_MT_Using_Wikipedia_as_a_Parallel_Corpus:_Resources_and_Evaluation">Domain Adaptation in MT Using Wikipedia as a Parallel Corpus: Resources and Evaluation</a>". Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Volume 9982 LNCS, 2016, pp. 177-185. ISBN: 978-295174089-1.