A Factory of Comparable Corpora from Wikipedia

Authors
Alberto Barrón-Cedeño
Cristina España-Bonet
Josu Boldoba
Lluís Màrquez
Publication date
2015
DOI
10.18653/v1/W15-3402

A Factory of Comparable Corpora from Wikipedia is a scientific work related to Wikipedia quality, published in 2015 and written by Alberto Barrón-Cedeño, Cristina España-Bonet, Josu Boldoba and Lluís Màrquez.

Overview

Multiple approaches to gathering comparable data from the Web have been developed to date. Nevertheless, coming up with a high-quality comparable corpus on a specific topic is not straightforward. The authors present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia. To prove the value of the model, they automatically extract parallel sentences from the comparable collections and use them to train statistical machine translation engines for specific domains. Their experiments on the English-Spanish pair in the domains of Computer Science, Science, and Sports show that an in-domain translator performs significantly better than a generic one when translating in-domain Wikipedia articles. Moreover, the authors show that these corpora can help when translating out-of-domain texts.
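
The overview does not say how parallel sentences are selected from the comparable collections. As a rough illustration only, the sketch below scores every source/target sentence pair with character 3-gram cosine similarity and keeps the best match above a threshold. The similarity measure, the threshold value, and the names `char_ngrams`, `cosine` and `extract_pairs` are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_pairs(src_sentences, tgt_sentences, threshold=0.5):
    """For each source sentence, keep its best-scoring target sentence
    if the score reaches the (hypothetical) threshold."""
    tgt_vecs = [(t, char_ngrams(t)) for t in tgt_sentences]
    pairs = []
    for s in src_sentences:
        s_vec = char_ngrams(s)
        best = max(
            ((t, cosine(s_vec, v)) for t, v in tgt_vecs),
            key=lambda item: item[1],
            default=None,  # no target sentences at all
        )
        if best is not None and best[1] >= threshold:
            pairs.append((s, best[0], best[1]))
    return pairs
```

Character n-grams are used here because closely related languages such as English and Spanish share many cognates and named entities, so a surface measure gives some signal without any translation resources. A realistic pipeline would likely combine several features and tune the threshold on held-out data.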

Embed

Wikipedia Quality

Barrón-Cedeño, Alberto; España-Bonet, Cristina; Boldoba, Josu; Màrquez, Lluís. (2015). "[[A Factory of Comparable Corpora from Wikipedia]]". Association for Computational Linguistics. DOI: 10.18653/v1/W15-3402.

English Wikipedia

{{cite journal |last1=Barrón-Cedeño |first1=Alberto |last2=España-Bonet |first2=Cristina |last3=Boldoba |first3=Josu |last4=Màrquez |first4=Lluís |title=A Factory of Comparable Corpora from Wikipedia |date=2015 |doi=10.18653/v1/W15-3402 |url=https://wikipediaquality.com/wiki/A_Factory_of_Comparable_Corpora_from_Wikipedia |journal=Association for Computational Linguistics}}

HTML

Barrón-Cedeño, Alberto; España-Bonet, Cristina; Boldoba, Josu; Màrquez, Lluís. (2015). &quot;<a href="https://wikipediaquality.com/wiki/A_Factory_of_Comparable_Corpora_from_Wikipedia">A Factory of Comparable Corpora from Wikipedia</a>&quot;. Association for Computational Linguistics. DOI: 10.18653/v1/W15-3402.