Towards a Wikipedia-Extracted Alpine Corpus

Authors: Magdalena Plamada, Martin Volk
Publication date: 2012
DOI: 10.5167/uzh-63885
Links: Original

Towards a Wikipedia-Extracted Alpine Corpus is a scientific work related to Wikipedia quality. It was published in 2012 and written by Magdalena Plamada and Martin Volk.

Overview

This paper describes a method for extracting parallel sentences from comparable texts. The authors present the main challenges in creating a German-French corpus for the Alpine domain. They show that the Wikipedia category system is difficult to use for extracting domain-specific articles, so they introduce an alternative information retrieval approach. Sentence alignment algorithms are then used to identify semantically equivalent sentences across the German and French Wikipedia articles. With this approach, the authors create a corpus of sentence-aligned Alpine texts, which is evaluated both manually and automatically. The results show that even a small collection of extracted texts (approximately 10,000 sentence pairs) can partially improve the performance of a state-of-the-art statistical machine translation system. The authors conclude that the approach is worth pursuing on a larger scale, as well as for other language pairs and domains.
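The overview does not say which sentence alignment algorithm the authors used. To make the idea of sentence alignment concrete, the following Python sketch substitutes a classic length-based aligner in the style of Gale and Church (1993). It uses a dynamic program over "beads" (1-1, 1-0, 0-1, 2-1, 1-2 and 2-2 sentence groupings), and each bead is scored by how well the character lengths of its two sides match. The names `align` and `bead_cost` and the constants `PRIORS`, `MEAN_RATIO` and `VARIANCE` are hypothetical. The constant values are the ones commonly quoted for the Gale-Church method and are assumptions here, not settings reported by Plamada and Volk.

```python
import math

# Assumed parameters, as commonly quoted for Gale & Church (1993); the page
# does not say which aligner or settings Plamada and Volk actually used.
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}
MEAN_RATIO = 1.0  # expected target characters per source character
VARIANCE = 6.8    # variance of that ratio per source character


def bead_cost(src_len, tgt_len, bead):
    """Negative log-probability of a bead covering src_len and tgt_len characters."""
    m = (src_len + tgt_len / MEAN_RATIO) / 2
    if m == 0:
        # Two empty spans: treat as a perfect length match.
        return -math.log(PRIORS[bead])
    delta = (tgt_len - src_len * MEAN_RATIO) / math.sqrt(m * VARIANCE)
    # Two-tailed probability of a deviation at least |delta| under N(0, 1),
    # clamped so that log() never receives zero.
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -math.log(PRIORS[bead]) - math.log(p_delta)


def align(src, tgt):
    """Align two sentence lists; return a list of (source_indices, target_indices) beads."""
    n, k = len(src), len(tgt)
    src_lens = [len(s) for s in src]
    tgt_lens = [len(t) for t in tgt]
    cost = [[math.inf] * (k + 1) for _ in range(n + 1)]
    back = [[None] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward dynamic program: every bead advances at least one index, so
    # cost[i][j] is already final when (i, j) is visited in this order.
    for i in range(n + 1):
        for j in range(k + 1):
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > k:
                    continue
                c = cost[i][j] + bead_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Follow the back-pointers from (n, k) to recover the cheapest bead sequence.
    beads, i, j = [], n, k
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


if __name__ == "__main__":
    german = ["Der Gipfel ist steil.", "Er ist vergletschert."]
    french = ["Le sommet est raide et couvert de glaciers."]
    print(align(german, french))  # prints [([0, 1], [0])]
```

In the usage example, the two short German sentences are merged into a single 2-1 bead against the one French sentence, because their combined length closely matches it. Wikipedia articles in two languages are comparable rather than translations of each other, so a purely length-based aligner like this one would likely pair many non-equivalent sentences on such data. The sketch only illustrates the alignment step, not the extraction pipeline described in the paper.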

Embed

Wikipedia Quality

Plamada, Magdalena; Volk, Martin. (2012). "[[Towards a Wikipedia-Extracted Alpine Corpus]]". DOI: 10.5167/uzh-63885.

English Wikipedia

{{cite journal |last1=Plamada |first1=Magdalena |last2=Volk |first2=Martin |title=Towards a Wikipedia-Extracted Alpine Corpus |date=2012 |doi=10.5167/uzh-63885 |url=https://wikipediaquality.com/wiki/Towards_a_Wikipedia-Extracted_Alpine_Corpus}}

HTML

Plamada, Magdalena; Volk, Martin. (2012). "<a href="https://wikipediaquality.com/wiki/Towards_a_Wikipedia-Extracted_Alpine_Corpus">Towards a Wikipedia-Extracted Alpine Corpus</a>". DOI: 10.5167/uzh-63885.
