Revision as of 07:45, 25 June 2020
Authors | Magdalena Plamada, Martin Volk
---|---
Publication date | 2013
Links | Original, Preprint
Mining for Domain-Specific Parallel Text from Wikipedia - a scientific work related to Wikipedia quality, published in 2013 and written by Magdalena Plamada and Martin Volk.
Overview
Previous attempts at extracting parallel data from Wikipedia were restricted by the monotonicity constraint of the alignment algorithm used for matching candidate sentences. This paper proposes a method for exploiting Wikipedia articles without regard to the position of the sentences in the text. The algorithm ranks candidate sentence pairs using a customized metric that combines several similarity criteria. Moreover, the authors limit the search space to a specific topical domain, since the final goal is to use the extracted data in a domain-specific Statistical Machine Translation (SMT) setting. The precision estimates show that the extracted sentence pairs are clearly semantically equivalent. The SMT experiments, however, show that the extracted data are not refined enough to improve a strong in-domain SMT system. Nevertheless, they are good enough to boost the performance of an out-of-domain system trained on sizable amounts of data.
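The ranking approach described above can be illustrated with a minimal Python sketch. The component scores (length ratio, dictionary-based lexical overlap) and their weights are illustrative assumptions for the sake of the example, not the paper's actual similarity criteria:

```python
# Hypothetical sketch: rank candidate sentence pairs by a combined
# similarity metric, independent of sentence position in the articles.
# The features and weights below are illustrative, not the paper's.

def length_ratio(src, tgt):
    """Penalize pairs whose token counts diverge strongly."""
    ls, lt = len(src.split()), len(tgt.split())
    return min(ls, lt) / max(ls, lt)

def lexical_overlap(src, tgt, lexicon):
    """Fraction of source tokens whose dictionary translation
    appears among the target tokens."""
    tgt_tokens = set(tgt.lower().split())
    src_tokens = src.lower().split()
    if not src_tokens:
        return 0.0
    hits = sum(1 for tok in src_tokens if lexicon.get(tok) in tgt_tokens)
    return hits / len(src_tokens)

def combined_score(src, tgt, lexicon, weights=(0.4, 0.6)):
    """Weighted combination of the similarity criteria."""
    w_len, w_lex = weights
    return w_len * length_ratio(src, tgt) + w_lex * lexical_overlap(src, tgt, lexicon)

def rank_candidates(pairs, lexicon):
    """Rank all candidate pairs by the combined metric, best first."""
    return sorted(pairs, key=lambda p: combined_score(p[0], p[1], lexicon),
                  reverse=True)

# Toy English-German example with a tiny translation lexicon.
lexicon = {"the": "das", "house": "haus"}
pairs = [("the house", "ein apfel liegt dort"),
         ("the house", "das haus")]
ranked = rank_candidates(pairs, lexicon)
```

Here the true translation pair scores highest because both its length ratio and its lexical overlap are maximal, while the mismatched pair is penalized on both criteria.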
Embed
Wikipedia Quality
Plamada, Magdalena; Volk, Martin. (2013). "[[Mining for Domain-Specific Parallel Text from Wikipedia]]".
English Wikipedia
{{cite journal |last1=Plamada |first1=Magdalena |last2=Volk |first2=Martin |title=Mining for Domain-Specific Parallel Text from Wikipedia |date=2013 |url=https://wikipediaquality.com/wiki/Mining_for_Domain-Specific_Parallel_Text_from_Wikipedia}}
HTML
Plamada, Magdalena; Volk, Martin. (2013). "<a href="https://wikipediaquality.com/wiki/Mining_for_Domain-Specific_Parallel_Text_from_Wikipedia">Mining for Domain-Specific Parallel Text from Wikipedia</a>".