Difference between revisions of "Information Arbitrage Across Multi-Lingual Wikipedia"

From Wikipedia Quality
(Information Arbitrage Across Multi-Lingual Wikipedia - basic info)
 
(+ links)
Line 1: Line 1:
'''Information Arbitrage Across Multi-Lingual Wikipedia''' is a scientific work related to Wikipedia quality, published in 2009 and written by Eytan Adar, Michael Skinner and Daniel S. Weld.
+
'''Information Arbitrage Across Multi-Lingual Wikipedia''' is a scientific work related to [[Wikipedia quality]], published in 2009 and written by [[Eytan Adar]], [[Michael Skinner]] and [[Daniel S. Weld]].
  
 
== Overview ==
 
== Overview ==
The rapid globalization of Wikipedia is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many different languages emerge as a result of both manual translation and independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), the authors present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. The authors' method uses self-supervised learning, and experiments demonstrate the method's feasibility, even in the absence of dictionaries.
+
The rapid globalization of [[Wikipedia]] is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many [[different language]]s emerge as a result of both manual translation and independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), the authors present Ziggurat, an automated system for aligning Wikipedia [[infoboxes]], creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. The authors' method uses self-supervised learning, and experiments demonstrate the method's feasibility, even in the absence of dictionaries.

Revision as of 09:20, 8 October 2019

Information Arbitrage Across Multi-Lingual Wikipedia is a scientific work related to Wikipedia quality, published in 2009 and written by Eytan Adar, Michael Skinner and Daniel S. Weld.

Overview

The rapid globalization of Wikipedia is generating a parallel, multi-lingual corpus of unprecedented scale. Pages for the same topic in many different languages emerge as a result of both manual translation and independent development. Unfortunately, these pages may appear at different times and vary in size, scope, and quality. Furthermore, differential growth rates cause the conceptual mapping between articles in different languages to be both complex and dynamic. These disparities provide the opportunity for a powerful form of information arbitrage: leveraging articles in one or more languages to improve the content in another. Analyzing four large language domains (English, Spanish, French, and German), the authors present Ziggurat, an automated system for aligning Wikipedia infoboxes, creating new infoboxes as necessary, filling in missing information, and detecting discrepancies between parallel pages. The authors' method uses self-supervised learning, and experiments demonstrate the method's feasibility, even in the absence of dictionaries.
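
As a rough illustration of the last two tasks named in the overview (filling in missing information and detecting discrepancies between parallel pages), the Python sketch below compares two infoboxes for the same topic. It is not the Ziggurat implementation. The attribute alignment is written by hand here, whereas the system described in the paper learns it; value translation and unit normalization are ignored; and the function name `arbitrage`, the example infoboxes and all their values are hypothetical.

```python
from typing import Dict, List, Tuple

Infobox = Dict[str, str]
Discrepancy = Tuple[str, str, str]  # (target attribute, source value, target value)


def arbitrage(
    source: Infobox,
    target: Infobox,
    alignment: Dict[str, str],
) -> Tuple[Infobox, List[Discrepancy]]:
    """Fill gaps in `target` from `source` and list conflicting values.

    `alignment` maps source attribute names to target attribute names.
    It is assumed to be given; learning it is the hard part and is not shown.
    """
    filled = dict(target)  # copy, so the caller's infobox is left untouched
    discrepancies: List[Discrepancy] = []
    for src_attr, tgt_attr in alignment.items():
        if src_attr not in source:
            continue  # nothing to transfer or compare
        src_val = source[src_attr].strip()
        tgt_val = target.get(tgt_attr, "").strip()
        if not tgt_val:
            # Attribute missing or empty in the target language: fill it in.
            filled[tgt_attr] = src_val
        elif tgt_val.casefold() != src_val.casefold():
            # Both languages have a value and the values disagree: flag it.
            discrepancies.append((tgt_attr, src_val, tgt_val))
    return filled, discrepancies


# Hypothetical English and Spanish infoboxes for a made-up town.
en = {"name": "Exampleville", "population": "12000", "area_km2": "45"}
es = {"nombre": "Exampleville", "habitantes": "11500"}
alignment = {
    "name": "nombre",
    "population": "habitantes",
    "area_km2": "superficie_km2",
}

filled, discrepancies = arbitrage(en, es, alignment)
print(filled)
print(discrepancies)
```

In this toy case the Spanish infobox gains the missing area attribute from the English one, and the two population values are reported as a discrepancy for an editor to review rather than being overwritten.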