{{Infobox work
| title = Multilingual Schema Matching for Wikipedia Infoboxes
| date = 2011
| authors = [[Thanh Hoang Nguyen]]<br />[[Viviane Pereira Moreira]]<br />[[Huong Nguyen]]<br />[[Hoa Nguyen]]<br />[[Juliana Freire]]
| doi = 10.14778/2078324.2078329
| link = http://dl.acm.org/citation.cfm?doid=2078324.2078329
| plink = https://arxiv.org/pdf/1110.6651
}}
 
'''Multilingual Schema Matching for Wikipedia Infoboxes''' is a scientific work related to [[Wikipedia quality]], published in 2011 and written by [[Thanh Hoang Nguyen]], [[Viviane Pereira Moreira]], [[Huong Nguyen]], [[Hoa Nguyen]] and [[Juliana Freire]].
 
== Overview ==
 
Recent research has taken advantage of [[Wikipedia]]'s multilingualism as a resource for cross-language [[information retrieval]] and [[machine translation]], and has proposed techniques for enriching its cross-language structure. The availability of documents in [[multiple languages]] also opens up new opportunities for querying structured Wikipedia content and, in particular, for enabling answers that straddle [[different language]]s. As a step towards supporting such queries, the authors propose a method for identifying mappings between attributes of [[infoboxes]] on pages in different languages. The approach finds mappings in a completely automated fashion. Because it does not require training data, it is scalable: not only can it be used to find mappings between many language pairs, but it is also effective for under-represented languages that lack sufficient training samples. Another important benefit of the approach is that it does not depend on syntactic similarity between attribute names, so it can be applied to language pairs with distinct morphologies. The authors performed an extensive experimental evaluation using a corpus of pages in Portuguese, Vietnamese, and English. The results show that the approach not only obtains high precision and recall but also outperforms state-of-the-art techniques. The authors also present a case study demonstrating that the derived [[multilingual]] mappings lead to substantial improvements in answer quality and coverage for structured queries over Wikipedia content.
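
The idea that infobox attributes can be matched across languages without comparing their names, and without training data, can be illustrated with a simplified value-overlap sketch. It is not the algorithm from the paper: the function names (<code>match_attributes</code>, <code>tokens</code>, <code>jaccard</code>), the Jaccard token similarity, the greedy one-to-one assignment, the 0.5 threshold and the toy infoboxes are all assumptions made for illustration. The sketch only compares the values of infoboxes from article pairs that are connected by a cross-language link.

```python
from collections import defaultdict
from itertools import product


def tokens(value: str) -> frozenset:
    """Lowercase whitespace tokens with commas stripped: a crude normalization (assumption)."""
    return frozenset(value.lower().replace(",", " ").split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_attributes(pairs, threshold=0.5):
    """Hypothetical value-based matcher; NOT the method from the paper.

    pairs: list of (infobox_lang1, infobox_lang2) dicts, one per article pair
    connected by a cross-language link. Attribute names are never compared.
    Returns a one-to-one {attr_lang1: attr_lang2} mapping.
    """
    totals = defaultdict(float)  # summed value similarity per attribute pair
    counts = defaultdict(int)    # number of article pairs where both attributes occur
    for box1, box2 in pairs:
        for (a1, v1), (a2, v2) in product(box1.items(), box2.items()):
            totals[(a1, a2)] += jaccard(tokens(v1), tokens(v2))
            counts[(a1, a2)] += 1
    scores = {key: totals[key] / counts[key] for key in totals}

    # Greedy one-to-one assignment: accept the best-scoring pairs first.
    mapping, used1, used2 = {}, set(), set()
    for (a1, a2), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break  # all remaining candidates score even lower
        if a1 in used1 or a2 in used2:
            continue  # each attribute may be mapped at most once
        mapping[a1] = a2
        used1.add(a1)
        used2.add(a2)
    return mapping


# Toy Portuguese/English infobox pairs (invented values, for illustration only).
toy_pairs = [
    ({"nome": "Alpha City", "população": "1000", "área": "50"},
     {"name": "Alpha City", "population": "1000", "area": "50"}),
    ({"nome": "Beta Town", "população": "2500", "área": "75"},
     {"name": "Beta Town", "population": "2500", "area": "75"}),
]

print(match_attributes(toy_pairs))
# → {'nome': 'name', 'população': 'population', 'área': 'area'}
```

Even this toy version never looks at attribute names, so "população" and "population" are paired purely because their values agree. Plain token overlap has obvious limits: a single chance co-occurrence of identical values already scores 1.0, and translated values (for example "alemão" versus "German") share no tokens, so a practical matcher would need stronger evidence than this sketch uses.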
 