Exploiting Wikipedia for Cross-Lingual and Multilingual Information Retrieval



Exploiting Wikipedia for Cross-Lingual and Multilingual Information Retrieval
Authors
Philipp Sorg
Philipp Cimiano
Publication date
2012
DOI
10.1016/j.datak.2012.02.003
Links
Original

Exploiting Wikipedia for Cross-Lingual and Multilingual Information Retrieval is a scientific work related to Wikipedia quality, published in 2012 and written by Philipp Sorg and Philipp Cimiano.

Overview

In this article, the authors show how Wikipedia, as a multilingual knowledge resource, can be exploited for Cross-Language and Multilingual Information Retrieval (CLIR/MLIR). They describe an approach called Cross-Language Explicit Semantic Analysis (CL-ESA), which indexes documents with respect to explicit interlingual concepts. These concepts are considered interlingual and universal, and in this work correspond either to Wikipedia articles or to categories. Each concept is associated with a text signature in each language, which can be used to estimate language-specific term distributions for that concept. This knowledge is then used to calculate the strength of association between a term and a concept, which in turn is used to map documents into the concept space. With CL-ESA the authors thus move from a Bag-of-Words model to a Bag-of-Concepts model that allows language-independent document representations in the vector space spanned by interlingual and universal concepts. They show how different vector-based retrieval models and term weighting strategies can be used in conjunction with CL-ESA and experimentally analyze the performance of the different choices. The approach is evaluated on a mate retrieval task on two datasets, JRC-Acquis and Multext. The results show that in the MLIR setting CL-ESA benefits from a certain level of abstraction: using categories as concepts, instead of articles as in the original ESA model, delivers better results.
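
The mapping from terms to concepts described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The three concepts, their English and German text signatures, the TF-IDF-style association function and all function names are hypothetical choices made for illustration, and TF-IDF stands in for just one of the term weighting strategies the article compares. A real system would build the signatures from Wikipedia articles or categories.

```python
import math
from collections import Counter

# Hypothetical toy data: each interlingual concept has one text signature
# per language. Real signatures would come from Wikipedia articles or categories.
SIGNATURES = {
    "Music":   {"en": "music song guitar concert band",
                "de": "musik lied gitarre konzert band"},
    "Finance": {"en": "bank money market credit loan",
                "de": "bank geld markt kredit darlehen"},
    "Sport":   {"en": "football match goal team player",
                "de": "fussball spiel tor mannschaft spieler"},
}
CONCEPTS = sorted(SIGNATURES)  # fixed concept order shared by all languages


def association(term, concept, lang):
    """TF-IDF-style strength of association between a term and a concept
    (an assumed weighting, one of several possible strategies)."""
    tf = Counter(SIGNATURES[concept][lang].split())[term]
    if tf == 0:
        return 0.0
    # Number of concepts whose signature in this language contains the term.
    df = sum(term in SIGNATURES[c][lang].split() for c in CONCEPTS)
    return tf * math.log(1 + len(CONCEPTS) / df)


def concept_vector(text, lang):
    """Map a document to the Bag-of-Concepts space (one weight per concept)."""
    terms = text.lower().split()
    return [sum(association(t, c, lang) for t in terms) for c in CONCEPTS]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def find_mate(query, query_lang, candidates, cand_lang):
    """Return the index of the candidate closest to the query in concept space."""
    q = concept_vector(query, query_lang)
    scores = [cosine(q, concept_vector(d, cand_lang)) for d in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


german_docs = [
    "die bank gibt kredit und geld",
    "das konzert mit gitarre und lied",
    "die mannschaft schoss ein tor",
]
mate = find_mate("the band played a song at the concert", "en", german_docs, "de")
print(german_docs[mate])
```

Because both documents are projected onto the same concept axes, the English query and the German candidates can be compared directly even though they share no vocabulary. This mirrors the mate retrieval setup used in the evaluation, although at toy scale.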

Embed

Wikipedia Quality

Sorg, Philipp; Cimiano, Philipp. (2012). "[[Exploiting Wikipedia for Cross-Lingual and Multilingual Information Retrieval]]". Elsevier Science Publishers B. V. DOI: 10.1016/j.datak.2012.02.003.

English Wikipedia

{{cite journal |last1=Sorg |first1=Philipp |last2=Cimiano |first2=Philipp |title=Exploiting Wikipedia for Cross-Lingual and Multilingual Information Retrieval |date=2012 |doi=10.1016/j.datak.2012.02.003 |url=https://wikipediaquality.com/wiki/Exploiting_Wikipedia_for_Cross-Lingual_and_Multilingual_Information_Retrieval |journal=Elsevier Science Publishers B. V.}}

HTML

Sorg, Philipp; Cimiano, Philipp. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Exploiting_Wikipedia_for_Cross-Lingual_and_Multilingual_Information_Retrieval">Exploiting Wikipedia for Cross-Lingual and Multilingual Information Retrieval</a>&quot;. Elsevier Science Publishers B. V. DOI: 10.1016/j.datak.2012.02.003.