Difference between revisions of "Extracting Comparable Articles from Wikipedia and Measuring Their Comparabilities"

From Wikipedia Quality
(wikilinks)
(infobox)
Line 1: Line 1:
+ {{Infobox work
+ | title = Extracting Comparable Articles from Wikipedia and Measuring Their Comparabilities
+ | date = 2013
+ | authors = [[Motaz Saad]]<br />[[David Langlois]]<br />[[Kamel Smaïli]]
+ | doi = 10.1016/j.sbspro.2013.10.620
+ | link = http://www.sciencedirect.com/science/article/pii/S1877042813041402
+ }}
 
'''Extracting Comparable Articles from Wikipedia and Measuring Their Comparabilities''' is a scientific work related to [[Wikipedia quality]], published in 2013 and written by [[Motaz Saad]], [[David Langlois]] and [[Kamel Smaïli]].
 
  
 
== Overview ==
 
 
Parallel corpora are not available for all domains and languages, but statistical methods in [[multilingual]] research require very large parallel or comparable corpora. Comparable corpora can be used when parallel corpora are insufficient or unavailable for specific domains and languages. In this paper, the authors propose a method to extract all comparable articles from [[Wikipedia]] for [[multiple languages]] based on interlanguage links. They also extract comparable articles from the Euro News website and present two comparability [[measures]] (CM) to compute the degree of comparability of multilingual articles. The authors extracted about 40K and 34K comparable articles from Wikipedia and Euro News, respectively, in three languages: Arabic, French, and English. Experimental results for the comparability measures show that the measure can capture the comparability of multilingual corpora and allows articles in [[different language]]s on the same topic to be retrieved.
 

Revision as of 09:14, 8 October 2019


Extracting Comparable Articles from Wikipedia and Measuring Their Comparabilities
Authors: Motaz Saad, David Langlois, Kamel Smaïli
Publication date: 2013
DOI: 10.1016/j.sbspro.2013.10.620
Links: Original (http://www.sciencedirect.com/science/article/pii/S1877042813041402)

Extracting Comparable Articles from Wikipedia and Measuring Their Comparabilities is a scientific work related to Wikipedia quality, published in 2013 and written by Motaz Saad, David Langlois and Kamel Smaïli.

Overview

Parallel corpora are not available for all domains and languages, but statistical methods in multilingual research require very large parallel or comparable corpora. Comparable corpora can be used when parallel corpora are insufficient or unavailable for specific domains and languages. In this paper, the authors propose a method to extract all comparable articles from Wikipedia for multiple languages based on interlanguage links. They also extract comparable articles from the Euro News website and present two comparability measures (CM) to compute the degree of comparability of multilingual articles. The authors extracted about 40K and 34K comparable articles from Wikipedia and Euro News, respectively, in three languages: Arabic, French, and English. Experimental results for the comparability measures show that the measure can capture the comparability of multilingual corpora and allows articles in different languages on the same topic to be retrieved.
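
The two ideas in the overview, aligning articles through interlanguage links and scoring how comparable two articles are, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the input layout (a mapping from an English title to its interlanguage links) and the dictionary-based score are assumptions made for illustration, and the score shown is only one simple way such a measure could be defined.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def aligned_articles(
    langlinks: Dict[str, Dict[str, str]],
    languages: Sequence[str] = ("en", "fr", "ar"),
) -> List[Tuple[str, ...]]:
    """Keep only articles that have a linked version in every requested language.

    `langlinks` is a hypothetical input: English title -> {language code: title}.
    """
    aligned = []
    for en_title, links in langlinks.items():
        titles = {"en": en_title, **links}
        if all(lang in titles for lang in languages):
            aligned.append(tuple(titles[lang] for lang in languages))
    return aligned


def dictionary_overlap(
    source_tokens: Iterable[str],
    target_tokens: Iterable[str],
    dictionary: Dict[str, Set[str]],
) -> float:
    """Illustrative comparability score (an assumption, not the paper's formula).

    Returns the fraction of distinct source words that have at least one
    dictionary translation present in the target document.
    """
    source_vocab = set(source_tokens)
    target_vocab = set(target_tokens)
    if not source_vocab:
        return 0.0
    matched = sum(
        1 for word in source_vocab if dictionary.get(word, set()) & target_vocab
    )
    return matched / len(source_vocab)


if __name__ == "__main__":
    # Toy data, invented for the example.
    links = {
        "Cat": {"fr": "Chat", "ar": "قط"},
        "Dog": {"fr": "Chien"},  # no Arabic version, so it is dropped
    }
    print(aligned_articles(links))

    en_fr = {"cat": {"chat"}, "eats": {"mange"}, "fish": {"poisson"}}
    print(dictionary_overlap(["cat", "eats", "fish"], ["le", "chat", "mange"], en_fr))
```

In this sketch an article is kept only when every requested language has a linked version, and the score ranges from 0 (no source word has a translation in the target) to 1 (every source word does).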