Pairing Wikipedia Articles Across Languages
Revision as of 00:48, 14 February 2021
Authors | Marcus Klang, Pierre Nugues |
---|---|
Publication date | 2016 |
Links | Original |
Pairing Wikipedia Articles Across Languages is a scientific work related to Wikipedia quality, published in 2016 and written by Marcus Klang and Pierre Nugues.
Overview
Wikipedia has become a reference knowledge source for scores of NLP applications. One of its invaluable features lies in its multilingual nature, where articles on the same entity or concept can have from one to more than 200 different language versions. The interlinking of language versions in Wikipedia has undergone a major renewal with the advent of Wikidata, a unified scheme to identify entities and their properties using unique numbers. However, as the interlinking is still carried out manually by thousands of editors across the globe, errors may creep into the assignment of entities. The authors describe an optimization technique to automatically match language versions of articles, and hence entities, based only on bags of words and anchors. They created a dataset of all the articles on persons that they extracted from Wikipedia in six languages (English, French, German, Russian, Spanish, and Swedish) and report a correct match of at least 94.3% on each language pair.
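The abstract names the ingredients (bags of words and anchors, followed by an optimization step) but not their details. The following is a minimal sketch under stated assumptions, not the authors' method. It represents each article as a bag of link targets that a hypothetical lexicon (`FR_TO_EN`) has already mapped into a shared vocabulary. It compares articles with cosine similarity and uses a greedy one-to-one matching in place of the optimization. The function names `bag_of_anchors`, `cosine`, and `greedy_pairing`, as well as the toy articles, are illustrative only.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def bag_of_anchors(targets: list[str], lexicon: dict[str, str] | None = None) -> Counter:
    """Count an article's link targets, optionally mapped into a shared vocabulary."""
    lexicon = lexicon or {}
    # Targets missing from the (hypothetical) lexicon are kept unchanged.
    return Counter(lexicon.get(t, t) for t in targets)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-terms vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_pairing(src: dict[str, Counter], tgt: dict[str, Counter]) -> dict[str, str]:
    """Pair each source article with at most one target article, most similar pairs first.

    This greedy matching is a stand-in assumption; it does not guarantee
    the maximum total similarity.
    """
    scored = [
        (cosine(bag_s, bag_t), s, t)
        for s, bag_s in src.items()
        for t, bag_t in tgt.items()
    ]
    # Highest similarity first; ties broken by title for deterministic output.
    scored.sort(key=lambda x: (-x[0], x[1], x[2]))
    pairs: dict[str, str] = {}
    used_targets: set[str] = set()
    for score, s, t in scored:
        if score <= 0.0:
            break  # no shared terms left: leave remaining articles unpaired
        if s not in pairs and t not in used_targets:
            pairs[s] = t
            used_targets.add(t)
    return pairs


# Hypothetical French-to-English mapping of link targets (an assumption of this sketch).
FR_TO_EN = {"Physique": "Physics", "Chimie": "Chemistry", "Prix Nobel": "Nobel Prize",
            "Habsbourg": "Habsburg", "Espagne": "Spain", "Gand": "Ghent"}

english = {
    "Marie Curie": bag_of_anchors(
        ["Physics"] * 3 + ["Chemistry"] * 2 + ["Warsaw"] + ["Nobel Prize"] * 2),
    "Charles V, Holy Roman Emperor": bag_of_anchors(
        ["Habsburg"] * 3 + ["Spain", "Ghent"] + ["Holy Roman Empire"] * 2),
}
french = {
    "Charles Quint": bag_of_anchors(["Habsbourg"] * 2 + ["Espagne"] * 2 + ["Gand"], FR_TO_EN),
    "Marie Curie": bag_of_anchors(["Physique"] * 2 + ["Chimie"] * 3 + ["Prix Nobel"], FR_TO_EN),
}

if __name__ == "__main__":
    print(greedy_pairing(english, french))
    # {'Marie Curie': 'Marie Curie', 'Charles V, Holy Roman Emperor': 'Charles Quint'}
```

Greedy matching keeps the sketch short. An exact one-to-one assignment that maximizes total similarity could instead be computed with the Hungarian algorithm, for example with `scipy.optimize.linear_sum_assignment(similarity_matrix, maximize=True)`.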
Embed
Wikipedia Quality
Klang, Marcus; Nugues, Pierre. (2016). "[[Pairing Wikipedia Articles Across Languages]]". The COLING 2016 Organizing Committee.
English Wikipedia
{{cite journal |last1=Klang |first1=Marcus |last2=Nugues |first2=Pierre |title=Pairing Wikipedia Articles Across Languages |date=2016 |url=https://wikipediaquality.com/wiki/Pairing_Wikipedia_Articles_Across_Languages |journal=The COLING 2016 Organizing Committee}}
HTML
Klang, Marcus; Nugues, Pierre. (2016). "<a href="https://wikipediaquality.com/wiki/Pairing_Wikipedia_Articles_Across_Languages">Pairing Wikipedia Articles Across Languages</a>". The COLING 2016 Organizing Committee.