Difference between revisions of "Leveraging Wikipedia Knowledge to Cross-Language Classify Textual News"

From Wikipedia Quality
{{Infobox work
| title = Leveraging Wikipedia Knowledge to Cross-Language Classify Textual News
| date = 2017
| authors = [[Marcos Mouriño-García]]<br />[[Roberto Pérez-Rodríguez]]<br />[[Luis E. Anido-Rifón]]
| doi = 10.1109/iscmi.2017.8279619
| link = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=8279619
}}
 
'''Leveraging Wikipedia Knowledge to Cross-Language Classify Textual News''' is a scientific work related to [[Wikipedia quality]], published in 2017 and written by [[Marcos Mouriño-García]], [[Roberto Pérez-Rodríguez]] and [[Luis E. Anido-Rifón]].
 
  
 
== Overview ==
 
 
This paper presents a first attempt at leveraging [[Wikipedia]] knowledge to represent textual news stories as vectors of Wikipedia concepts, and analyzes the suitability of this representation for building a cross-language classifier that labels news stories written in Spanish after being trained only on English ones. The authors describe two approaches. The first (WikiBoC-CLCM) represents the news stories using Wikipedia concepts only. The second (Hybrid-WikiBoC) combines the WikiBoC-CLCM classifier with the state-of-the-art approach, which is based on the bag-of-words model together with [[machine translation]] techniques (BoW-MT). To evaluate the proposed approaches, the authors present a dataset of news stories written in English and Spanish, extracted from several online newspapers and news agencies such as Reuters and Europa Press. The results show that the purely concept-based WikiBoC-CLCM approach offers the highest classification performance, achieving increases of up to 55.07% over the state-of-the-art BoW-MT approach. The Hybrid-WikiBoC approach also outperforms the BoW-MT model, achieving performance increases of up to 2.34%. The authors conclude that leveraging Wikipedia knowledge is of great advantage in cross-language classification of textual news stories.
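
The idea of a concept-based cross-language classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny `CONCEPT_LEXICON`, the whitespace tokenizer and the nearest-centroid classifier are all assumptions made for illustration, standing in for whatever semantic annotator and learning algorithm the paper actually uses. The point it shows is that once English and Spanish texts are mapped to the same concept identifiers, a model trained only on English vectors can be applied to Spanish ones without translation.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (an assumption, not from the paper): surface forms
# in English and Spanish mapped to shared, language-independent concept IDs.
CONCEPT_LEXICON = {
    "football": "Association_football",
    "fútbol": "Association_football",
    "goal": "Goal_(sport)",
    "gol": "Goal_(sport)",
    "election": "Election",
    "elecciones": "Election",
    "parliament": "Parliament",
    "parlamento": "Parliament",
}


def to_concept_vector(text):
    """Map a text to a bag of concepts (concept ID -> count)."""
    tokens = text.lower().split()
    return Counter(CONCEPT_LEXICON[t] for t in tokens if t in CONCEPT_LEXICON)


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labelled_docs):
    """Sum the concept vectors of the training documents for each label."""
    centroids = {}
    for text, label in labelled_docs:
        centroids.setdefault(label, Counter()).update(to_concept_vector(text))
    return centroids


def classify(text, centroids):
    """Return the label whose centroid is most similar to the text."""
    vec = to_concept_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Training uses English stories only.
english_training = [
    ("the football match ended with a late goal", "sports"),
    ("parliament called an early election", "politics"),
]
centroids = train_centroids(english_training)

# Classification is applied to Spanish stories, with no translation step.
print(classify("el parlamento convoca elecciones", centroids))    # politics
print(classify("un gol decide el partido de fútbol", centroids))  # sports
```

In this sketch the Spanish stories share no words with the English training data, yet they are classified correctly because both languages resolve to the same concept identifiers. A bag-of-words baseline would instead need a translation step before any features could overlap.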
 

Revision as of 18:35, 11 May 2020

