Wikipedia-Based Cross-Language Text Classification

From Wikipedia Quality
Latest revision as of 14:58, 27 June 2020


Wikipedia-Based Cross-Language Text Classification
Authors
Marcos Antonio Mouriño García
Roberto Pérez Rodríguez
Luis Anido Rifón
Publication date
2017
DOI
10.1016/j.ins.2017.04.024
Links
Original

Wikipedia-Based Cross-Language Text Classification - scientific work related to Wikipedia quality published in 2017, written by Marcos Antonio Mouriño García, Roberto Pérez Rodríguez and Luis Anido Rifón.

Overview

This paper presents the application of a Wikipedia-based bag of concepts (WikiBoC) document representation to cross-language text classification (CLTC). Its main objective is to alleviate the major drawbacks of state-of-the-art CLTC approaches, which are typically based on the machine translation (MT) of documents represented as bags of words (BoW). The authors propose a technique called cross-language concept matching (CLCM), which converts concept-based representations of documents from one language to another by using Wikipedia's correspondences between concepts in different languages, and therefore does not rely on automated full-text translation. The authors describe two proposals. The first uses the WikiBoC representation together with the CLCM technique (WikiBoC-CLCM) to classify documents written in a language L1 with an SVM classifier trained on documents written in another language L2. The second is a hybrid document representation that combines WikiBoC-CLCM with the classic BoW-MT approach. To evaluate both proposals, the authors conducted several experiments on three cross-lingual corpora: the JRC-Acquis corpus and two purpose-built corpora composed of Wikipedia articles. The first proposal outperforms state-of-the-art approaches when the training sequences are short, with performance increases of up to 233.33%. The second proposal outperforms state-of-the-art approaches over the whole range of training sequences, with performance increases of up to 23.78%. The results show the benefits of the WikiBoC-CLCM approach: the concepts extracted from documents add useful information to the classifier and thus improve its performance.
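To make the CLCM idea more concrete, here is a minimal Python sketch written under stated assumptions. It is not the authors' implementation. It assumes that a document's bag of concepts is a dict from Wikipedia article titles to weights, and that cross-language correspondences are available as a plain dict from source-language titles to target-language titles. The hypothetical ES_TO_EN table stands in for Wikipedia's interlanguage links. Two policies are assumptions of this sketch rather than details taken from the paper: concepts without a correspondence are dropped, and the weights of source concepts that map to the same target concept are summed. The hypothetical hybrid helper shows one simple, also assumed, way to join BoW-MT word features and WikiBoC-CLCM concept features into a single vector. Concept extraction, machine translation and SVM training are not shown.

```python
from collections import defaultdict
from typing import Dict, Mapping

# Hypothetical toy interlanguage-link table (Spanish title -> English title).
# Real correspondences would come from Wikipedia's interlanguage links.
ES_TO_EN: Dict[str, str] = {
    "Aprendizaje automático": "Machine learning",
    "Traducción automática": "Machine translation",
    "Clasificación de documentos": "Document classification",
}


def clcm(bag_of_concepts: Mapping[str, float], links: Mapping[str, str]) -> Dict[str, float]:
    """Map a concept-weight vector into the target language's concept space.

    Assumptions of this sketch: concepts with no link are dropped, and
    weights of source concepts that share a target concept are summed.
    """
    mapped: Dict[str, float] = defaultdict(float)
    for concept, weight in bag_of_concepts.items():
        target = links.get(concept)
        if target is not None:
            mapped[target] += weight
    return dict(mapped)


def hybrid(bow: Mapping[str, float], boc: Mapping[str, float]) -> Dict[str, float]:
    """Join word features and concept features into one feature dict.

    The "w:" and "c:" prefixes (an assumption) keep a word and a concept
    with the same spelling from overwriting each other.
    """
    features = {f"w:{term}": weight for term, weight in bow.items()}
    features.update({f"c:{concept}": weight for concept, weight in boc.items()})
    return features


if __name__ == "__main__":
    # Hypothetical Spanish document represented as a bag of concepts.
    doc_es = {"Aprendizaje automático": 2.0, "Traducción automática": 1.0, "Sevilla": 1.0}
    doc_en = clcm(doc_es, ES_TO_EN)
    print(doc_en)  # {'Machine learning': 2.0, 'Machine translation': 1.0}

    # Hypothetical BoW of the machine-translated text, for the hybrid model.
    bow_mt = {"learning": 1.0, "translation": 1.0}
    print(hybrid(bow_mt, doc_en))
```

Running the script maps the Spanish concepts onto their English counterparts and drops "Sevilla", which has no entry in the hypothetical table. The resulting dictionaries could then be vectorized and passed to an SVM trained on English documents.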

Embed

Wikipedia Quality

Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis. (2017). "[[Wikipedia-Based Cross-Language Text Classification]]". Elsevier. DOI: 10.1016/j.ins.2017.04.024.

English Wikipedia

{{cite journal |last1=Mouriño García |first1=Marcos Antonio |last2=Pérez Rodríguez |first2=Roberto |last3=Anido Rifón |first3=Luis |title=Wikipedia-Based Cross-Language Text Classification |date=2017 |doi=10.1016/j.ins.2017.04.024 |url=https://wikipediaquality.com/wiki/Wikipedia-Based_Cross-Language_Text_Classification |journal=Information Sciences |publisher=Elsevier}}

HTML

Mouriño García, Marcos Antonio; Pérez Rodríguez, Roberto; Anido Rifón, Luis. (2017). &quot;<a href="https://wikipediaquality.com/wiki/Wikipedia-Based_Cross-Language_Text_Classification">Wikipedia-Based Cross-Language Text Classification</a>&quot;. Elsevier. DOI: 10.1016/j.ins.2017.04.024.