A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge



Authors: Marcos Mouriño-García, Roberto Pérez-Rodríguez, Luis E. Anido-Rifón
Publication date: 2017
DOI: 10.3414/ME17-01-0028

A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge is a scientific work related to Wikipedia quality, published in 2017 and written by Marcos Mouriño-García, Roberto Pérez-Rodríguez and Luis E. Anido-Rifón.

Overview

Objectives: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents, represented as vectors in spaces of Wikipedia concepts, and analyses its suitability for classifying Spanish biomedical documents when only English documents are available for training. The authors propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish space to the English space.

Methods: The performance of the classifier is compared to several baselines: a classifier based on machine translation, a classifier that represents documents after performing Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study and is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts.

Results: The proposed approach outperforms every other state-of-the-art classifier in the benchmark, with performance increases of up to 124% over classical machine translation, 332% over MetaMap, and 60 times over the classifier based on ESA. The results are reported to be statistically significant.

Conclusion: By using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts, and by translating vectors between language-specific concept spaces, a cross-language classifier can be built that performs better than several state-of-the-art classifiers.
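The idea behind CLCM, as summarized above, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `clcm`, the dictionary-based vector representation, the rule of summing weights when several Spanish concepts map to the same English concept, the rule of dropping concepts that have no interlanguage link, and all of the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def clcm(spanish_vector, interlanguage_links):
    """Map a Spanish concept vector into the English concept space.

    spanish_vector: dict of Spanish Wikipedia concept title -> weight.
    interlanguage_links: dict of Spanish title -> English title
        (hypothetical stand-in for Wikipedia interlanguage links).
    """
    english_vector = defaultdict(float)
    for concept, weight in spanish_vector.items():
        target = interlanguage_links.get(concept)
        if target is None:
            # Assumption: a concept with no English counterpart is dropped.
            continue
        # Assumption: weights of concepts sharing a target are summed.
        english_vector[target] += weight
    return dict(english_vector)


# Hypothetical toy data, not taken from the paper or its corpus.
links = {
    "Diabetes mellitus": "Diabetes",
    "Insulina": "Insulin",
    "Páncreas": "Pancreas",
}
doc_es = {"Diabetes mellitus": 3.0, "Insulina": 2.0, "Glucemia": 1.0}

doc_en = clcm(doc_es, links)
print(doc_en)
```

Once mapped in this way, a Spanish document lives in the same concept space as the English training documents, so a classifier trained only on English vectors could in principle be applied to it directly.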

Embed

Wikipedia Quality

Mouriño-García, Marcos; Pérez-Rodríguez, Roberto; Anido-Rifón, Luis E. (2017). "[[A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge]]". Schattauer GmbH. DOI: 10.3414/ME17-01-0028.

English Wikipedia

{{cite journal |last1=Mouriño-García |first1=Marcos |last2=Pérez-Rodríguez |first2=Roberto |last3=Anido-Rifón |first3=Luis E. |title=A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge |date=2017 |doi=10.3414/ME17-01-0028 |url=https://wikipediaquality.com/wiki/A_Bag_of_Concepts_Approach_for_Biomedical_Document_Classification_Using_Wikipedia_Knowledge |journal=Methods of Information in Medicine |publisher=Schattauer GmbH}}

HTML

Mouriño-García, Marcos; Pérez-Rodríguez, Roberto; Anido-Rifón, Luis E. (2017). &quot;<a href="https://wikipediaquality.com/wiki/A_Bag_of_Concepts_Approach_for_Biomedical_Document_Classification_Using_Wikipedia_Knowledge">A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge</a>&quot;. Schattauer GmbH. DOI: 10.3414/ME17-01-0028.