A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge



Authors
Marcos Mouriño-García
Roberto Pérez-Rodríguez
Luis E. Anido-Rifón
Publication date
2017
DOI
10.3414/ME17-01-0028

A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge is a scientific work related to Wikipedia quality, published in 2017 and written by Marcos Mouriño-García, Roberto Pérez-Rodríguez and Luis E. Anido-Rifón.

Overview

Objective: Efficiently reviewing existing literature is crucial for research advancement. This study presents a text document classifier that uses vectors in Wikipedia concept spaces for classifying Spanish biomedical documents, even when only English documents are available for training. The proposed technique is called cross-language concept matching (CLCM) and utilizes Wikipedia interlanguage links to convert concept vectors from Spanish to English.
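CLCM's key step, converting a Spanish concept vector into the English concept space through interlanguage links, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are represented as sparse dicts from Wikipedia article titles to weights, ES_TO_EN is a hypothetical, hand-written excerpt of an interlanguage link table, and the choices to drop concepts with no English counterpart and to sum weights that land on the same English concept are assumptions of this sketch.

```python
from typing import Dict

Vector = Dict[str, float]  # sparse concept vector: Wikipedia title -> weight

# Hypothetical excerpt of an interlanguage link table (Spanish title -> English title).
# A real table would be extracted from Wikipedia's interlanguage links.
ES_TO_EN: Dict[str, str] = {
    "Diabetes mellitus": "Diabetes mellitus",
    "Insulina": "Insulin",
    "Páncreas": "Pancreas",
}


def translate_concept_vector(es_vector: Vector, links: Dict[str, str]) -> Vector:
    """Map a Spanish concept vector into the English concept space.

    Assumption: concepts without an interlanguage link are dropped, and
    weights of Spanish concepts that map to the same English concept are summed.
    """
    en_vector: Vector = {}
    for concept, weight in es_vector.items():
        target = links.get(concept)
        if target is None:
            continue  # no English counterpart, so the concept cannot be matched
        en_vector[target] = en_vector.get(target, 0.0) + weight
    return en_vector


# "Concepto local" is a hypothetical concept with no English link.
doc_es = {"Insulina": 3.0, "Páncreas": 1.0, "Concepto local": 2.0}
print(translate_concept_vector(doc_es, ES_TO_EN))  # {'Insulin': 3.0, 'Pancreas': 1.0}
```

Because the output lives in the English concept space, a classifier trained only on English documents can consume it unchanged.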

Methods: The classifier's performance is compared with multiple baselines, including a machine translation-based classifier, a classifier using Explicit Semantic Analysis (ESA), and a domain-specific semantic annotator (MetaMap). The experimental corpus, Cross-Language UVigoMED, was created specifically for this study and consists of 12,832 English and 2,184 Spanish MEDLINE abstracts.

Results: The proposed approach outperforms all other state-of-the-art classifiers in the benchmark, with performance increases of 124% over the traditional machine translation baseline, 332% over MetaMap, and a 60-fold improvement over the ESA-based classifier. These improvements are reported as statistically significant.
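As a worked reading of these figures, assume each percentage is a relative improvement of CLCM over a baseline in a single evaluation metric F (the metric is not named in this summary). A relative gain Δ then fixes the ratio between the two scores, and the figure of 60 for the ESA-based classifier, read as a plain ratio, corresponds to Δ = 5900%:

```latex
\Delta = \frac{F_{\mathrm{CLCM}} - F_{\mathrm{base}}}{F_{\mathrm{base}}} \times 100\%
\quad\Longrightarrow\quad
F_{\mathrm{CLCM}} = \left(1 + \frac{\Delta}{100\%}\right) F_{\mathrm{base}}

\Delta = 124\% \Rightarrow F_{\mathrm{CLCM}} = 2.24\,F_{\mathrm{base}}, \qquad
\Delta = 332\% \Rightarrow F_{\mathrm{CLCM}} = 4.32\,F_{\mathrm{base}}, \qquad
F_{\mathrm{CLCM}} = 60\,F_{\mathrm{base}} \Rightarrow \Delta = 5900\%
```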

Conclusion: Representing documents as vectors in Wikipedia concept spaces, using knowledge extracted from Wikipedia, and translating those vectors between language-specific concept spaces yields an effective cross-language classifier that outperforms several existing state-of-the-art classifiers.
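Once a Spanish document has been mapped into the English concept space, a classifier trained on English concept vectors can label it directly. The Python sketch below uses a nearest-centroid classifier with cosine similarity purely as a stand-in: this summary does not say which learning algorithm the authors used, and the class labels, concepts and weights are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse concept vector: Wikipedia title -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either has zero norm)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the English concept vectors of each class."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in docs:
        counts[label] += 1
        for concept, weight in vec.items():
            sums[label][concept] += weight
    return {label: {c: w / counts[label] for c, w in s.items()}
            for label, s in sums.items()}


def classify(vec: Vector, centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical English training data (labels and weights are illustrative).
train = [
    ({"Insulin": 2.0, "Pancreas": 1.0}, "Endocrinology"),
    ({"Diabetes mellitus": 1.0, "Insulin": 1.0}, "Endocrinology"),
    ({"Myocardial infarction": 2.0, "Heart": 1.0}, "Cardiology"),
]
centroids = train_centroids(train)

# A Spanish abstract already translated into the English concept space.
print(classify({"Insulin": 3.0, "Pancreas": 1.0}, centroids))  # Endocrinology
```

Nearest centroid keeps the example short; another classifier could be swapped in without changing how the translated vectors are produced.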

Embed

Wikipedia Quality

Mouriño-García, Marcos; Pérez-Rodríguez, Roberto; Anido-Rifón, Luis E. (2017). "[[A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge]]". Schattauer GmbH. DOI: 10.3414/ME17-01-0028.

English Wikipedia

{{cite journal |last1=Mouriño-García |first1=Marcos |last2=Pérez-Rodríguez |first2=Roberto |last3=Anido-Rifón |first3=Luis E. |title=A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge |date=2017 |doi=10.3414/ME17-01-0028 |url=https://wikipediaquality.com/wiki/A_Bag_of_Concepts_Approach_for_Biomedical_Document_Classification_Using_Wikipedia_Knowledge |journal=Schattauer GmbH}}
