Difference between revisions of "A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge"

(Adding categories)
 
== Overview ==
 
Objectives: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents, represented as vectors in spaces of [[Wikipedia]] concepts, and analyses its suitability for classification of Spanish biomedical documents when only English documents are available for training. The authors propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish to the English space. Methods: The performance of the classifier is compared to several baselines: a classifier based on [[machine translation]], a classifier that represents documents after performing Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study, and it is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts. Results: The performance of the approach is superior to that of any other state-of-the-art classifier in the benchmark, with performance increases of up to 124% over classical machine translation, 332% over MetaMap, and 60 times over the classifier based on ESA. The results are statistically significant. Conclusion: Using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts and translating vectors between language-specific concept spaces, a cross-language classifier can be built, and it performs better than several state-of-the-art classifiers.

Latest revision as of 21:29, 25 April 2023


A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge
Authors
Marcos Mouriño-García
Roberto Pérez-Rodríguez
Luis E. Anido-Rifón
Publication date
2017
DOI
10.3414/ME17-01-0028
Links
Original

A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge is a scientific work related to Wikipedia quality, published in 2017 and written by Marcos Mouriño-García, Roberto Pérez-Rodríguez and Luis E. Anido-Rifón.

Overview

Objective: Efficiently reviewing the existing literature is crucial for rapid research progress. This study presents a text document classifier that represents documents as vectors in spaces of Wikipedia concepts and evaluates its suitability for classifying Spanish biomedical documents when only English documents are available for training. The authors propose the cross-language concept matching (CLCM) technique, which uses Wikipedia interlanguage links to convert concept vectors from the Spanish concept space to the English one.
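
The conversion step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the link table `INTERLANGUAGE_LINKS`, the function `map_concepts`, the example weights, and the choice to drop concepts that have no English counterpart are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical interlanguage-link table: Spanish article title -> English article title.
# In practice these pairs would be mined from Wikipedia; here they are hard-coded.
INTERLANGUAGE_LINKS: Dict[str, str] = {
    "Corazón": "Heart",
    "Infarto agudo de miocardio": "Myocardial infarction",
    "Diabetes mellitus": "Diabetes",
}


def map_concepts(es_vector: Dict[str, float], links: Dict[str, str]) -> Dict[str, float]:
    """Convert a Spanish concept vector into the English concept space."""
    en_vector: Dict[str, float] = {}
    for es_concept, weight in es_vector.items():
        en_concept = links.get(es_concept)
        if en_concept is None:
            # Assumption: concepts without an interlanguage link are discarded.
            continue
        # Accumulate in case several Spanish concepts link to the same English one.
        en_vector[en_concept] = en_vector.get(en_concept, 0.0) + weight
    return en_vector


# A Spanish document represented as concept -> weight (illustrative values).
spanish_doc = {"Corazón": 2.0, "Infarto agudo de miocardio": 1.0, "Cardiólogo": 1.0}
english_doc = map_concepts(spanish_doc, INTERLANGUAGE_LINKS)
print(english_doc)
```

Once mapped, the Spanish document lives in the same concept space as the English training documents, so a classifier trained only on English data can be applied to it.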

Methods: The classifier's performance is compared with several baselines: a classifier based on machine translation, a classifier that represents documents using Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used in the experiments (Cross-Language UVigoMED) was purpose-built for this study and comprises 12,832 English and 2,184 Spanish MEDLINE abstracts.

Results: The proposed approach outperforms every other state-of-the-art classifier in the benchmark, with performance increases of up to 124% over classical machine translation, 332% over MetaMap, and 60 times over the ESA-based classifier. The results are statistically significant.

Conclusion: By leveraging knowledge extracted from Wikipedia to represent documents as vectors in Wikipedia concept spaces and translating vectors between language-specific concept spaces, an effective cross-language classifier can be developed. This classifier performs better than several existing state-of-the-art classifiers.
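
To make the idea of classifying in a shared concept space concrete, here is a small Python sketch. It uses a nearest-centroid rule with cosine similarity as a stand-in; the overview does not say which learning algorithm the authors used, and the labels, concepts, weights, and function names below are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

ConceptVector = Dict[str, float]


def cosine(a: ConceptVector, b: ConceptVector) -> float:
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs: List[Tuple[ConceptVector, str]]) -> Dict[str, ConceptVector]:
    """Sum the concept vectors of each class (cosine ignores the overall scale)."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for vector, label in docs:
        for concept, weight in vector.items():
            sums[label][concept] += weight
    return {label: dict(vec) for label, vec in sums.items()}


def classify(vector: ConceptVector, centroids: Dict[str, ConceptVector]) -> str:
    """Return the label whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# English training documents, already expressed as English Wikipedia concepts.
training = [
    ({"Heart": 2.0, "Myocardial infarction": 1.0}, "cardiology"),
    ({"Heart": 1.0, "Artery": 1.0}, "cardiology"),
    ({"Neuron": 2.0, "Brain": 1.0}, "neurology"),
]
centroids = train_centroids(training)

# A Spanish document after its concepts were translated into the English space.
translated_doc = {"Heart": 2.0, "Myocardial infarction": 1.0}
predicted = classify(translated_doc, centroids)
print(predicted)
```

Any vector-space classifier could replace the nearest-centroid rule here; the point is only that training and test documents share one set of concept dimensions.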

Embed

Wikipedia Quality

Mouriño-García, Marcos; Pérez-Rodríguez, Roberto; Anido-Rifón, Luis E. (2017). "[[A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge]]". Schattauer GmbH. DOI: 10.3414/ME17-01-0028.

English Wikipedia

{{cite journal |last1=Mouriño-García |first1=Marcos |last2=Pérez-Rodríguez |first2=Roberto |last3=Anido-Rifón |first3=Luis E. |title=A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge |date=2017 |doi=10.3414/ME17-01-0028 |url=https://wikipediaquality.com/wiki/A_Bag_of_Concepts_Approach_for_Biomedical_Document_Classification_Using_Wikipedia_Knowledge |journal=Schattauer GmbH}}

HTML

Mouriño-García, Marcos; Pérez-Rodríguez, Roberto; Anido-Rifón, Luis E. (2017). &quot;<a href="https://wikipediaquality.com/wiki/A_Bag_of_Concepts_Approach_for_Biomedical_Document_Classification_Using_Wikipedia_Knowledge">A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge</a>&quot;. Schattauer GmbH. DOI: 10.3414/ME17-01-0028.