Biomedical Literature Classification Using Encyclopedic Knowledge: a Wikipedia-Based Bag-Of-Concepts Approach

Revision as of 09:49, 13 March 2021 by Shiela (talk | contribs) (cat.)


Authors
Marcos Mouriño García
Roberto Pérez Rodríguez
Luis Anido Rifón
Publication date
2015
DOI
10.7717/peerj.1279
Links
Original

Biomedical Literature Classification Using Encyclopedic Knowledge: a Wikipedia-Based Bag-Of-Concepts Approach is a scientific work related to Wikipedia quality, published in 2015 and written by Marcos Mouriño García, Roberto Pérez Rodríguez and Luis Anido Rifón.

Overview

Automatic classification of text documents into a set of categories has many applications, and the classification of biomedical literature stands out among them. Biomedical staff and researchers deal with a large volume of literature in their daily work, so a system that gives simple and effective access to documents of interest would be useful; this requires that the documents be sorted by some criteria, that is, classified. Documents to be classified are usually represented following the bag-of-words (BoW) paradigm: features are the words in the text, which therefore suffer from synonymy and polysemy, and their weights are based only on frequency of occurrence. This paper presents an empirical study of the efficiency of a classifier that leverages encyclopedic background knowledge, specifically Wikipedia, to create bag-of-concepts (BoC) representations of documents. A concept is understood as a “unit of meaning”, so the representation tackles synonymy and polysemy. In addition, the weighting of concepts is based on their semantic relevance in the text. The proposal is evaluated in empirical experiments on OHSUMED, one of the corpora commonly used for evaluating classification and retrieval of biomedical information, and on UVigoMED, a purpose-built corpus of MEDLINE biomedical abstracts. The results show that the Wikipedia-based bag-of-concepts representation outperforms the classical bag-of-words representation by up to 157% in the single-label classification problem and up to 100% in the multi-label problem on the OHSUMED corpus, and by up to 122% in the single-label problem and up to 155% in the multi-label problem on the UVigoMED corpus.
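
The difference between the two representations can be illustrated with a minimal sketch. It is not the authors' system: the paper derives concepts from Wikipedia and weights them by semantic relevance, whereas this example assumes a tiny hand-written lookup table (the hypothetical `TOY_CONCEPTS`) and uses plain counts as weights. It shows only the synonymy case; resolving polysemy requires context-aware disambiguation, which is not attempted here.

```python
import re
from collections import Counter

# Hypothetical surface-form -> concept table, for illustration only.
# The paper obtains concepts from Wikipedia; this table is an assumption.
TOY_CONCEPTS = {
    "heart attack": "Myocardial_infarction",
    "myocardial infarction": "Myocardial_infarction",
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """BoW: each distinct word is a feature, weighted by its frequency."""
    return Counter(tokenize(text))


def bag_of_concepts(text, concepts=TOY_CONCEPTS, max_len=3):
    """BoC sketch: greedy longest-match of phrases against the concept table.

    Weights here are raw counts, a simplification of the semantic-relevance
    weighting described in the overview.
    """
    tokens = tokenize(text)
    bag = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in concepts:
                bag[concepts[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no concept starts at this token; skip it
    return bag
```

With this table, "Aspirin after a heart attack" and "Acetylsalicylic acid following myocardial infarction" share no word features, yet both map to the same two concepts, which is the synonymy problem the BoC representation is meant to address.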

Embed

Wikipedia Quality

García, Marcos Mouriño; Rodríguez, Roberto Pérez; Rifón, Luis Anido. (2015). "[[Biomedical Literature Classification Using Encyclopedic Knowledge: a Wikipedia-Based Bag-Of-Concepts Approach]]". PeerJ Inc. DOI: 10.7717/peerj.1279.

English Wikipedia

{{cite journal |last1=García |first1=Marcos Mouriño |last2=Rodríguez |first2=Roberto Pérez |last3=Rifón |first3=Luis Anido |title=Biomedical Literature Classification Using Encyclopedic Knowledge: a Wikipedia-Based Bag-Of-Concepts Approach |date=2015 |doi=10.7717/peerj.1279 |url=https://wikipediaquality.com/wiki/Biomedical_Literature_Classification_Using_Encyclopedic_Knowledge:_a_Wikipedia-Based_Bag-Of-Concepts_Approach |journal=PeerJ Inc.}}

HTML

García, Marcos Mouriño; Rodríguez, Roberto Pérez; Rifón, Luis Anido. (2015). &quot;<a href="https://wikipediaquality.com/wiki/Biomedical_Literature_Classification_Using_Encyclopedic_Knowledge:_a_Wikipedia-Based_Bag-Of-Concepts_Approach">Biomedical Literature Classification Using Encyclopedic Knowledge: a Wikipedia-Based Bag-Of-Concepts Approach</a>&quot;. PeerJ Inc. DOI: 10.7717/peerj.1279.