Difference between revisions of "Automatic Keyphrase Annotation of Scientific Documents Using Wikipedia and Genetic Algorithms"

From Wikipedia Quality
(Overview: Automatic Keyphrase Annotation of Scientific Documents Using Wikipedia and Genetic Algorithms)
 
(Adding wikilinks)
Line 1: Line 1:
'''Automatic Keyphrase Annotation of Scientific Documents Using Wikipedia and Genetic Algorithms''' - scientific work related to Wikipedia quality published in 2013, written by Arash Joorabchi and Abdulhussain E. Mahdi.
+
'''Automatic Keyphrase Annotation of Scientific Documents Using Wikipedia and Genetic Algorithms''' - scientific work related to [[Wikipedia quality]] published in 2013, written by [[Arash Joorabchi]] and [[Abdulhussain E. Mahdi]].
  
 
== Overview ==
 
== Overview ==
Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes Wikipedia as a thesaurus for candidate selection from documents' content. Authors have devised a set of 20 statistical, positional and semantical features for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. Authors first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. Authors have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.
+
Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and [[information retrieval]] systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that utilizes [[Wikipedia]] as a thesaurus for candidate selection from documents' content. Authors have devised a set of 20 statistical, positional and semantical [[features]] for candidate phrases to capture and reflect various properties of those candidates that have the highest keyphraseness probability. Authors first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. Authors have evaluated the performance of both methods on a third-party dataset of research papers. Reported experimental results show that the performance of proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and outperforms rival supervised and unsupervised methods.

Revision as of 06:42, 21 February 2021

Automatic Keyphrase Annotation of Scientific Documents Using Wikipedia and Genetic Algorithms is a scientific work related to Wikipedia quality, published in 2013 and written by Arash Joorabchi and Abdulhussain E. Mahdi.

Overview

Topical annotation of documents with keyphrases is a proven method for revealing the subject of scientific and research documents to both human readers and information retrieval systems. This article describes a machine learning-based keyphrase annotation method for scientific documents that uses Wikipedia as a thesaurus for selecting candidate phrases from a document's content. The authors devised a set of 20 statistical, positional and semantic features for candidate phrases, intended to capture the properties of those candidates most likely to be keyphrases. They first introduce a simple unsupervised method for ranking and filtering the most probable keyphrases, and then evolve it into a novel supervised method using genetic algorithms. Both methods were evaluated on a third-party dataset of research papers. The reported experimental results show that the performance of the proposed methods, measured in terms of consistency with human annotators, is on a par with that achieved by humans and exceeds that of rival supervised and unsupervised methods.
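
The overview describes the approach only at a high level, so the following is a minimal Python sketch of the general idea rather than the authors' implementation. It ranks candidate phrases by a weighted sum of feature values, starting from equal weights as an unsupervised baseline, and then uses a simple genetic algorithm to tune the weights against human-assigned keyphrases. Everything specific here is an assumption made for illustration: the three toy features (the paper uses 20), the example phrases and values in DOCS, the function names, the GA settings, and the use of Rolling's consistency measure 2C/(A+B) as the fitness function.

```python
import random

# Hypothetical toy data: each document maps candidate phrases to three
# illustrative feature values (frequency, position, semantic relatedness),
# paired with the keyphrases a human annotator assigned.
DOCS = [
    ({"genetic algorithm": [0.5, 0.9, 0.9],
      "wikipedia": [0.2, 0.9, 0.8],
      "paper": [1.0, 0.5, 0.5]},
     {"genetic algorithm", "wikipedia"}),
    ({"keyphrase extraction": [0.4, 0.8, 0.9],
      "information retrieval": [0.3, 0.7, 0.7],
      "method": [1.0, 0.4, 0.5]},
     {"keyphrase extraction", "information retrieval"}),
]


def score(features, weights):
    """Weighted sum of a candidate's feature values."""
    return sum(f * w for f, w in zip(features, weights))


def top_k(candidates, weights, k):
    """Return the k highest-scoring candidate phrases as a set."""
    ranked = sorted(candidates, key=lambda c: score(candidates[c], weights),
                    reverse=True)
    return set(ranked[:k])


def rolling_consistency(a, b):
    """Rolling's measure 2C / (A + B) between two keyphrase sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def fitness(weights, docs, k):
    """Mean consistency between predicted and human keyphrases."""
    return sum(rolling_consistency(top_k(cands, weights, k), gold)
               for cands, gold in docs) / len(docs)


def evolve(docs, n_features, k=2, pop_size=30, generations=40,
           mutation_rate=0.2, seed=0):
    """Evolve feature weights with elitism, one-point crossover, mutation."""
    rng = random.Random(seed)
    # Seed the population with the equal-weight (unsupervised) baseline.
    population = [[1.0] * n_features]
    population += [[rng.random() for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, docs, k), reverse=True)
        elite = population[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clamped to the range [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = elite + children
    return max(population, key=lambda w: fitness(w, docs, k))


if __name__ == "__main__":
    baseline = fitness([1.0, 1.0, 1.0], DOCS, k=2)
    best = evolve(DOCS, n_features=3)
    print("baseline consistency:", baseline)
    print("evolved consistency:", fitness(best, DOCS, k=2))
```

Because the equal-weight baseline is placed in the initial population and the best half of each generation is carried over unchanged, the evolved weights in this sketch can never score below the baseline on the training documents. A real evaluation would measure consistency on held-out documents rather than on the ones used for tuning.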