Difference between revisions of "Improving Semi-Supervised Text Classification by Using Wikipedia Knowledge"

From Wikipedia Quality
(Creating a page: Improving Semi-Supervised Text Classification by Using Wikipedia Knowledge)
 
(+ wikilinks)
Line 1: Line 1:
'''Improving Semi-Supervised Text Classification by Using Wikipedia Knowledge''' is a scientific work related to Wikipedia quality, published in 2013 and written by Zhilin Zhang, Huaizhong Lin, Pengfei Li, Huazhong Wang and Dongming Lu.
+
'''Improving Semi-Supervised Text Classification by Using Wikipedia Knowledge''' is a scientific work related to [[Wikipedia quality]], published in 2013 and written by [[Zhilin Zhang]], [[Huaizhong Lin]], [[Pengfei Li]], [[Huazhong Wang]] and [[Dongming Lu]].
  
 
== Overview ==
 
== Overview ==
Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. The authors note that clustering-based classification outperforms other semi-supervised text classification algorithms. However, its performance is still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, the authors propose a new approach that addresses this problem by using Wikipedia knowledge. They enrich the document representation with Wikipedia semantic features (concepts and categories), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering-based classification. Experimental results on several corpora show that the proposed method can effectively improve semi-supervised text classification performance.
+
Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. The authors note that clustering-based classification outperforms other semi-supervised text classification algorithms. However, its performance is still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, the authors propose a new approach that addresses this problem by using [[Wikipedia]] knowledge. They enrich the document representation with Wikipedia semantic [[features]] (concepts and [[categories]]), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering-based classification. Experimental results on several corpora show that the proposed method can effectively improve semi-supervised text classification performance.

Revision as of 21:54, 29 November 2019

Improving Semi-Supervised Text Classification by Using Wikipedia Knowledge is a scientific work related to Wikipedia quality, published in 2013 and written by Zhilin Zhang, Huaizhong Lin, Pengfei Li, Huazhong Wang and Dongming Lu.

Overview

Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. The authors note that clustering-based classification outperforms other semi-supervised text classification algorithms. However, its performance is still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, the authors propose a new approach that addresses this problem by using Wikipedia knowledge. They enrich the document representation with Wikipedia semantic features (concepts and categories), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering-based classification. Experimental results on several corpora show that the proposed method can effectively improve semi-supervised text classification performance.
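
The general idea in the overview can be illustrated with a minimal sketch. It is not the authors' method: the paper's similarity measure is based on semantic relevance between Wikipedia features and is not specified here, so a weighted sum of cosine similarities over words, concepts and categories is used as a stand-in. The function names, the weights `(0.5, 0.3, 0.2)` and the seeded clustering loop are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def make_doc(words, concepts, categories):
    """Document enriched with (hypothetical) Wikipedia concepts and categories."""
    return {
        "words": Counter(words),
        "concepts": Counter(concepts),
        "categories": Counter(categories),
    }


def combined_similarity(d1, d2, weights=(0.5, 0.3, 0.2)):
    """Assumed stand-in measure: weighted sum of per-feature cosine similarities."""
    keys = ("words", "concepts", "categories")
    return sum(w * cosine(d1[k], d2[k]) for w, k in zip(weights, keys))


def seeded_cluster_classify(labeled, unlabeled, sim, max_iter=10):
    """Simplified clustering-based classification.

    Clusters are seeded with the labeled documents, one cluster per label.
    Each unlabeled document is repeatedly assigned to the cluster whose
    current members are most similar to it on average, until assignments
    stop changing or max_iter is reached.
    """
    labels = sorted({lab for _, lab in labeled})
    assignment = [None] * len(unlabeled)
    for _ in range(max_iter):
        # Cluster members: labeled seeds plus currently assigned unlabeled docs.
        members = {lab: [d for d, l in labeled if l == lab] for lab in labels}
        for doc, lab in zip(unlabeled, assignment):
            if lab is not None:
                members[lab].append(doc)
        new_assignment = [
            max(
                labels,
                key=lambda lab: sum(sim(doc, m) for m in members[lab])
                / len(members[lab]),
            )
            for doc in unlabeled
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


# Toy data: "automobile" shares no words with the labeled "car" document,
# but both are mapped to the same concept and category.
labeled = [
    (make_doc(["car", "engine"], ["Car"], ["Vehicles"]), "auto"),
    (make_doc(["apple", "fruit"], ["Apple"], ["Fruits"]), "food"),
]
unlabeled = [
    make_doc(["automobile"], ["Car"], ["Vehicles"]),
    make_doc(["banana"], ["Banana"], ["Fruits"]),
]
predictions = seeded_cluster_classify(labeled, unlabeled, combined_similarity)
```

In this toy data the word-only similarity between the "automobile" and "car" documents is zero, so a plain vector space model gives no signal; the shared concept and category are what let the document join the right cluster.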