Difference between revisions of "Using Wikipedia for Co-Clustering based Cross-Domain Text Classification"

From Wikipedia Quality
(+ wikilinks)
(infobox)
{{Infobox work
| title = Using Wikipedia for Co-Clustering based Cross-Domain Text Classification
| date = 2008
| authors = [[Pu Wang]]<br />[[Carlotta Domeniconi]]<br />[[Jian Hu]]
| doi = 10.1109/ICDM.2008.136
| link = http://dl.acm.org/citation.cfm?id=1510528.1511383
}}
 
'''Using Wikipedia for Co-Clustering based Cross-Domain Text Classification''' is a scientific work related to [[Wikipedia quality]], published in 2008 and written by [[Pu Wang]], [[Carlotta Domeniconi]] and [[Jian Hu]].
 
== Overview ==
 
Traditional approaches to document classification require labeled data in order to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available and are often too expensive to obtain. Given a learning task for which training data are not available, abundant labeled data may exist for a different but related domain. One would like to use the related labeled data as auxiliary information to accomplish the classification task in the target domain. Recently, the paradigm of transfer learning has been introduced to enable effective learning strategies when the auxiliary data obey a different probability distribution. A co-clustering based classification algorithm has previously been proposed to tackle cross-domain text classification. In this work, the authors extend the idea underlying this approach by making the latent semantic relationship between the two domains explicit. This goal is achieved with the use of [[Wikipedia]]. As a result, the pathway that propagates labels between the two domains captures not only common words but also semantic concepts based on the content of the documents. The authors empirically demonstrate the efficacy of their semantic-based approach to cross-domain classification using a variety of real data.
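
The point about concepts acting as a second pathway between domains can be illustrated with a toy sketch. It is not the authors' co-clustering algorithm: it swaps in plain nearest-neighbour label transfer with cosine similarity, and the <code>CONCEPTS</code> lookup, the sample documents and all function names are hypothetical stand-ins for a Wikipedia-derived thesaurus and real corpora.

```python
from collections import Counter
from math import sqrt

# Hypothetical word -> concept lookup standing in for a Wikipedia-derived
# thesaurus; these entries are invented for illustration only.
CONCEPTS = {
    "striker": "Sport",
    "goalkeeper": "Sport",
    "pitcher": "Sport",
    "inning": "Sport",
    "senator": "Politics",
    "ballot": "Politics",
}


def features(text, use_concepts):
    """Bag of words, optionally enriched with one feature per mapped concept."""
    words = text.lower().split()
    bag = Counter(words)
    if use_concepts:
        for word in words:
            if word in CONCEPTS:
                bag["concept:" + CONCEPTS[word]] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def transfer_label(target_text, labeled_source, use_concepts):
    """Return (label, similarity) of the most similar labeled source document.

    Nearest-neighbour transfer is an assumption made for brevity; the paper
    itself uses co-clustering to propagate labels.
    """
    target = features(target_text, use_concepts)
    scored = [
        (cosine(target, features(text, use_concepts)), label)
        for text, label in labeled_source
    ]
    best_sim, best_label = max(scored, key=lambda pair: pair[0])
    return best_label, best_sim


# Labeled source domain and one unlabeled target document with no shared words.
source = [
    ("striker beats goalkeeper", "sport"),
    ("senator wins ballot", "politics"),
]
target = "pitcher dominates inning"

words_only = transfer_label(target, source, use_concepts=False)
with_concepts = transfer_label(target, source, use_concepts=True)
```

With words alone, the target document shares no vocabulary with either source document, so its best similarity is 0 and the returned label carries no evidence. Once the concept features are added, the target and the first source document both contain the <code>concept:Sport</code> feature, so the label "sport" is transferred with a non-zero similarity.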
 

Revision as of 13:04, 2 November 2019

