Latest revision as of 23:50, 13 April 2021


Clustering Documents with Active Learning Using Wikipedia
Authors: Anna-Lan Huang, David N. Milne, Eibe Frank, Ian H. Witten
Publication date: 2008
DOI: 10.1109/ICDM.2008.80
Links: Original

Clustering Documents with Active Learning Using Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Anna-Lan Huang, David N. Milne, Eibe Frank and Ian H. Witten.

Overview

Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to use it for document clustering. In this paper, the authors propose exploiting the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when only limited supervision is provided. The approach presented in the paper applies supervision using active learning. The authors first use Wikipedia to create a concept-based representation of a text document, with each concept associated with a Wikipedia article. They then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding the clustering in the direction indicated by the constraints. The authors test the approach on three standard text document datasets. Empirical results show that the basic document representation strategy yields performance comparable to previous attempts, and that adding constraints improves clustering performance further, by up to 20%.
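The idea of turning concept relatedness into pair-wise constraints can be illustrated with a minimal sketch. It is not the authors' implementation: the relatedness table, the function names, the weighted-average document score and the two thresholds are all assumptions made for illustration. In particular, the paper obtains its supervision through active learning, whereas this sketch simply thresholds a relatedness score to propose must-link and cannot-link pairs.

```python
from itertools import combinations

# Hypothetical, hand-written relatedness scores between Wikipedia concepts
# (symmetric, in [0, 1]). These numbers are invented for illustration only.
RELATEDNESS = {
    frozenset({"Cat", "Dog"}): 0.8,
    frozenset({"Cat", "Stock market"}): 0.05,
    frozenset({"Dog", "Stock market"}): 0.05,
    frozenset({"Stock market", "Bond (finance)"}): 0.85,
}


def concept_relatedness(a, b):
    """Relatedness of two concepts; unknown pairs default to 0.0 (assumption)."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def document_relatedness(doc_a, doc_b):
    """Weighted average of pairwise concept relatedness between two documents.

    Each document is a dict mapping a concept (article title) to a weight,
    for example the number of times the concept is mentioned.
    """
    total = 0.0
    weight_sum = 0.0
    for concept_a, weight_a in doc_a.items():
        for concept_b, weight_b in doc_b.items():
            pair_weight = weight_a * weight_b
            total += pair_weight * concept_relatedness(concept_a, concept_b)
            weight_sum += pair_weight
    return total / weight_sum if weight_sum else 0.0


def find_constraints(docs, must_threshold=0.7, cannot_threshold=0.1):
    """Propose must-link and cannot-link pairs by thresholding relatedness."""
    must_link, cannot_link = [], []
    for (id_a, doc_a), (id_b, doc_b) in combinations(docs.items(), 2):
        score = document_relatedness(doc_a, doc_b)
        if score >= must_threshold:
            must_link.append((id_a, id_b))
        elif score <= cannot_threshold:
            cannot_link.append((id_a, id_b))
    return must_link, cannot_link


# Toy concept-based representations of three documents.
docs = {
    "d1": {"Cat": 2, "Dog": 1},
    "d2": {"Dog": 3},
    "d3": {"Stock market": 2, "Bond (finance)": 1},
}

must_link, cannot_link = find_constraints(docs)
print("must-link:", must_link)
print("cannot-link:", cannot_link)
```

A constrained clustering algorithm would then take the resulting pairs as instance-level constraints, keeping must-link documents together and cannot-link documents apart.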

Embed

Wikipedia Quality

Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2008). "[[Clustering Documents with Active Learning Using Wikipedia]]". DOI: 10.1109/ICDM.2008.80.

English Wikipedia

{{cite journal |last1=Huang |first1=Anna-Lan |last2=Milne |first2=David N. |last3=Frank |first3=Eibe |last4=Witten |first4=Ian H. |title=Clustering Documents with Active Learning Using Wikipedia |date=2008 |doi=10.1109/ICDM.2008.80 |url=https://wikipediaquality.com/wiki/Clustering_Documents_with_Active_Learning_Using_Wikipedia}}

HTML

Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Clustering_Documents_with_Active_Learning_Using_Wikipedia">Clustering Documents with Active Learning Using Wikipedia</a>&quot;. DOI: 10.1109/ICDM.2008.80.