Clustering Documents with Active Learning Using Wikipedia
Revision as of 08:41, 24 March 2021
Field | Details |
---|---|
Authors | Anna-Lan Huang, David N. Milne, Eibe Frank, Ian H. Witten |
Publication date | 2008 |
DOI | 10.1109/ICDM.2008.80 |
Links | Original |
Clustering Documents with Active Learning Using Wikipedia is a scientific work related to Wikipedia quality, published in 2008 and written by Anna-Lan Huang, David N. Milne, Eibe Frank and Ian H. Witten.
Overview
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to use it for document clustering. In this paper, the authors propose exploiting the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when only limited supervision is provided. The approach presented in this paper supplies that supervision through active learning.

The authors first use Wikipedia to create a concept-based representation of each text document, with each concept associated with a Wikipedia article. They then exploit the semantic relatedness between Wikipedia concepts to find pairwise instance-level constraints for supervised clustering, which guide the clustering in the direction the constraints indicate. The approach is tested on three standard text document datasets. Empirical results show that the basic document representation strategy yields performance comparable to previous attempts, and that adding constraints improves clustering performance further, by up to 20%.
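The overview describes three steps: representing documents as bags of Wikipedia concepts, turning concept relatedness into pairwise must-link and cannot-link constraints, and clustering under those constraints. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. The in-link sets, the article count, the two thresholds, the document-level scoring and the hand-picked seed documents are all hypothetical toy choices. Concept relatedness uses a link-overlap formula in the style of Milne and Witten's Wikipedia link-based measure, and the clustering step swaps in COP-KMeans (Wagstaff et al., 2001), a standard constrained k-means variant, because the overview does not name the clustering algorithm. Active query selection is not modeled here: every document pair is scored.

```python
import math
from itertools import combinations

# Hypothetical toy data: IDs of the articles that link to each concept.
# A real system would read these in-link sets from a Wikipedia dump.
INLINKS = {
    "Jaguar": {1, 2, 3, 4, 5, 6},
    "Leopard": {2, 3, 4, 5, 7},
    "Rainforest": {3, 4, 5, 8, 9},
    "Stock market": {20, 21, 22, 23},
    "Interest rate": {21, 22, 23, 24, 25},
}
TOTAL_ARTICLES = 1000  # hypothetical Wikipedia size

def concept_relatedness(a, b):
    """Link-overlap relatedness in [0, 1], Milne-Witten style (assumed measure)."""
    if a == b:
        return 1.0
    A, B = INLINKS[a], INLINKS[b]
    common = len(A & B)
    if common == 0:
        return 0.0
    distance = (math.log(max(len(A), len(B))) - math.log(common)) / (
        math.log(TOTAL_ARTICLES) - math.log(min(len(A), len(B))))
    return max(0.0, 1.0 - distance)

def doc_relatedness(x, y):
    """Symmetric, weight-averaged best-match relatedness of two concept bags."""
    def directed(p, q):
        return sum(w * max(concept_relatedness(c, d) for d in q)
                   for c, w in p.items()) / sum(p.values())
    return (directed(x, y) + directed(y, x)) / 2

def derive_constraints(docs, must_at=0.9, cannot_at=0.1):
    """Score every pair: very related -> must-link, unrelated -> cannot-link."""
    must, cannot = set(), set()
    for i, j in combinations(range(len(docs)), 2):
        r = doc_relatedness(docs[i], docs[j])
        if r >= must_at:
            must.add((i, j))
        elif r <= cannot_at:
            cannot.add((i, j))
    return must, cannot

def cosine(u, v):
    # Cosine similarity of two sparse concept-weight vectors.
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def violates(i, k, labels, must, cannot):
    """Would placing doc i in cluster k break a constraint with an assigned doc?"""
    for a, b in must:
        if i in (a, b):
            other = labels[b if a == i else a]
            if other is not None and other != k:
                return True
    for a, b in cannot:
        if i in (a, b) and labels[b if a == i else a] == k:
            return True
    return False

def cop_kmeans(docs, seeds, must, cannot, max_iter=20):
    """COP-KMeans on concept vectors; returns None if constraints cannot be met."""
    centroids = [dict(docs[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new = [None] * len(docs)
        for i, doc in enumerate(docs):
            # Try clusters from most to least similar; keep the first legal one.
            order = sorted(range(len(centroids)),
                           key=lambda k: -cosine(doc, centroids[k]))
            for k in order:
                if not violates(i, k, new, must, cannot):
                    new[i] = k
                    break
            else:
                return None
        if new == labels:
            break
        labels = new
        # Move each non-empty cluster's centroid to its members' mean vector.
        for k in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == k]
            if members:
                summed = {}
                for m in members:
                    for c, w in m.items():
                        summed[c] = summed.get(c, 0.0) + w
                centroids[k] = {c: w / len(members) for c, w in summed.items()}
    return labels

docs = [{"Jaguar": 2, "Rainforest": 1}, {"Stock market": 2, "Interest rate": 1},
        {"Leopard": 2, "Rainforest": 1}, {"Interest rate": 2},
        {"Jaguar": 1, "Leopard": 1}]
must, cannot = derive_constraints(docs)
labels = cop_kmeans(docs, seeds=[0, 1], must=must, cannot=cannot)
print(labels)  # [0, 1, 0, 1, 0]
```

Documents 0 and 2 share only the concept Rainforest, so their cosine similarity is just 0.2, but the high relatedness of Jaguar and Leopard produces a must-link that keeps them in the same cluster. This is the kind of guidance the overview attributes to Wikipedia-derived constraints.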
Embed
Wikipedia Quality
Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2008). "[[Clustering Documents with Active Learning Using Wikipedia]]". DOI: 10.1109/ICDM.2008.80.
English Wikipedia
{{cite journal |last1=Huang |first1=Anna-Lan |last2=Milne |first2=David N. |last3=Frank |first3=Eibe |last4=Witten |first4=Ian H. |title=Clustering Documents with Active Learning Using Wikipedia |date=2008 |doi=10.1109/ICDM.2008.80 |url=https://wikipediaquality.com/wiki/Clustering_Documents_with_Active_Learning_Using_Wikipedia}}
HTML
Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2008). "<a href="https://wikipediaquality.com/wiki/Clustering_Documents_with_Active_Learning_Using_Wikipedia">Clustering Documents with Active Learning Using Wikipedia</a>". DOI: 10.1109/ICDM.2008.80.
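The embed strings in this section follow fixed patterns, so they can be generated from the metadata table rather than typed by hand. A minimal Python sketch, assuming hypothetical helper names `plain_citation` and `cite_journal` (not tooling provided by the site): `cite_journal` reproduces the English Wikipedia template above, and `plain_citation` produces the Wikipedia Quality format while avoiding a doubled period after the final initial.

```python
# Hypothetical helpers: render the page metadata as embed strings.
AUTHORS = [("Huang", "Anna-Lan"), ("Milne", "David N."),
           ("Frank", "Eibe"), ("Witten", "Ian H.")]
TITLE = "Clustering Documents with Active Learning Using Wikipedia"
DOI = "10.1109/ICDM.2008.80"
URL = "https://wikipediaquality.com/wiki/" + TITLE.replace(" ", "_")


def plain_citation(authors, title, year, doi):
    names = "; ".join(f"{last}, {first}" for last, first in authors)
    # Drop a trailing period so an initial like "H." is not followed by "..".
    return f'{names.rstrip(".")}. ({year}). "[[{title}]]". DOI: {doi}.'


def cite_journal(authors, title, date, doi, url):
    params = ["cite journal"]
    # Numbered last/first pairs, in author order, as in the template above.
    for n, (last, first) in enumerate(authors, start=1):
        params += [f"last{n}={last}", f"first{n}={first}"]
    params += [f"title={title}", f"date={date}", f"doi={doi}", f"url={url}"]
    return "{{" + " |".join(params) + "}}"


print(plain_citation(AUTHORS, TITLE, 2008, DOI))
print(cite_journal(AUTHORS, TITLE, 2008, DOI, URL))
```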