Clustering Documents Using a Wikipedia-Based Concept Representation

Authors
Anna-Lan Huang
David N. Milne
Eibe Frank
Ian H. Witten
Publication date
2009
DOI
10.1007/978-3-642-01307-2_62

Clustering Documents Using a Wikipedia-Based Concept Representation is a scientific work related to Wikipedia quality, published in 2009 and written by Anna-Lan Huang, David N. Milne, Eibe Frank and Ian H. Witten.

Overview

This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. The authors first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. They also develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. They test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that, although further optimizations could be performed, the approach already improves upon related techniques.
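
The two ideas summarized above, mapping phrases to Wikipedia concepts and comparing documents by the relatedness of their concept sets, can be illustrated with a minimal sketch. This is not the authors' implementation: the ANCHORS dictionary, the RELATEDNESS scores, the function names, and the best-match averaging formula are all hypothetical stand-ins chosen for illustration. A real system would derive the phrase-to-article mapping and the relatedness scores from Wikipedia itself.

```python
import re

# Hypothetical phrase -> Wikipedia article mapping (an assumption for this
# sketch; a real system would build it from Wikipedia).
ANCHORS = {
    "jaguar": "Jaguar",
    "big cat": "Felidae",
    "puma": "Cougar",
    "stock market": "Stock market",
    "shares": "Share (finance)",
}

# Hypothetical pairwise relatedness scores in [0, 1], invented for the demo.
RELATEDNESS = {
    frozenset({"Jaguar", "Cougar"}): 0.8,
    frozenset({"Jaguar", "Felidae"}): 0.7,
    frozenset({"Cougar", "Felidae"}): 0.7,
    frozenset({"Stock market", "Share (finance)"}): 0.9,
}


def to_concepts(text, anchors=ANCHORS, max_len=3):
    """Map a document to a set of concepts, preferring the longest phrase."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    concepts = set()
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in anchors:
                concepts.add(anchors[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts here; move on one word
    return concepts


def relatedness(a, b):
    """Look up the relatedness of two concepts; unknown pairs score 0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def concept_set_similarity(c1, c2):
    """Symmetric average of each concept's best match in the other set."""
    if not c1 or not c2:
        return 0.0
    best_1 = [max(relatedness(a, b) for b in c2) for a in c1]
    best_2 = [max(relatedness(a, b) for a in c1) for b in c2]
    return (sum(best_1) + sum(best_2)) / (len(c1) + len(c2))


if __name__ == "__main__":
    cats = to_concepts("The jaguar is a big cat.")
    puma = to_concepts("A puma was seen nearby.")
    finance = to_concepts("Shares fell on the stock market.")
    print(concept_set_similarity(cats, puma))
    print(concept_set_similarity(cats, finance))
```

In this toy setup the two animal documents share no words, yet they score as similar because their concepts are related, while the finance document scores zero against both. A pairwise similarity of this kind could then be passed to any standard clustering algorithm.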

Embed

Wikipedia Quality

Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2009). "[[Clustering Documents Using a Wikipedia-Based Concept Representation]]". Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-01307-2_62.

English Wikipedia

{{cite journal |last1=Huang |first1=Anna-Lan |last2=Milne |first2=David N. |last3=Frank |first3=Eibe |last4=Witten |first4=Ian H. |title=Clustering Documents Using a Wikipedia-Based Concept Representation |date=2009 |doi=10.1007/978-3-642-01307-2_62 |url=https://wikipediaquality.com/wiki/Clustering_Documents_Using_a_Wikipedia-Based_Concept_Representation |journal=Springer, Berlin, Heidelberg}}

HTML

Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2009). &quot;<a href="https://wikipediaquality.com/wiki/Clustering_Documents_Using_a_Wikipedia-Based_Concept_Representation">Clustering Documents Using a Wikipedia-Based Concept Representation</a>&quot;. Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-01307-2_62.