Clustering Documents Using a Wikipedia-Based Concept Representation
Authors | Anna-Lan Huang, David N. Milne, Eibe Frank, Ian H. Witten |
---|---|
Publication date | 2009 |
DOI | 10.1007/978-3-642-01307-2_62 |
Clustering Documents Using a Wikipedia-Based Concept Representation is a scientific work related to Wikipedia quality, published in 2009 and written by Anna-Lan Huang, David N. Milne, Eibe Frank and Ian H. Witten.
Overview
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. The authors first create a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles (or concepts) in Wikipedia. They also develop a similarity measure that evaluates the semantic relatedness between the concept sets of two documents. They test the concept-based representation and the similarity measure on two standard text document datasets. Empirical results show that, although further optimizations could be performed, the approach already improves upon related techniques.
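The two steps described above (mapping phrases to Wikipedia concepts, then comparing concept sets) can be illustrated with a small Python sketch. This is only a toy illustration under stated assumptions, not the authors' implementation: the `PHRASE_TO_CONCEPT` dictionary, the `RELATEDNESS` scores, and the best-match averaging in `concept_set_similarity` are all hypothetical stand-ins, since this page does not specify how the paper maps phrases or computes relatedness.

```python
# Hypothetical phrase-to-concept dictionary (assumption for illustration);
# a real system would derive such mappings from Wikipedia itself.
PHRASE_TO_CONCEPT = {
    "jaguar": "Jaguar",
    "big cat": "Felidae",
    "leopard": "Leopard",
    "stock market": "Stock market",
    "shares": "Share (finance)",
}

# Hypothetical symmetric relatedness scores in [0, 1] between concept pairs.
RELATEDNESS = {
    frozenset({"Jaguar", "Felidae"}): 0.8,
    frozenset({"Jaguar", "Leopard"}): 0.7,
    frozenset({"Felidae", "Leopard"}): 0.8,
    frozenset({"Stock market", "Share (finance)"}): 0.9,
}


def to_concepts(text, max_len=2):
    """Map word n-grams (longest first) to concepts by dictionary lookup."""
    words = text.lower().split()
    concepts = set()
    i = 0
    while i < len(words):
        for n in range(max_len, 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in PHRASE_TO_CONCEPT:
                concepts.add(PHRASE_TO_CONCEPT[phrase])
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move to the next word
    return concepts


def relatedness(a, b):
    """Look up the toy relatedness score; identical concepts score 1.0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def concept_set_similarity(c1, c2):
    """Average, over both sets, of each concept's best match in the other set."""
    if not c1 or not c2:
        return 0.0
    best1 = [max(relatedness(a, b) for b in c2) for a in c1]
    best2 = [max(relatedness(a, b) for a in c1) for b in c2]
    return (sum(best1) + sum(best2)) / (len(best1) + len(best2))


doc_a = to_concepts("The jaguar is a big cat")
doc_b = to_concepts("A leopard hunts")
doc_c = to_concepts("Shares fell on the stock market")

print(concept_set_similarity(doc_a, doc_b))  # related topics: high score
print(concept_set_similarity(doc_a, doc_c))  # unrelated topics: zero in this toy table
```

The point of the sketch is that two documents sharing no words ("jaguar" versus "leopard") can still score as similar once both are represented by related concepts, which a plain bag-of-words comparison would miss. The resulting pairwise similarities could then be passed to any standard clustering algorithm.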
Embed
Wikipedia Quality
Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2009). "[[Clustering Documents Using a Wikipedia-Based Concept Representation]]". Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-01307-2_62.
English Wikipedia
{{cite journal |last1=Huang |first1=Anna-Lan |last2=Milne |first2=David N. |last3=Frank |first3=Eibe |last4=Witten |first4=Ian H. |title=Clustering Documents Using a Wikipedia-Based Concept Representation |date=2009 |doi=10.1007/978-3-642-01307-2_62 |url=https://wikipediaquality.com/wiki/Clustering_Documents_Using_a_Wikipedia-Based_Concept_Representation |journal=Springer, Berlin, Heidelberg}}
HTML
Huang, Anna-Lan; Milne, David N.; Frank, Eibe; Witten, Ian H. (2009). "<a href="https://wikipediaquality.com/wiki/Clustering_Documents_Using_a_Wikipedia-Based_Concept_Representation">Clustering Documents Using a Wikipedia-Based Concept Representation</a>". Springer, Berlin, Heidelberg. DOI: 10.1007/978-3-642-01307-2_62.