'''Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents''' - scientific work related to [[Wikipedia quality]], published in 2012 and written by [[Gerasimos Spanakis]], [[Georgios Siolas]] and [[Andreas Stafylopatis]].
  
 
== Overview ==
In this paper, the authors propose a novel method for conceptual hierarchical clustering of documents using knowledge extracted from [[Wikipedia]]. The proposed method overcomes the disadvantages of classic bag-of-words models by exploiting both the textual content and the link structure of Wikipedia. A robust and compact document representation is built in real time through the Wikipedia application programming interface (API), without storing any Wikipedia information locally. The clustering process is hierarchical and extends the idea of frequent items by using Wikipedia article titles to select cluster labels that are descriptive and important for the examined corpus. Experiments show that the proposed technique greatly improves on the baseline approach, both in clustering quality (F-measure and entropy) and in computational cost.
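F-measure and entropy are standard external measures for comparing a clustering against gold-standard class labels. The minimal Python sketch below assumes the common flat-partition formulation: the overall F-measure is the class-size-weighted best F score of each class over all clusters, and entropy is the size-weighted average, with base-2 logarithms, of the class-label entropy inside each cluster. The function names are hypothetical, and the exact variant used in the paper (for example, how clusters at different levels of the hierarchy are scored) is not specified here.

```python
from collections import Counter
from math import log2


def cluster_f_measure(labels_true, labels_pred):
    """Overall F-measure: for each gold class take its best F score over all
    clusters, then weight by class size. Assumed formulation, not the paper's code."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    # Joint counts n_ck: number of documents of class c placed in cluster k.
    joint = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint.get((c, k), 0)
            if n_ck == 0:
                continue  # F score is zero when class and cluster share no documents
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


def cluster_entropy(labels_true, labels_pred):
    """Size-weighted average (base-2) entropy of gold classes within each cluster.
    Lower is better; 0.0 means every cluster is pure. Assumed formulation."""
    n = len(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for k, n_k in cluster_sizes.items():
        e_k = 0.0
        for (_, cluster), n_ck in joint.items():
            if cluster == k:
                p = n_ck / n_k  # share of cluster k belonging to this class
                e_k -= p * log2(p)
        total += (n_k / n) * e_k
    return total


if __name__ == "__main__":
    # Toy example: six documents, two gold classes, two predicted clusters.
    gold = ["sport", "sport", "sport", "politics", "politics", "politics"]
    pred = [0, 0, 1, 1, 1, 1]
    print(round(cluster_f_measure(gold, pred), 3))  # → 0.829
    print(round(cluster_entropy(gold, pred), 3))    # → 0.541
```

Higher F-measure and lower entropy indicate closer agreement with the gold classes. For a hierarchical clustering, a common variant takes the maximum F score over every node of the cluster tree rather than only over the leaf clusters.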

Revision as of 07:51, 6 June 2019
