Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents


Authors
Gerasimos Spanakis
Georgios Siolas
Andreas Stafylopatis
Publication date
2012
DOI
10.1093/comjnl/bxr024

Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents is a scientific work related to Wikipedia quality, published in 2012 and written by Gerasimos Spanakis, Georgios Siolas and Andreas Stafylopatis.

Overview

In this paper, the authors propose a novel method for conceptual hierarchical clustering of documents using knowledge extracted from Wikipedia. The method overcomes the disadvantages of the classic bag-of-words model by exploiting Wikipedia's textual content and link structure. A robust and compact document representation is built in real time through the Wikipedia application programming interface (API), without storing any Wikipedia data locally. The clustering process is hierarchical and extends the idea of frequent items: Wikipedia article titles are used to select cluster labels that are descriptive and important for the examined corpus. Experiments show that the proposed technique greatly improves on the baseline approach, both in clustering quality (F-measure and entropy) and in computational cost.
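The two quality measures named above can be illustrated with a minimal sketch. It assumes the definitions commonly used in document clustering (size-weighted cluster entropy, and the class-weighted best F-score over clusters); the exact variants used in the paper are not given here, and the function names `clustering_entropy` and `clustering_f_measure` are hypothetical.

```python
import math
from collections import Counter


def clustering_entropy(true_labels, cluster_ids):
    """Size-weighted average entropy (in bits) of the class mix in each cluster.

    Lower is better; 0.0 means every cluster holds a single class.
    Assumed definition, not taken from the paper.
    """
    n = len(true_labels)
    cluster_sizes = Counter(cluster_ids)
    pair_counts = Counter(zip(cluster_ids, true_labels))
    total = 0.0
    for (cluster, _label), n_ij in pair_counts.items():
        p = n_ij / cluster_sizes[cluster]  # share of this class in the cluster
        total -= (cluster_sizes[cluster] / n) * p * math.log2(p)
    return total


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted average of the best F-score each class reaches in any cluster.

    Higher is better; 1.0 means each class is matched exactly by one cluster.
    Assumed definition, not taken from the paper.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    pair_counts = Counter(zip(cluster_ids, true_labels))
    best = {}
    for (cluster, label), n_ij in pair_counts.items():
        precision = n_ij / cluster_sizes[cluster]
        recall = n_ij / class_sizes[label]
        f_score = 2 * precision * recall / (precision + recall)
        best[label] = max(best.get(label, 0.0), f_score)
    return sum(class_sizes[label] / n * best[label] for label in class_sizes)


# Toy data: four documents, two true classes, two candidate clusterings.
labels = ["sports", "sports", "politics", "politics"]
perfect = [0, 0, 1, 1]  # clusters match the classes exactly
mixed = [0, 1, 0, 1]    # every cluster is half sports, half politics
```

On the toy data, `perfect` yields entropy 0 and F-measure 1, while `mixed` yields entropy 1 bit and F-measure 0.5, so an improvement "in terms of F-measure and entropy" means a higher F-measure together with a lower entropy.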

Embed

Wikipedia Quality

Spanakis, Gerasimos; Siolas, Georgios; Stafylopatis, Andreas. (2012). "[[Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents]]". Oxford University Press. DOI: 10.1093/comjnl/bxr024.

English Wikipedia

{{cite journal |last1=Spanakis |first1=Gerasimos |last2=Siolas |first2=Georgios |last3=Stafylopatis |first3=Andreas |title=Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents |date=2012 |doi=10.1093/comjnl/bxr024 |url=https://wikipediaquality.com/wiki/Exploiting_Wikipedia_Knowledge_for_Conceptual_Hierarchical_Clustering_of_Documents |journal=The Computer Journal |publisher=Oxford University Press}}

HTML

Spanakis, Gerasimos; Siolas, Georgios; Stafylopatis, Andreas. (2012). &quot;<a href="https://wikipediaquality.com/wiki/Exploiting_Wikipedia_Knowledge_for_Conceptual_Hierarchical_Clustering_of_Documents">Exploiting Wikipedia Knowledge for Conceptual Hierarchical Clustering of Documents</a>&quot;. Oxford University Press. DOI: 10.1093/comjnl/bxr024.