Semantic Smoothing for Text Clustering

Authors: Jamal Abdul Nasir, Iraklis Varlamis, Asim Karim, George Tsatsaronis
Publication date: 2013
ISSN: 09507051
DOI: 10.1016/j.knosys.2013.09.012

Semantic Smoothing for Text Clustering is a scientific work about Wikipedia quality, published in 2013 and written by Jamal Abdul Nasir, Iraklis Varlamis, Asim Karim and George Tsatsaronis.

Overview

In this paper, the authors present a new semantic smoothing vector space kernel (S-VSM) for text document clustering. In the proposed approach, semantic relatedness between words is used to smooth both the similarity and the representation of text documents. The basic hypothesis examined is that considering the semantic relatedness between two text documents may improve the performance of the text document clustering task.

For the experimental evaluation, the authors analyze the performance of several semantic relatedness measures when embedded in the proposed S-VSM and present results with respect to different experimental conditions: (i) the datasets used, (ii) the underlying knowledge sources of the measures, and (iii) the clustering algorithms employed. To the best of their knowledge, the study is the first to systematically compare, analyze and evaluate the impact of semantic smoothing in text clustering based on the 'wisdom of linguists' (e.g., WordNets), the 'wisdom of crowds' (e.g., Wikipedia), and the 'wisdom of corpora' (e.g., large text corpora represented with the traditional Bag of Words (BoW) model).

Three semantic relatedness measures for text are considered: two knowledge-based (Omiotis [1], which uses WordNet, and WLM [2], which uses Wikipedia) and one corpus-based (PMI [3], trained on a semantically tagged version of SemCor). To compare the different experimental conditions, the authors use the BCubed F-Measure evaluation metric, which satisfies all the formal constraints for assessing cluster quality. The experimental results show that clustering based on the S-VSM performs better than clustering based on the traditional VSM model and compares favorably against the standard GVSM kernel, which uses word co-occurrences to compute the latent similarities between document terms.
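
The smoothing idea described above can be illustrated with a minimal sketch. It assumes one common form of semantic kernel, in which each document vector d is mapped to d·S for a term-relatedness matrix S and documents are then compared by cosine similarity; the exact S-VSM formulation in the paper may differ. The vocabulary, the relatedness scores and all function names below are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy vocabulary (not from the paper).
VOCAB = ["car", "automobile", "banana"]

# Hypothetical symmetric term-relatedness matrix S, indexed like VOCAB.
# In practice the scores would come from a relatedness measure such as
# Omiotis, WLM or PMI; the numbers here are made up.
S = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def smooth(doc, relatedness):
    """Return the row vector doc * relatedness (the smoothed representation)."""
    n = len(doc)
    return [sum(doc[i] * relatedness[i][j] for i in range(n)) for j in range(n)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vsm_similarity(d1, d2):
    """Plain VSM: cosine similarity of the raw term-weight vectors."""
    return cosine(d1, d2)


def svsm_similarity(d1, d2, relatedness):
    """Smoothed similarity (assumed form): cosine of the smoothed vectors."""
    return cosine(smooth(d1, relatedness), smooth(d2, relatedness))


# Three one-word documents as term-weight vectors over VOCAB.
doc_car = [1.0, 0.0, 0.0]
doc_auto = [0.0, 1.0, 0.0]
doc_banana = [0.0, 0.0, 1.0]

print(vsm_similarity(doc_car, doc_auto))        # no shared terms
print(svsm_similarity(doc_car, doc_auto, S))    # related terms now contribute
print(svsm_similarity(doc_car, doc_banana, S))  # unrelated terms stay apart
```

In this toy setting the plain VSM treats the "car" and "automobile" documents as unrelated because they share no terms, while the smoothed similarity is high because the two terms are related in S. A clustering algorithm that consumes pairwise similarities could use either function, which is how the two models would be compared under the same conditions.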

Embed

Wikipedia Quality

Nasir, Jamal Abdul; Varlamis, Iraklis; Karim, Asim; Tsatsaronis, George. (2013). "[[Semantic Smoothing for Text Clustering]]". Knowledge-Based Systems Volume 54, December 2013, pp. 216-229. ISSN: 09507051. DOI: 10.1016/j.knosys.2013.09.012.

English Wikipedia

{{cite journal |last1=Nasir |first1=Jamal Abdul |last2=Varlamis |first2=Iraklis |last3=Karim |first3=Asim |last4=Tsatsaronis |first4=George |title=Semantic Smoothing for Text Clustering |date=December 2013 |journal=Knowledge-Based Systems |volume=54 |pages=216-229 |issn=09507051 |doi=10.1016/j.knosys.2013.09.012 |url=https://wikipediaquality.com/wiki/Semantic_Smoothing_for_Text_Clustering}}

HTML

Nasir, Jamal Abdul; Varlamis, Iraklis; Karim, Asim; Tsatsaronis, George. (2013). &quot;<a href="https://wikipediaquality.com/wiki/Semantic_Smoothing_for_Text_Clustering">Semantic Smoothing for Text Clustering</a>&quot;. Knowledge-Based Systems Volume 54, December 2013, pp. 216-229. ISSN: 09507051. DOI: 10.1016/j.knosys.2013.09.012.