Enhancing Text Clustering by Leveraging Wikipedia Semantics



Authors: Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua Li, Qiang Yang, Zheng Chen
Publication date: 2008
DOI: 10.1145/1390334.1390367
Links: Original

Enhancing Text Clustering by Leveraging Wikipedia Semantics is a scientific work related to Wikipedia quality, published in 2008 and written by Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua Li, Qiang Yang and Zheng Chen.

Overview

Most traditional text clustering methods rely on a "bag of words" (BOW) representation built from term-frequency statistics over a set of documents. BOW, however, ignores the semantic relationships between key terms. To overcome this problem, several methods have been proposed to enrich text representation with external resources such as WordNet. Many of these approaches have limitations: 1) WordNet has limited coverage and lacks effective word-sense disambiguation; 2) most enrichment strategies, which append or replace document terms with their hypernyms and synonyms, are overly simple. To overcome these deficiencies, the authors first propose a way to build a concept thesaurus from the semantic relations (synonym, hypernym, and associative relation) extracted from Wikipedia. They then develop a unified framework that leverages these semantic relations to enhance the traditional content similarity measure for text clustering. Experimental results on the Reuters and OHSUMED datasets show that, with the help of the Wikipedia thesaurus, the clustering performance of the proposed method improves over previous methods. In addition, when the weights for hypernym, synonym, and associative concepts are tuned using a small amount of user-provided labeled data, clustering performance improves further.
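The general idea of enriching a content similarity measure with weighted concept relations can be illustrated with a minimal sketch. It is not the authors' formulation, which this summary does not give. It assumes a simple additive combination of cosine similarities, and the thesaurus entries, the weights, and the function names (`bow`, `concept_vector`, `enriched_similarity`) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (invented entries). In the paper, the synonym,
# hypernym, and associative relations are extracted from Wikipedia.
THESAURUS = {
    "puma":   {"synonym": ["cougar"], "hypernym": ["felidae"], "associative": ["mountain"]},
    "cougar": {"synonym": ["cougar"], "hypernym": ["felidae"], "associative": ["mountain"]},
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def bow(text):
    """Plain bag-of-words term counts (whitespace tokenization)."""
    return Counter(text.lower().split())

def concept_vector(terms, relation):
    """Map document terms to the concepts linked by one relation type."""
    vec = Counter()
    for term, count in terms.items():
        for concept in THESAURUS.get(term, {}).get(relation, []):
            vec[concept] += count
    return vec

def enriched_similarity(doc_a, doc_b, weights):
    """BOW cosine plus a weighted cosine for each relation type (assumed form)."""
    terms_a, terms_b = bow(doc_a), bow(doc_b)
    score = cosine(terms_a, terms_b)
    for relation, weight in weights.items():
        score += weight * cosine(
            concept_vector(terms_a, relation),
            concept_vector(terms_b, relation),
        )
    return score

# Assumed weights; the paper tunes such weights with a small labeled set.
weights = {"synonym": 0.5, "hypernym": 0.3, "associative": 0.2}
doc_a = "puma hunts deer"
doc_b = "cougar stalks elk"
print(cosine(bow(doc_a), bow(doc_b)))            # plain BOW similarity
print(enriched_similarity(doc_a, doc_b, weights))  # similarity with concept terms
```

In this toy case the two documents share no surface terms, so the plain BOW similarity is zero, while the concept-based terms pick up the relation between "puma" and "cougar". The resulting scores could then be passed to any standard clustering algorithm.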

Embed

Wikipedia Quality

Hu, Jian; Fang, Lujun; Cao, Yang; Zeng, Hua-Jun; Li, Hua; Yang, Qiang; Chen, Zheng. (2008). "[[Enhancing Text Clustering by Leveraging Wikipedia Semantics]]". DOI: 10.1145/1390334.1390367.

English Wikipedia

{{cite journal |last1=Hu |first1=Jian |last2=Fang |first2=Lujun |last3=Cao |first3=Yang |last4=Zeng |first4=Hua-Jun |last5=Li |first5=Hua |last6=Yang |first6=Qiang |last7=Chen |first7=Zheng |title=Enhancing Text Clustering by Leveraging Wikipedia Semantics |date=2008 |doi=10.1145/1390334.1390367 |url=https://wikipediaquality.com/wiki/Enhancing_Text_Clustering_by_Leveraging_Wikipedia_Semantics}}

HTML

Hu, Jian; Fang, Lujun; Cao, Yang; Zeng, Hua-Jun; Li, Hua; Yang, Qiang; Chen, Zheng. (2008). &quot;<a href="https://wikipediaquality.com/wiki/Enhancing_Text_Clustering_by_Leveraging_Wikipedia_Semantics">Enhancing Text Clustering by Leveraging Wikipedia Semantics</a>&quot;. DOI: 10.1145/1390334.1390367.