An Ensemble Approach for Text Document Clustering Using Wikipedia Concepts

From Wikipedia Quality


Authors
Seyednaser Nourashrafeddin
Evangelos E. Milios
Dirk V. Arnold
Publication date
2014
DOI
10.1145/2644866.2644868

An Ensemble Approach for Text Document Clustering Using Wikipedia Concepts is a scientific work related to Wikipedia quality, published in 2014 and written by Seyednaser Nourashrafeddin, Evangelos E. Milios and Dirk V. Arnold.

Overview

Most text clustering algorithms represent a corpus as a document-term matrix under the bag-of-words model. Feature values are computed from term frequencies in documents, and no semantic relatedness between terms is considered. As a result, two semantically similar documents may be placed in different clusters if they do not share any terms. One solution to this problem is to enrich the document representation with an external resource such as Wikipedia. In this work, the authors propose a new way to integrate Wikipedia concepts into partitional text document clustering. A text corpus is first represented both as a document-term matrix and as a document-concept matrix. The terms in the corpus are then clustered based on the document-term representation. Given the term clusters, the authors propose two methods for finding two sets of seed documents: one based on the document-term representation and the other based on the document-concept representation. The two seed sets are then used by a text clustering algorithm in an ensemble approach to cluster the documents. The experimental results show that even though the document-concept representation does not produce good document clusters on its own, integrating it into the ensemble approach significantly improves the quality of the document clusters.
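The limitation that motivates the paper, namely that documents sharing no terms look unrelated in a bag-of-words representation, can be illustrated with a minimal sketch. It assumes raw term frequencies and cosine similarity, and it uses a small hypothetical term-to-concept table (`TERM_TO_CONCEPT`) in place of real Wikipedia concept annotation; it is not the authors' concept extraction, term clustering, or seed selection method.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def term_vector(text):
    """One row of a document-term matrix, using raw term frequencies (assumption)."""
    return Counter(text.lower().split())


# Hypothetical term-to-concept mapping standing in for Wikipedia concept
# annotation; the paper's actual concept extraction is not reproduced here.
TERM_TO_CONCEPT = {
    "car": "Automobile",
    "automobile": "Automobile",
    "engine": "Internal_combustion_engine",
    "motor": "Internal_combustion_engine",
}


def concept_vector(text):
    """One row of a document-concept matrix: counts of mapped concepts."""
    return Counter(
        TERM_TO_CONCEPT[t] for t in text.lower().split() if t in TERM_TO_CONCEPT
    )


doc_a = "car engine"
doc_b = "automobile motor"

# No shared terms, so the bag-of-words similarity is zero.
print(cosine(term_vector(doc_a), term_vector(doc_b)))  # 0.0
# Both documents map to the same concepts, so the similarity is approximately 1.0.
print(cosine(concept_vector(doc_a), concept_vector(doc_b)))
```

In the term view the two documents would tend to fall into different clusters, while the concept view links them. This is the kind of complementary signal the ensemble approach described above is meant to combine.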

Embed

Wikipedia Quality

Nourashrafeddin, Seyednaser; Milios, Evangelos E.; Arnold, Dirk V. (2014). "[[An Ensemble Approach for Text Document Clustering Using Wikipedia Concepts]]". DOI: 10.1145/2644866.2644868.

English Wikipedia

{{cite journal |last1=Nourashrafeddin |first1=Seyednaser |last2=Milios |first2=Evangelos E. |last3=Arnold |first3=Dirk V. |title=An Ensemble Approach for Text Document Clustering Using Wikipedia Concepts |date=2014 |doi=10.1145/2644866.2644868 |url=https://wikipediaquality.com/wiki/An_Ensemble_Approach_for_Text_Document_Clustering_Using_Wikipedia_Concepts}}

HTML

Nourashrafeddin, Seyednaser; Milios, Evangelos E.; Arnold, Dirk V. (2014). &quot;<a href="https://wikipediaquality.com/wiki/An_Ensemble_Approach_for_Text_Document_Clustering_Using_Wikipedia_Concepts">An Ensemble Approach for Text Document Clustering Using Wikipedia Concepts</a>&quot;. DOI: 10.1145/2644866.2644868.