Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud


Authors
Rui Máximo Esteves
Chunming Rong
Publication date
2011
DOI
10.1109/CloudCom.2011.86

Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud is a scientific work related to Wikipedia quality, published in 2011 and written by Rui Máximo Esteves and Chunming Rong.

Overview

This paper compares k-means and fuzzy c-means for clustering a large, noisy, realistic dataset. The authors made the comparison using a free cloud computing solution, Apache Mahout/Hadoop, and Wikipedia's latest articles. In the past, the use of these two algorithms was restricted to small datasets, so studies were based on artificial datasets that do not represent a real document clustering situation. In this ongoing research, the authors found that on a noisy dataset fuzzy c-means can lead to worse cluster quality than k-means, and that the convergence speed of k-means is not always faster. The authors also found that Mahout is a promising clustering technology, but that its preprocessing tools are not developed enough for efficient dimensionality reduction. From the authors' experience, the use of Apache Mahout is premature.
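
The mechanical difference between the two algorithms compared above can be illustrated with a minimal sketch. It is not the authors' code and does not use the Mahout API; it is plain Python written under assumptions that are not stated in the summary: Euclidean distance, a fuzziness parameter m = 2, and hypothetical helper names (`kmeans_assign`, `fuzzy_memberships`). K-means assigns each point to exactly one cluster, while fuzzy c-means gives each point a degree of membership in every cluster.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_assign(point, centroids):
    """Hard assignment: return the index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: distance(point, centroids[j]))


def fuzzy_memberships(point, centroids, m=2.0):
    """Soft assignment: fuzzy c-means membership degrees, which sum to 1.

    m > 1 is the fuzziness parameter; m = 2 is an assumed default here.
    """
    dists = [distance(point, c) for c in centroids]
    # If the point coincides with one or more centroids, split full
    # membership among them to avoid dividing by zero.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
        for d_j in dists
    ]


# Illustrative data only: two centroids and one point on a line.
centroids = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard = kmeans_assign(point, centroids)      # a single cluster index
soft = fuzzy_memberships(point, centroids)  # one weight per cluster
print(hard, soft)
```

In this toy case the point is closer to the first centroid, so k-means places it entirely in that cluster, whereas fuzzy c-means still gives the second cluster a small share of the membership. A full implementation would alternate this assignment step with a centroid update until convergence.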

Embed

Wikipedia Quality

Esteves, Rui Máximo; Rong, Chunming. (2011). "[[Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud]]". DOI: 10.1109/CloudCom.2011.86.

English Wikipedia

{{cite journal |last1=Esteves |first1=Rui Máximo |last2=Rong |first2=Chunming |title=Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud |date=2011 |doi=10.1109/CloudCom.2011.86 |url=https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud}}

HTML

Esteves, Rui Máximo; Rong, Chunming. (2011). &quot;<a href="https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud">Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud</a>&quot;. DOI: 10.1109/CloudCom.2011.86.