Difference between revisions of "Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud"

From Wikipedia Quality
(infobox)
(+ Embed)
Line 10: Line 10:
 
== Overview ==
 
This paper compares k-means and fuzzy c-means for clustering a large, noisy, realistic dataset. The authors made the comparison using the free cloud computing solution Apache Mahout/Hadoop and [[Wikipedia]]'s latest articles. In the past, the use of these two algorithms was restricted to small datasets. As a result, studies were based on artificial datasets that do not represent a real document clustering situation. In this ongoing research, the authors found that on a noisy dataset fuzzy c-means can lead to worse cluster quality than k-means. K-means does not always converge faster. The authors also found that Mahout is a promising clustering technology, but its preprocessing tools are not developed enough for efficient dimensionality reduction. In the authors' experience, the use of Apache Mahout is premature.
 +
 +
== Embed ==
 +
=== Wikipedia Quality ===
 +
<code>
 +
<nowiki>
 +
Esteves, Rui Máximo; Rong, Chunming. (2011). "[[Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud]]". DOI: 10.1109/CloudCom.2011.86.
 +
</nowiki>
 +
</code>
 +
 +
=== English Wikipedia ===
 +
<code>
 +
<nowiki>
 +
{{cite journal |last1=Esteves |first1=Rui Máximo |last2=Rong |first2=Chunming |title=Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud |date=2011 |doi=10.1109/CloudCom.2011.86 |url=https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud}}
 +
</nowiki>
 +
</code>
 +
 +
=== HTML ===
 +
<code>
 +
<nowiki>
 +
Esteves, Rui Máximo; Rong, Chunming. (2011). &amp;quot;<a href="https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud">Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud</a>&amp;quot;. DOI: 10.1109/CloudCom.2011.86.
 +
</nowiki>
 +
</code>

Revision as of 01:13, 15 January 2021


Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud
Authors: Rui Máximo Esteves, Chunming Rong
Publication date: 2011
DOI: 10.1109/CloudCom.2011.86
Links: Original

Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud is a scientific work related to Wikipedia quality, published in 2011 and written by Rui Máximo Esteves and Chunming Rong.

Overview

This paper compares k-means and fuzzy c-means for clustering a large, noisy, realistic dataset. The authors made the comparison using the free cloud computing solution Apache Mahout/Hadoop and Wikipedia's latest articles. In the past, the use of these two algorithms was restricted to small datasets. As a result, studies were based on artificial datasets that do not represent a real document clustering situation. In this ongoing research, the authors found that on a noisy dataset fuzzy c-means can lead to worse cluster quality than k-means. K-means does not always converge faster. The authors also found that Mahout is a promising clustering technology, but its preprocessing tools are not developed enough for efficient dimensionality reduction. In the authors' experience, the use of Apache Mahout is premature.
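
For readers unfamiliar with the two algorithms compared above, the following is a minimal Python sketch of how they differ in the assignment step. It is not the Mahout implementation and is not taken from the paper: the function names, the two example centers, and the fuzziness parameter m = 2 are all assumptions chosen for illustration. K-means gives each point to exactly one cluster, while fuzzy c-means gives each point a degree of membership in every cluster, using the standard formula u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)).

```python
import math

def distances(point, centers):
    """Euclidean distance from one point to each cluster center."""
    return [math.dist(point, c) for c in centers]

def kmeans_assign(point, centers):
    """k-means (hard) assignment: index of the nearest center."""
    d = distances(point, centers)
    return d.index(min(d))

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means memberships for one point.

    m is the fuzziness parameter (assumed to be 2.0 here);
    the returned memberships sum to 1.
    """
    d = distances(point, centers)
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in d:
        on_center = [1.0 if x == 0.0 else 0.0 for x in d]
        return [z / sum(on_center) for z in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]

# Hypothetical example: two centers on a line and one point between them.
centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

hard = kmeans_assign(point, centers)      # a single cluster index
soft = fcm_memberships(point, centers)    # one membership per cluster
print(hard, [round(u, 3) for u in soft])
```

In this sketch the point is assigned to the first cluster by k-means, whereas fuzzy c-means splits it between both clusters in proportion to inverse squared distance. Because memberships always sum to 1, even a point far from every center contributes to all clusters, which is one way to picture why noisy data can affect the two algorithms differently.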

Embed

Wikipedia Quality

Esteves, Rui Máximo; Rong, Chunming. (2011). "[[Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud]]". DOI: 10.1109/CloudCom.2011.86.

English Wikipedia

{{cite journal |last1=Esteves |first1=Rui Máximo |last2=Rong |first2=Chunming |title=Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud |date=2011 |doi=10.1109/CloudCom.2011.86 |url=https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud}}

HTML

Esteves, Rui Máximo; Rong, Chunming. (2011). &quot;<a href="https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud">Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud</a>&quot;. DOI: 10.1109/CloudCom.2011.86.