Difference between revisions of "Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud"

Revision as of 01:13, 15 January 2021

Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud
Authors	Rui Máximo Esteves, Chunming Rong
Publication date	2011
DOI	10.1109/CloudCom.2011.86
Links	Original

Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud is a scientific work related to Wikipedia quality, published in 2011 and written by Rui Máximo Esteves and Chunming Rong.

Overview

This paper compares k-means and fuzzy c-means for clustering a large, noisy, realistic dataset. The authors made the comparison using a free cloud computing solution, Apache Mahout/Hadoop, and Wikipedia's latest articles. In the past, the use of these two algorithms was restricted to small datasets. As a result, studies were based on artificial datasets that do not represent a real document clustering situation. In this ongoing research, the authors found that on a noisy dataset fuzzy c-means can lead to worse cluster quality than k-means. The convergence speed of k-means is not always faster. The authors also found that Mahout is a promising clustering technology, but its preprocessing tools are not developed enough for efficient dimensionality reduction. From their experience, the use of Apache Mahout is premature.
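To make the difference between the two algorithms concrete, here is a minimal sketch in plain Python. It is not Mahout code and does not reproduce the authors' experimental setup: the toy 2-D points, the starting centers, the fuzziness parameter `m=2.0`, and the function names `kmeans` and `fuzzy_cmeans` are all assumptions chosen for illustration. K-means assigns each point to exactly one cluster, while fuzzy c-means gives each point a degree of membership in every cluster.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centers, iterations=20):
    """Hard clustering: every point belongs to exactly one cluster."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: the nearest center wins (ties go to the lower index).
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def fuzzy_cmeans(points, centers, m=2.0, iterations=20, eps=1e-12):
    """Soft clustering: every point gets one membership degree per cluster."""
    centers = [list(c) for c in centers]
    memberships = []
    dims = len(points[0])
    for _ in range(iterations):
        # Membership step: u_ij is proportional to d_ij^(-2/(m-1)),
        # written here in terms of the squared distance; eps avoids 0 ** negative.
        memberships = []
        for p in points:
            inv = [(sq_dist(p, c) + eps) ** (-1.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Update step: each center is the mean of all points weighted by u_ij^m.
        for j in range(len(centers)):
            weights = [u[j] ** m for u in memberships]
            wsum = sum(weights)
            centers[j] = [sum(w * p[d] for w, p in zip(weights, points)) / wsum
                          for d in range(dims)]
    return centers, memberships


# Toy data (hypothetical): two tight groups plus one ambiguous point between them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0),
          (9.0, 10.0), (10.0, 9.0), (10.0, 10.0)]
init = [(0.0, 0.0), (10.0, 10.0)]

_, hard_labels = kmeans(points, init)
_, soft_memberships = fuzzy_cmeans(points, init)

print("k-means labels:", hard_labels)
for p, u in zip(points, soft_memberships):
    print(p, [round(v, 3) for v in u])
```

In this sketch the middle point is forced into one cluster by k-means, whereas fuzzy c-means splits its membership between both. Because every point contributes to every center in the fuzzy update, ambiguous or noisy points can pull centers toward themselves, which is one plausible way to read the paper's observation about noisy data.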

Embed

Wikipedia Quality

Esteves, Rui Máximo; Rong, Chunming. (2011). "[[Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud]]". DOI: 10.1109/CloudCom.2011.86.

English Wikipedia

{{cite journal |last1=Esteves |first1=Rui Máximo |last2=Rong |first2=Chunming |title=Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud |date=2011 |doi=10.1109/CloudCom.2011.86 |url=https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud}}

HTML

Esteves, Rui Máximo; Rong, Chunming. (2011). "<a href="https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud">Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud</a>". DOI: 10.1109/CloudCom.2011.86.

@@ Line 10: / Line 10: @@
 == Overview ==
 This paper compares k-means and fuzzy c-means for clustering a large, noisy, realistic dataset. The authors made the comparison using a free cloud computing solution, Apache Mahout/Hadoop, and [[Wikipedia]]'s latest articles. In the past, the use of these two algorithms was restricted to small datasets. As a result, studies were based on artificial datasets that do not represent a real document clustering situation. In this ongoing research, the authors found that on a noisy dataset fuzzy c-means can lead to worse cluster quality than k-means. The convergence speed of k-means is not always faster. The authors also found that Mahout is a promising clustering technology, but its preprocessing tools are not developed enough for efficient dimensionality reduction. From their experience, the use of Apache Mahout is premature.
+== Embed ==
+=== Wikipedia Quality ===
+<code>
+<nowiki>
+Esteves, Rui Máximo; Rong, Chunming. (2011). "[[Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud]]". DOI: 10.1109/CloudCom.2011.86.
+</nowiki>
+</code>
+=== English Wikipedia ===
+<code>
+<nowiki>
+{{cite journal |last1=Esteves |first1=Rui Máximo |last2=Rong |first2=Chunming |title=Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud |date=2011 |doi=10.1109/CloudCom.2011.86 |url=https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud}}
+</nowiki>
+</code>
+=== HTML ===
+<code>
+<nowiki>
+Esteves, Rui Máximo; Rong, Chunming. (2011). &quot;<a href="https://wikipediaquality.com/wiki/Using_Mahout_for_Clustering_Wikipedia's_Latest_Articles:_a_Comparison_Between_K-Means_and_Fuzzy_C-Means_in_the_Cloud">Using Mahout for Clustering Wikipedia's Latest Articles: a Comparison Between K-Means and Fuzzy C-Means in the Cloud</a>&quot;. DOI: 10.1109/CloudCom.2011.86.
+</nowiki>
+</code>
