Hierarchical clustering
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of
) are known: SLINK for single-linkage and CLINK for complete-linkage clustering. With a heap the runtime of the general case can be reduced to
Except for the special case of single-linkage, none of the algorithms (except exhaustive search in
) can be guaranteed to find the optimum solution.
Divisive clustering with an exhaustive search is
, but it is common to use faster heuristics to choose splits, such as k-means.