
17.10. Cluster Analysis Methodology

Applies to:
StatTools 7.x

What methodology is used by StatTools cluster analysis?

StatTools provides Hierarchical Agglomerative Clustering (HAC).

This procedure starts with each object in its own cluster and then merges the clusters sequentially according to their similarity. Similarity is determined by an appropriate metric (a measure of distance between pairs of observations) and a linkage criterion, which specifies the similarity of clusters as a function of the pairwise distances between the observations in those clusters. The similarity s_ij between two clusters i and j is given by

s_ij = 100 · (1 − d_ij/d_max)

where d_ij is the distance between clusters i and j, and d_max is the maximum value in the original distance matrix D.
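As an illustration of the formula only, the following minimal Python sketch converts a small distance matrix into similarities. The function name similarity_matrix and the three sample observations are hypothetical; this is not StatTools code.

```python
def similarity_matrix(D):
    """Apply s_ij = 100 * (1 - d_ij / d_max) to every entry of D."""
    d_max = max(max(row) for row in D)  # largest value in the distance matrix
    if d_max == 0:
        raise ValueError("all distances are zero, so similarity is undefined")
    return [[100.0 * (1.0 - d / d_max) for d in row] for row in D]

# Hypothetical data: three observations on a line at 0, 1, and 4.
points = [0.0, 1.0, 4.0]
D = [[abs(a - b) for b in points] for a in points]  # pairwise distances
S = similarity_matrix(D)

for row in S:
    print(row)
```

With this formula, the most distant pair in the original matrix has similarity 0 and any pair at distance 0 has similarity 100.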

StatTools offers these linkage methods and metrics:

Linkage methods (labeled Agglomerative Method in the dialog): Single (Nearest Neighbor), Complete (Farthest Neighbor), Average, Centroid, Median, Ward.
See the StatTools help topic "Cluster Analysis Dialog—Clustering Settings Tab" for definitions of these.
Metrics (labeled Distance Measure):
- For observations: Euclidean, Squared Euclidean, Mahalanobis, Manhattan.
- For variables: Correlation, Absolute Correlation.
For details on the distance measures, please see the attached Word document.
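For orientation, the sketch below shows the common textbook definitions of three of the observation metrics named above. It assumes these standard definitions; the attached document remains the reference for what StatTools actually computes. Mahalanobis distance is left out because it also needs the inverse covariance matrix of the data, and the correlation-based measures for variables are left out because their exact definitions are given only in the attached document. The function names are hypothetical.

```python
import math

def squared_euclidean(x, y):
    """Sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    """Straight-line distance: square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(x, y))

def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

# Two hypothetical observations with two variables each.
x = (0.0, 0.0)
y = (3.0, 4.0)

print(euclidean(x, y))          # 5.0
print(squared_euclidean(x, y))  # 25.0
print(manhattan(x, y))          # 7.0
```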

The choice of metric and linkage method influences which clusters are formed and, therefore, the final number of clusters. Take some time to examine your data set and choose a metric and linkage method that suit it. If in doubt, try several combinations and compare the results.
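To see how much the linkage method can matter, here is a deliberately naive sketch of agglomerative clustering on five hypothetical one-dimensional observations, run once with single linkage and once with complete linkage. The function agglomerate and the data are illustrative assumptions, not the StatTools implementation, and the stopping rule (a fixed number of clusters) is chosen only to keep the example short.

```python
def agglomerate(points, linkage, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    linkage reduces the list of pairwise distances between two clusters
    to one number: min gives single linkage, max gives complete linkage.
    """
    clusters = [[p] for p in points]  # start with one cluster per object
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_dists = [abs(a - b)
                              for a in clusters[i] for b in clusters[j]]
                d = linkage(pair_dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical data with steadily growing gaps between neighbors.
points = [0, 10, 25, 45, 68]

single = agglomerate(points, min, 2)    # Single (Nearest Neighbor)
complete = agglomerate(points, max, 2)  # Complete (Farthest Neighbor)

print("single:  ", single)    # [[0, 10, 25, 45], [68]]
print("complete:", complete)  # [[0, 10], [25, 45, 68]]
```

On these data, single linkage chains neighboring points into one long cluster and leaves the last point alone, while complete linkage produces two more compact groups.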

Last edited: 2018-05-08

Downloads

KB1628_DistanceMeasures.docx