Top Three Clustering Algorithms You Should Know Instead of K-means Clustering

A comprehensive guide to industry-leading clustering techniques

Terence Shin, MSc, MBA


Photo by Mel Poole on Unsplash

K-means clustering is arguably one of the most commonly used clustering techniques in data science (anecdotally speaking), and for good reason. It’s simple to understand, easy to implement, and computationally efficient.

However, k-means clustering has several limitations that hinder its ability to be a strong clustering technique:

  • K-means clustering assumes that clusters are roughly spherical and similar in spread, which may not be the case in real-world data sets. This can lead to suboptimal cluster assignments and poor performance on elongated or irregularly shaped clusters.
  • K-means clustering requires the user to specify the number of clusters in advance, which can be difficult to do accurately in many cases. If the number of clusters is not specified correctly, the algorithm may not be able to identify the underlying structure of the data.
  • K-means clustering is sensitive to outliers and noise in the data. Because each centroid is the mean of its assigned points, a few extreme values can pull centroids away from the true cluster centers, distorting the clusters or splitting them.
  • K-means clustering is not well-suited for data sets with uneven cluster sizes or non-linearly separable data, as it may be unable to identify the underlying structure of the data in these cases.
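
To make the outlier limitation concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) on a toy one-dimensional data set. The function name `kmeans_1d`, the data, and the starting centroids are all made up for illustration; this is not how a production library implements k-means.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm on 1-D data. Returns (labels, centroids)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so we have converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids


# Two tight groups (1-3 and 10-12) plus a single outlier at 100.
points = [1, 2, 3, 10, 11, 12, 100]

# Illustrative assumption: start the centroids in the middle of each real group.
labels, centroids = kmeans_1d(points, centroids=[2, 11])
print(labels)     # [0, 0, 0, 0, 0, 0, 1]
print(centroids)  # [6.5, 100.0]
```

Even with a favorable starting point, the outlier drags the second centroid toward it until it claims a cluster of its own, and the two real groups are merged into one.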

In this article, I cover three clustering techniques that you should know as alternatives to k-means clustering:

  1. DBSCAN
  2. Hierarchical Clustering
  3. Spectral Clustering


1. DBSCAN

What is DBSCAN?

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups data points into clusters based on the density of the points.

The algorithm works by identifying points that are in high-density regions of the data and expanding those clusters to include all points that are nearby…
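
The idea of finding dense points and expanding clusters outward from them can be sketched in a few lines of plain Python. This is a simplified illustration rather than a reference implementation: the parameter names `eps` (neighborhood radius) and `min_samples` (how many neighbors make a point "dense") follow common convention but do not appear in the text above, and the data is made up.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Simplified DBSCAN sketch. Returns one label per point (-1 = noise)."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = NOISE  # not dense; may still become a border point later
            continue
        # Point i is in a high-density region: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is dense too, so keep expanding
    return labels


# Two dense squares of points and one isolated point far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Notice that the number of clusters is never specified, and the isolated point is labeled as noise instead of being forced into a cluster. This sketch compares every pair of points, so it is only practical for small data sets.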
