Why Clustering Fails and How to Fix It | Ryan Feather | July 2024

Machine Learning

And how to fix it

Ryan Feather
Towards Data Science

You had trouble interpreting your data, so you tried clustering. Now you have trouble interpreting your clusters. You suspected that there were patterns in the data, and you hoped that adding some structure via unsupervised learning would give you insight. Clustering is the go-to tool for finding structure, so you set off on a journey. You spent a lot of money on compute. You spent a lot of effort fiddling with tuning parameters. You tried a few algorithms just to be sure. But in the end, you're left with a rainbow-colored plot of clustered data that, if you squint and look closely, maybe has some meaning. You go home with the uneasy suspicion that it was all for nothing. Sadly, this is common. But why?

Some real clusters. Images released in the public domain by NASA and STScI.

Clustering projects often fail to deliver value for a few recurring reasons: a poor understanding of the data, too little attention to the desired outcome, and poor tool selection. Let's look at these in turn. First, though, it's useful to understand why clustering methods exist, so we'll review what clustering is and some of the problems that prompted the development of these methods.
