Cluster analysis is a Machine Learning task that partitions a dataset into groups of similar instances. These groups, called clusters, are formed so that instances in the same cluster are more similar to each other than to instances in other clusters. Because cluster analysis does not require previously labeled data, it falls under the category of unsupervised learning. It is commonly used for market and customer segmentation, portfolio management, and creating new features from your data while understanding its underlying structure.
BigML clusters can be built using two different unsupervised learning algorithms:
- K-means: you will need to specify the number of clusters (k) in advance.
- G-means: the algorithm automatically learns the number of clusters by iteratively taking the existing clusters, testing whether the data in each one appears to follow a Gaussian distribution, and splitting the clusters that do not.
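To make the difference between the two algorithms concrete, here is a minimal pure-Python sketch of both ideas. It is not BigML's implementation, and every name in it (`kmeans`, `kmeans_k`, `gmeans`, `anderson_darling`) is hypothetical. `kmeans_k` needs k up front, while `gmeans` starts from a single cluster and splits any cluster whose points, projected onto the axis between two tentative sub-centers, fail an Anderson-Darling normality test. The critical value 1.8692 (assumed here to correspond to a significance level of about 0.0001) and the random choice of the two starting sub-centers are simplifications made for this sketch.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def assign(points, centers):
    """Put each point in the group of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
        groups[j].append(p)
    return groups


def kmeans(points, centers, iters=100):
    """Lloyd's algorithm, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = assign(points, centers)
        # A center that lost all its points stays where it is.
        new = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)


def kmeans_k(points, k, seed=0):
    """K-means: the caller fixes k; initial centers are k random points."""
    return kmeans(points, random.Random(seed).sample(points, k))


def anderson_darling(xs):
    """Anderson-Darling A*^2 normality statistic (mean/variance estimated)."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / (n - 1))
    eps = 1e-12  # keeps log() away from 0
    cdf = [min(max(0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2)))), eps), 1 - eps)
           for x in sorted(xs)]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    return (-n - s / n) * (1 + 4 / n - 25 / n ** 2)  # small-sample correction


def gmeans(points, critical=1.8692, max_k=20, min_size=8, seed=0):
    """G-means sketch: keep splitting clusters whose data does not look Gaussian."""
    rng = random.Random(seed)
    centers = [mean(points)]
    while len(centers) < max_k:
        centers, groups = kmeans(points, centers)
        next_centers, split = [], False
        for c, g in zip(centers, groups):
            if len(g) < min_size:  # too few points to test reliably
                next_centers.append(c)
                continue
            # Tentatively split the cluster in two with a local 2-means run.
            children, _ = kmeans(g, rng.sample(g, 2))
            v = tuple(a - b for a, b in zip(children[0], children[1]))
            norm2 = sum(x * x for x in v)
            # Project the cluster onto the axis joining the two children;
            # if that 1-D projection does not look Gaussian, keep the split.
            if norm2 > 0 and anderson_darling(
                    [sum(x * y for x, y in zip(p, v)) / norm2 for p in g]) > critical:
                next_centers.extend(children)
                split = True
            else:
                next_centers.append(c)
        if not split:
            break
        centers = next_centers
    return kmeans(points, centers)


# Two well-separated 2-D Gaussian blobs of 200 points each.
rng = random.Random(42)
data = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
        + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)])

km_centers, _ = kmeans_k(data, k=2)    # k chosen up front
gm_centers, gm_groups = gmeans(data)   # k learned from the data
print(len(km_centers), len(gm_centers))
```

Raising `critical` makes the sketch more reluctant to split, so it tends to return fewer clusters; `max_k` only acts as a safety cap on the number of clusters.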
Watch this video to learn how to easily separate your data into groups of similar instances using BigML Clusters through the BigML Dashboard:
For more details, please find the documentation about Clusters here if you are using the Dashboard, or here if you prefer the API.