If splitting your dataset into groups of similar instances would help your analysis, then clustering is what you need. Clusters split your data into similar groups so that you can analyze and explore each group individually, and/or filter your data before training a model. These groups are calculated according to a distance measure between instances. Each cluster is represented by a centroid, computed using the mean of each numeric field and the mode of each categorical field.
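To make the centroid definition concrete, here is a minimal sketch in plain Python of how a centroid could be derived once instances have already been assigned to clusters. It is an illustration only, not BigML's implementation: the function name `compute_centroids`, the field names, the sample rows, and the cluster labels are all assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean, mode


def compute_centroids(rows, labels, numeric_fields, categorical_fields):
    """Return one centroid per cluster label (hypothetical helper).

    Numeric fields use the mean and categorical fields use the mode,
    as described in the text above.
    """
    # Group the instances by the cluster they were assigned to.
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    centroids = {}
    for label, members in grouped.items():
        centroid = {}
        for field in numeric_fields:
            centroid[field] = mean([m[field] for m in members])
        for field in categorical_fields:
            centroid[field] = mode([m[field] for m in members])
        centroids[label] = centroid
    return centroids


# Made-up sample data; the labels are assumed to come from a prior
# clustering step that is not shown here.
rows = [
    {"age": 25, "income": 30000, "plan": "basic"},
    {"age": 31, "income": 34000, "plan": "basic"},
    {"age": 28, "income": 32000, "plan": "premium"},
    {"age": 52, "income": 90000, "plan": "premium"},
    {"age": 48, "income": 86000, "plan": "premium"},
]
labels = [0, 0, 0, 1, 1]

centroids = compute_centroids(rows, labels, ["age", "income"], ["plan"])
for label, centroid in sorted(centroids.items()):
    print(label, centroid)
```

In this sketch the centroid of cluster 0 has the average age and income of its three members and the most frequent plan among them, so a centroid need not match any single real instance.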
Sample use cases include:
- Unusual Instance Discovery or Item Discovery
- Fraud Detection
- Identifying Incorrect Data
- Removing Outliers
(The four use cases above are also possible with the BigML Anomaly Detector.)
- Customer and Market Segmentation
- Portfolio Management
- Active Learning for Disease Diagnoses
For more information, you can read the Dashboard documentation and the equivalent for developers, along with this blog post and this video.