Machine Learning, and especially unsupervised Machine Learning, is like a box of chocolates: you never know exactly what you will get. That is part of the fun of exploring a dataset, and it also means you will need an iterative approach. You can be as creative as you like in iterating on your associations model. Here are some tips:
- Iterate on your model by changing its parameters. One parameter that can yield different results is the search strategy, that is, the measure used to prioritize the associations discovered. You can choose leverage, lift, coverage, support, or confidence, and rules with higher values for the chosen measure will float to the top of your result set. Leverage often gives relevant and interesting results, and confidence and lift are also frequently used. However, no rule of thumb applies in all cases. Select the strategy that best fits your main purpose, using your domain knowledge, and iterate until you arrive at a satisfactory result.
- Set minimum thresholds for measures according to your main goals. For example, you can specify a minimum support to get rid of insignificant rules. This leaves you with the rules that are more relevant to your needs.
- Modify your dataset by applying feature engineering.
- Stratification is key: try to stratify your data as much as possible. For example, market basket/POS (Point of Sale) data from a supermarket chain can be grouped per store and even per season, while medical records from various patients can be stratified according to confounding factors such as age and gender. By segmenting your data you also reduce the risk of what’s commonly known as Simpson's Paradox, in which a trend seen in the aggregated data weakens or reverses within the individual groups.
- Remove categories or groups of items that return too many obvious rules. That way, other more interesting patterns can rise to the top of your results.
- Remove anomalies to further clean your dataset. Although association rules shouldn’t be strongly affected by outliers, cleaning your dataset is good general practice and can improve the model.
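
To make the first two tips more concrete, here is a minimal Python sketch of how the five measures can be computed for simple one-item rules, and how changing the search strategy or the minimum support changes which rules end up on top. It assumes the commonly used definitions (coverage is the frequency of the antecedent, support is the frequency of antecedent and consequent together, confidence is support divided by coverage, lift is confidence divided by the frequency of the consequent, and leverage is support minus the product of the two individual frequencies). The transactions and the names `frequency`, `rule_measures`, `rank_rules`, `search_strategy`, and `min_support` are hypothetical and chosen for illustration; they are not the API of any particular tool.

```python
from itertools import permutations

# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def frequency(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_measures(antecedent, consequent, transactions):
    """Compute the five measures for the rule antecedent -> consequent."""
    coverage = frequency(antecedent, transactions)
    consequent_freq = frequency(consequent, transactions)
    support = frequency(set(antecedent) | set(consequent), transactions)
    confidence = support / coverage if coverage else 0.0
    lift = confidence / consequent_freq if consequent_freq else 0.0
    return {
        "antecedent": antecedent,
        "consequent": consequent,
        "coverage": coverage,
        "support": support,
        "confidence": confidence,
        "lift": lift,
        # Observed co-occurrence minus what independence would predict.
        "leverage": support - coverage * consequent_freq,
    }


def rank_rules(transactions, search_strategy="leverage", min_support=0.0):
    """Rank all one-item -> one-item rules by the chosen measure,
    dropping rules whose support is below `min_support`."""
    items = sorted(set().union(*transactions))
    rules = [
        rule_measures((a,), (c,), transactions)
        for a, c in permutations(items, 2)
    ]
    kept = [r for r in rules if r["support"] >= min_support]
    return sorted(kept, key=lambda r: r[search_strategy], reverse=True)


# Compare how the ranking changes with the search strategy.
for strategy in ("leverage", "confidence"):
    print(strategy)
    for r in rank_rules(transactions, strategy, min_support=0.3):
        print("  ", r["antecedent"], "->", r["consequent"],
              round(r[strategy], 3))
```

Running the loop with different strategies and thresholds on your own data is a quick way to see how sensitive the top rules are to these choices before settling on one.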