An ensemble is a collection of multiple decision trees whose predictions are combined to produce a stronger model with better predictive performance. An ensemble of models, each built on a different sample of the data, can become a powerful predictor by averaging away the errors of the individual models.
Generally, ensembles perform better than a single decision tree because they are less sensitive to outliers in the training data. A common pitfall in Machine Learning is overfitting: a model's performance on the training data is very good, but it does not generalize well to new data. By learning multiple models over different subsamples of the data and taking a majority vote at prediction time, ensembles mitigate the risk of overfitting a single model to all of the data.
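The idea above can be illustrated with a minimal, toy sketch of bagging (bootstrap aggregating) in plain Python. This is not BigML's implementation; the one-feature decision stump, the `bagged_ensemble` helper, and the toy dataset are all illustrative assumptions. Each model is trained on a bootstrap sample (drawn with replacement), and prediction is a majority vote across models:

```python
import random
from collections import Counter

def train_stump(data):
    # data: list of (x, label) pairs with a single numeric feature x.
    # Hypothetical learner: a one-split decision stump that picks the
    # threshold minimizing misclassifications on its training sample.
    best = None
    for t in sorted(set(x for x, _ in data)):
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        if not left or not right:
            continue
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        err = sum(1 for x, y in data
                  if (l_lab if x <= t else r_lab) != y)
        if best is None or err < best[0]:
            best = (err, t, l_lab, r_lab)
    if best is None:  # degenerate sample: predict the majority class
        lab = Counter(y for _, y in data).most_common(1)[0][0]
        return lambda x: lab
    _, t, l_lab, r_lab = best
    return lambda x: l_lab if x <= t else r_lab

def bagged_ensemble(data, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) points with replacement,
        # so each stump sees a slightly different view of the data.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))
    def predict(x):
        # Majority vote across all stumps at prediction time.
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

# Toy data: "low" below 5, "high" from 5 up, plus one noisy outlier.
data = ([(i, "low") for i in range(5)] +
        [(i, "high") for i in range(5, 10)] +
        [(0, "high")])  # outlier a single tree could overfit to
predict = bagged_ensemble(data)
```

Because the outlier only appears in some bootstrap samples, most stumps still learn the clean split, and the majority vote washes out the few that overfit it.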
To learn more about ensembles, please:
- Watch this video for a gentle introduction to ensembles. The demo shows what ensembles are, how to create them, and how to interpret the results.
- Read chapter 2 of BigML's Dashboard documentation about ensembles, or check the API documentation for more details on the arguments and properties of an ensemble. The documentation explains the differences between the three ensemble-based strategies that BigML offers: Bagging (Bootstrap Aggregating), Random Decision Forests, and Boosted Trees.
- Read this blog post for more details.
- Check this related question to continue learning about other predictive models: Which models does BigML work with?