
How does Bagging reduce the variance of the model?

Bagging, short for Bootstrap Aggregating, reduces the variance of a model by combining predictions from multiple independently trained models. Variance in machine learning refers to how much a model's predictions change when it is trained on different subsets of the data. High variance can lead to overfitting, where the model performs well on training data but poorly on unseen data.
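
To make the effect concrete, here is a standard back-of-the-envelope argument (not specific to any particular library or dataset): suppose each of n base models produces predictions with variance σ², and the average pairwise correlation between the models' predictions is ρ. The variance of their averaged prediction is approximately

Var(average) ≈ ρσ² + ((1 - ρ) / n) · σ²

If the models were fully independent (ρ = 0), averaging n of them would cut the variance to σ²/n. Bootstrap samples overlap, so the models remain partly correlated and the reduction is smaller, but it is still substantial, and the bias of the ensemble stays roughly the same as that of a single model.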

Bagging works by creating multiple subsets of the original dataset through bootstrapping (random sampling with replacement). Each subset is used to train a separate base model, typically of the same type (e.g., decision trees). Because each model is trained on a different subset of the data, the models make slightly different predictions. The final prediction is obtained by aggregating the outputs of all base models, usually through averaging (for regression) or majority voting (for classification). This aggregation smooths out the errors of individual models, reducing the overall variance.
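
The following is a minimal Python sketch of that procedure, assuming scikit-learn and NumPy are installed and that X_train, y_train, and X_test are NumPy arrays (all names here are illustrative, not taken from the original text):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_predict(X_train, y_train, X_test, n_models=100, seed=0):
    """Train n_models trees on bootstrap samples and average their predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    per_model_preds = []
    for _ in range(n_models):
        # Bootstrap sample: draw n row indices with replacement
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor()
        tree.fit(X_train[idx], y_train[idx])
        per_model_preds.append(tree.predict(X_test))
    # Aggregate: mean for regression; for classification you would take a majority vote instead
    return np.mean(per_model_preds, axis=0)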

For example, consider a decision tree model that is prone to overfitting due to its complexity. By applying bagging, you create 100 decision trees, each trained on a different bootstrap sample of the data. When making a prediction, the outputs of all 100 trees are averaged (for regression) or voted on (for classification). This reduces the impact of any single tree's overfitting, leading to a more stable and generalizable model.
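
In practice this is usually done with an off-the-shelf ensemble class rather than a hand-written loop. As one common option (an assumption for illustration, not something the passage prescribes), scikit-learn's BaggingRegressor trains the trees on bootstrap samples and averages their predictions automatically:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, used only for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

single_tree = DecisionTreeRegressor(random_state=0)                 # one deep tree: high variance
bagged_trees = BaggingRegressor(n_estimators=100, random_state=0)   # 100 trees on bootstrap samples

print("single tree  R^2:", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees R^2:", cross_val_score(bagged_trees, X, y, cv=5).mean())

On most datasets the bagged ensemble not only scores higher but also varies less from fold to fold than the single tree, which is the variance reduction described above in action.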

In cloud-based machine learning platforms, services like Tencent Cloud TI-ONE can help implement bagging efficiently. TI-ONE provides tools for distributed training and model aggregation, making it easier to apply ensemble methods like bagging at scale.