Stanford Practical Machine Learning - Bagging
This chapter covers Bagging and related ideas. The essence is to train multiple models and combine their outputs by voting or averaging; the main goal is to reduce variance by aggregating the decisions of many models, turning unstable models into stable ones.
Bagging - Bootstrap AGGregatING
The base learners are trained independently of one another, so training can be parallelized.
- Learn n base learners in parallel, combine to reduce model variance
- Each base learner is trained on a bootstrap sample
- Given a dataset of m examples, create a sample by randomly sampling m examples with replacement
- Around $1 - 1/e \approx 63\%$ unique examples are expected to be sampled; use the out-of-bag examples for validation (see the numeric check after this list)
- Combine learners by averaging the outputs (regression) or majority voting (classification)
- Random forest: bagging with decision trees
- usually selects a random subset of features for each bootstrap sample
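A quick numeric check of the ~63% figure (the script and its variable names are illustrative, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000  # dataset size

# One bootstrap sample: m draws with replacement from m examples.
sample = rng.integers(0, m, size=m)
unique_frac = np.unique(sample).size / m

# Theory: P(an example appears at least once) = 1 - (1 - 1/m)^m -> 1 - 1/e.
print(f"empirical unique fraction: {unique_frac:.3f}")
print(f"theoretical limit 1 - 1/e: {1 - 1/np.e:.3f}")  # ~0.632
```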
Bagging Code (scikit-learn)
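The code block in the original post did not survive extraction. Below is a minimal from-scratch sketch of bagging for regression, assuming scikit-learn is available; the class name `Bagging` and its parameters are my own, and `sklearn.ensemble.BaggingRegressor` provides the same functionality out of the box.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor

class Bagging:
    """n base learners, each fit on a bootstrap sample; predictions are averaged."""

    def __init__(self, base_learner, n_learners, seed=0):
        self.learners = [clone(base_learner) for _ in range(n_learners)]
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        m = len(X)
        for learner in self.learners:
            idx = self.rng.integers(0, m, size=m)  # sample m examples with replacement
            learner.fit(X[idx], y[idx])
        return self

    def predict(self, X):
        # Regression: average the outputs of all base learners.
        preds = np.stack([learner.predict(X) for learner in self.learners])
        return preds.mean(axis=0)

# Usage on synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)
model = Bagging(DecisionTreeRegressor(), n_learners=20).fit(X, y)
print(model.predict(X[:5]))
```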
Random Forest
- A special case of bagging that uses decision trees as the base learner
- Often randomly selects a subset of features for each learner
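A short sketch with scikit-learn's `RandomForestClassifier` (dataset and hyperparameters are illustrative); note that scikit-learn resamples the feature subset at every split via `max_features`, rather than once per learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# max_features="sqrt": consider a random subset of sqrt(n_features) features at each split.
# oob_score=True: evaluate on out-of-bag examples for free validation.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print(f"out-of-bag accuracy: {rf.oob_score_:.3f}")
```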
Applying Bagging with Unstable Learners
Models with relatively high variance are what we call unstable models.
- Bagging reduces model variance, especially for unstable learners
- Given ground truth $f$ and a set of base learners $h_1, \dots, h_n$, for regression, the bagging prediction is $\hat{h}(x) = \frac{1}{n}\sum_{i=1}^{n} h_i(x)$
- Given $\left(f(x) - \hat{h}(x)\right)^2 \le \frac{1}{n}\sum_{i=1}^{n}\left(f(x) - h_i(x)\right)^2$ (Jensen's inequality, since squaring is convex), we have $\mathbb{E}\left[\left(f(x) - \hat{h}(x)\right)^2\right] \le \mathbb{E}\left[\left(f(x) - h(x)\right)^2\right]$, where the left side is the expected error with bagging and the right side is that of a single learner.
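A numerical check of this inequality under an assumed noise model (base learners simulated as the truth plus i.i.d. noise, i.e. maximally unstable and uncorrelated; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
f = 1.0                              # ground truth f(x) at a fixed point x
n = 50                               # number of base learners
h = f + rng.normal(0, 1.0, size=n)   # simulated single-learner predictions h_i(x)

h_bag = h.mean()                     # bagging prediction: average of the base learners

err_bag = (f - h_bag) ** 2           # squared error with bagging
err_avg = np.mean((f - h) ** 2)      # average squared error of single learners
print(f"bagging:        {err_bag:.4f}")
print(f"single learner: {err_avg:.4f}")
assert err_bag <= err_avg            # the inequality above, verified numerically
```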
Unstable Learners
- Decision trees are unstable; linear regression is stable (see the small experiment below)
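A small experiment to back this up (synthetic data; the helper `bootstrap_variance` is mine): fit each model on many bootstrap samples and measure how much its predictions vary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=200)
X_test = np.linspace(-3, 3, 100).reshape(-1, 1)

def bootstrap_variance(make_model, n_rounds=50):
    """Mean prediction variance across models fit on different bootstrap samples."""
    preds = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X), size=len(X))
        preds.append(make_model().fit(X[idx], y[idx]).predict(X_test))
    return np.stack(preds).var(axis=0).mean()

print(f"decision tree:     {bootstrap_variance(DecisionTreeRegressor):.4f}")
print(f"linear regression: {bootstrap_variance(LinearRegression):.4f}")  # much smaller
```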
References
Stanford Practical Machine Learning - Bagging
https://alexanderliu-creator.github.io/2023/08/26/stanford-pratical-machine-learning-bagging/