Stanford Practical Machine Learning-Stacking


This chapter introduces Stacking, a go-to technique for climbing competition leaderboards. It resembles Bagging in that both combine multiple base learners. The difference is that Bagging trains the same model on differently sampled data, whereas Stacking trains different models on the same data.

Stacking

  • Combine multiple base learners to reduce variance
    • Base learners can be different model types
    • Linearly combine the base learners' outputs with learned parameters
  • Widely used in competitions
  • Bagging vs. Stacking
    • Bagging: bootstrap samples to get diversity
    • Stacking: different types of models extract different features


  • Code

```python
from autogluon.tabular import TabularPredictor

# `train` is a DataFrame holding the features plus the target column,
# and `label` is the name of that target column.
predictor = TabularPredictor(label=label).fit(train)
```
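To make the "learned linear combination" point concrete, here is a minimal sketch using scikit-learn's StackingClassifier (my illustration, not from the slides): base learners of different model types are fitted on the same data, and a logistic regression learns how to weight their predictions.

```python
# Sketch: stacking different model types with a learned linear combiner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    # Base learners of different model types provide diverse predictions ...
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    # ... and a linear model learns the combination weights.
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
print("test accuracy:", stack.score(X_te, y_te))
```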

Multi-layer Stacking

  • Stacking base learners in multiple levels to reduce bias
    • Can use a different set of base learners at each level
  • Upper levels (e.g. L2) are trained on the outputs of the level below (e.g. L1)
    • Concatenating original inputs helps


  • Code

```python
from autogluon.tabular import TabularPredictor

# num_stack_levels=1 adds one stacking layer on top of the base learners;
# num_bag_folds=5 enables 5-fold bagging so that upper levels are trained
# on out-of-fold predictions (see the next section on overfitting).
predictor = TabularPredictor(label=label).fit(
    train, num_stack_levels=1, num_bag_folds=5)
```
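The "concatenating original inputs helps" point also has a direct counterpart in scikit-learn: StackingClassifier's passthrough=True feeds the original features to the upper-level model alongside the base learners' predictions. A minimal sketch (my illustration, not AutoGluon's internals):

```python
# Sketch: upper level trained on [base predictions, original features].
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,  # concatenate original inputs for the upper level
).fit(X, y)
```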

Overfitting in Multi-layer Stacking

  • Train learners from different levels on different data to alleviate overfitting
    • Split the training data into A and B; train the L1 learners on A, then run inference on B to generate the training data for the L2 learners
  • Repeated k-fold bagging (see the sketch after this list):
    • Train k models as in k-fold cross-validation
    • Combine each model's predictions on its out-of-fold data
    • Repeat steps 1-2 n times, then average the n predictions of each example to form the training data for the next level
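A minimal sketch of repeated k-fold bagging for a single base learner, assuming scikit-learn's cross_val_predict for the out-of-fold predictions (illustrative only; AutoGluon implements this internally):

```python
# Sketch: repeated k-fold bagging to build next-level training features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_classification(n_samples=1000, random_state=0)

n_repeats, k = 3, 5
oof = np.zeros((n_repeats, len(X)))
for r in range(n_repeats):
    # A different k-fold split per repeat; each example's prediction comes
    # from a model that never saw it during training (out-of-fold).
    cv = KFold(n_splits=k, shuffle=True, random_state=r)
    oof[r] = cross_val_predict(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X, y, cv=cv, method="predict_proba",
    )[:, 1]

# Average the n repeats; this becomes a training feature for the next level.
next_level_feature = oof.mean(axis=0)
```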

Model Combination Summary

[Figure: summary of model combination methods]

