Alexander Liu, August 23, 2023 (afternoon)

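This course is mainly about applying machine learning in industry. It teaches the important technical details that a data scientist runs into at each stage of bringing machine learning into industrial use.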

Industrial ML Applications

Manufacturing: Predictive maintenance, quality control
Retail: Recommendation, chatbot, demand forecasting
Healthcare: Alerts from real-time patient data, disease identification
Finance: Fraud detection, application processing
Automobile: Breakdown prediction, self-driving
House Price Prediction: The goal is to predict the bid price for the winning buyer

ML Workflow

Problem formulation
A loop:
- Collect & process data
- Train & Tune models
- Deploy models
- Monitor
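
The loop above can be sketched in a few lines of Python. This is only a toy illustration, not code from the course: the helper names (`collect_data`, `train`, `monitor`, `run_workflow`), the simulated house-price data, and the 5% error threshold are all assumptions made for the example.

```python
import random
import statistics

def collect_data(n, slope, seed):
    """Simulate collecting (area, price) pairs; `slope` is the hidden price per unit area."""
    rng = random.Random(seed)
    areas = [rng.uniform(50, 200) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0, 5)) for x in areas]

def train(data):
    """Fit price = w * area by least squares through the origin."""
    numerator = sum(x * y for x, y in data)
    denominator = sum(x * x for x, _ in data)
    return numerator / denominator

def monitor(w, fresh_data):
    """Mean absolute relative error of the deployed model on fresh data."""
    return statistics.mean(abs(w * x - y) / y for x, y in fresh_data)

def run_workflow(slopes, threshold=0.05):
    """Deploy an initial model, then monitor each period and retrain on degradation."""
    deployed = train(collect_data(200, slopes[0], seed=0))
    retrain_periods = []
    for period, slope in enumerate(slopes[1:], start=1):
        fresh = collect_data(200, slope, seed=period)
        if monitor(deployed, fresh) > threshold:
            # Monitoring flagged a problem: go around the loop again.
            deployed = train(fresh)
            retrain_periods.append(period)
    return deployed, retrain_periods

# The market changes in period 2 (price per unit area goes from 3 to 4).
model, retrained_at = run_workflow([3.0, 3.0, 4.0, 4.0])
print(retrained_at)
```

In this sketch, retraining is triggered only in the period where the simulated market changes, which is the point of keeping monitoring inside the loop rather than treating deployment as the final step.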

ML Challenges

Formulate problem: focus on the most impactful industrial problems
Data: high-quality data is scarce, privacy issues
Train models: ML models are more and more complex, data-hungry, expensive
Deploy models: heavy computation is not suitable for real-time inference
Monitor: data distribution shifts, fairness issues
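
As a rough illustration of the monitoring challenge, one common way to flag a shift in a single numeric feature is to compare training data with live data using the two-sample Kolmogorov-Smirnov statistic. The sketch below is an assumption of this write-up rather than a method stated in the course; the function name `ks_statistic`, the synthetic Gaussian data, and the 0.2 alert threshold are illustrative choices.

```python
import bisect
import random

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b (0 = identical, 1 = disjoint)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The maximum gap between two step functions is reached at a data point.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(0)
train_feature = [rng.gauss(0.0, 1.0) for _ in range(500)]
live_same = [rng.gauss(0.0, 1.0) for _ in range(500)]      # same distribution
live_shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]   # mean moved by one std

ALERT_THRESHOLD = 0.2  # hypothetical threshold; tune per feature in practice
for name, live in [("same", live_same), ("shifted", live_shifted)]:
    d = ks_statistic(train_feature, live)
    print(name, "shift detected" if d > ALERT_THRESHOLD else "ok")
```

A check like this only looks at one input feature at a time, so it speaks to covariate shift; detecting label or concept shift generally needs ground-truth labels from production.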

Roles

Domain experts: have business insights, know what data is important and where to find it, identify the real impact of an ML model
Data scientists: full stack on data mining, model training and deployment
ML experts: customize SOTA ML models
SDE: develop/maintain data pipelines, model training and serving pipelines
Skill Improvement:

SDEs and domain experts gradually move toward becoming data scientists, and data scientists gradually become ML experts.

How data scientists spent their time (source: Anaconda survey 2020)

Course Topics

Techniques a data scientist needs but that are often not taught in university ML/stats/programming courses
- Data
  - Collect/preprocess data
  - Covariate/concept/label shifts
  - Data beyond IID
- Train
  - Model validation/combinations/tuning
  - Transfer learning
  - Multi-modality
- Deploy
  - Model deployment
  - Distillation
- Monitor
  - Fairness
  - Explainability
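
To make one of the deployment topics concrete, here is a minimal sketch of a knowledge-distillation loss, where a small student model is trained to match the softened output distribution of a larger teacher. It follows the widely used temperature-scaled KL formulation; whether the course presents it this way is an assumption, and the function names and the temperature of 2.0 are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    # Assumes moderate logits, so no student probability underflows to zero.
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [2.0, 1.0, 0.0]
print(distillation_loss(teacher, [2.0, 1.0, 0.0]))  # student agrees with teacher
print(distillation_loss(teacher, [0.0, 1.0, 2.0]))  # student disagrees
```

In practice this term is usually combined with the ordinary cross-entropy loss on the true labels, so the student learns from both the data and the teacher.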

Stanford Practical Machine Learning - Course Introduction

https://alexanderliu-creator.github.io/2023/08/23/stanford-practical-machine-learning-ke-cheng-jie-shao/
